[Efficient AI Study] AI Model Compression Community Study and Meetup
From May through July, the SqueezeBits team ran the Efficient AI online study and hosted an offline meetup in late July. In this post, we look back on that journey in AI model compression and share reflections from the people who joined us.
How the Efficient AI Study Ran
AI models evolve almost daily, and new compression techniques keep arriving. But people in the field have few chances to gather and have deeper conversations. To help build a sustainable AI community, SqueezeBits hosts the Efficient AI Study every year for people interested in AI optimization. Through the online study and the offline event, we revisited the topics from the study in person and made time to network.
The Efficient AI Study reached its sixth edition this year and ran every Thursday at 8 p.m. for two months. Even though the sessions took place late in the evening after work, more than 170 people joined us from many different backgrounds, from college students to office workers to industry experts. Our survey showed that the top reason for joining was keeping up with the latest trends at 49.7%, and the topic of greatest interest was large language model (LLM) compression at 46.9%.
This year's study featured eight speakers and paper reviews on a wide range of topics, including Generative Multimodal Models and Efficient Generative LLM Inference. The speakers quickly broke down recent research and explained it from their own perspectives.
(When we recruit study members, we also invite paper review presenters. We do not place tight limits on paper topics as long as they fall under AI model compression, but we especially encourage people who want to dig into recent research and technologies. 🤓)
What stayed with us most while running the study was seeing participants stay until the end of every session. We found ourselves thinking, "They are staying up late because they want to learn just a little more," and that dedication earned our respect. It also motivated us to keep growing.
We also asked Dahoon Park, a PhD researcher at Korea University, who gave the Week 3 presentation in this study, to share his experience. (We are sharing a summarized version here because of space.)
Q. How did you first join the Efficient AI Study? 🤓
It started with a recommendation from my advisor. My advisor saw a promotional post on LinkedIn and said, "There's a quantization study. Why don't you check it out?" I had already wanted to keep studying quantization, so I joined the Efficient AI Study.
Q. You presented several times. That takes courage. What made you volunteer?
At first, I mainly wanted to introduce people to paper topics that do not get much attention. Now I care more about organizing my thoughts again by explaining them in a way people can easily understand. Preparing a presentation helps me understand both the code and the paper more deeply, and new ideas sometimes come up when I shape the flow of the talk, so I try to present whenever I get the chance.
Q. Did the Efficient AI Study help with your research at school?
Our lab focuses mainly on hardware, so we often miss software research trends. Even when strong papers on model compression come out, we sometimes discover them too late. After joining the study, I was able to keep up with model compression trends on my own. The papers are difficult and complex, so it is hard to study them alone, but the presenters explain them clearly. That helps me understand useful research step by step.
My first paper focused on the Microscaling (MX) format. At the Efficient AI meetup, I attended a talk by Professor Eunhyuk Park of POSTECH, and that led me to merge OWQ (Outlier-Aware Weight Quantization) into the MX format for my first paper. Without the Efficient AI Study, my first paper might have taken longer to come out.
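For readers unfamiliar with the Microscaling (MX) format mentioned above, the core idea is that small blocks of values (commonly 32 elements) share a single power-of-two scale, so each element can be stored in very low precision. The sketch below is an illustrative simplification, not the official OCP MX specification; function names and the 8-bit element width are our own choices.

```python
import numpy as np

def mx_quantize_block(block, elem_bits=8):
    """Quantize one block of weights with a single shared power-of-two
    scale, in the spirit of the MX format (illustrative sketch only)."""
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros(block.shape, dtype=np.int32), 0
    qmax = 2 ** (elem_bits - 1) - 1  # e.g. 127 for 8-bit elements
    # Shared exponent: smallest power of two keeping all values in range.
    shared_exp = int(np.ceil(np.log2(max_abs / qmax)))
    scale = 2.0 ** shared_exp
    q = np.clip(np.round(block / scale), -qmax - 1, qmax).astype(np.int32)
    return q, shared_exp

def mx_dequantize(q, shared_exp):
    """Recover approximate float values from quantized block."""
    return q.astype(np.float64) * (2.0 ** shared_exp)

# Usage: quantize one 32-element block (MX commonly uses block size 32).
rng = np.random.default_rng(0)
w = rng.normal(size=32)
q, e = mx_quantize_block(w)
w_hat = mx_dequantize(q, e)
err = np.max(np.abs(w - w_hat))  # bounded by half the shared scale
```

OWQ-style methods extend this kind of scheme by keeping a few outlier weight columns in higher precision, since a single shared scale is wasteful when one outlier inflates `max_abs` for the whole block.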
Networking Through the Efficient AI Offline Meetup
On July 26, 2025, we hosted the Efficient AI offline meetup in the seminar room in the basement of Dreamplus Gangnam, where more than 120 people gathered despite temperatures above 36 degrees Celsius. People pushed through the midday summer heat on a Saturday to continue the conversations from the online study face-to-face and to learn the latest research trends in AI model compression. We truly felt the burning interest in AI model compression. 🔥
This event consisted of three presentations and a networking session.
Professor Jungwook Choi of Hanyang University's Department of Electronic Engineering introduced the latest research trends in quantization for large language models (LLMs). PhD researcher Jiwon Song of Seoul National University's VLSI LAB presented a paper on efficient inference for large reasoning models (LRMs), which are designed for complex reasoning tasks. Finally, Dr. Hongseok Kim, Chief Software Architect at Rebellions, shared hardware optimization strategies for LLMs from the perspective of a Korean NPU startup.
Another highlight of this meetup was Qualcomm Korea's AI Hub demo booth.
The booth stayed busy with attendees interested in edge AI, and Qualcomm Korea, the event's official sponsor, offered snacks and coffee to everyone who joined us. That made the meetup feel even livelier. 🍩☕
We also wanted to capture firsthand reactions from people who attended this Efficient AI offline meetup. Seungwoo Son, who works on Galaxy AI model compression at Samsung Research, and Eunju Jeon, who researches LLM serving in the Cloud Research Team at Samsung SDS, shared their thoughts with us through an interview and email. (We are sharing a summarized version here because of space.)
Q. How did the Efficient AI offline meetup feel to you?
Seungwoo Son: I really enjoyed it. I liked the range of topics from the speakers, and I found it especially useful to hear about areas I do not usually encounter. At work, I usually go deep into a limited area, so it felt refreshing to survey many topics broadly in one place.
Eunju Jeon: I liked being able to hear directly from speakers who are well known in the industry. The latest research from the study may not apply to day-to-day work right away, but it can help when I choose my next research topic. The meetup also let me talk freely with people in the industry and hear different perspectives on the same topic.
Q. How does an offline meetup help people who work in the field?
Seungwoo Son: Model compression is such a heavily studied field that I sometimes wonder whether there is much left to discover. Even so, new methods and approaches keep appearing, and simply following that flow makes me feel that I am still growing. This was my first offline meetup, and I was glad that I could quickly understand the latest research directions in one place.
Q. Finally, is there anything you would like to say to the SqueezeBits team? 😺
Eunju Jeon: It is not easy to keep up with the latest model compression technologies from other companies. Even when companies present papers, it is hard to find them if those papers are not broadly promoted. I hope SqueezeBits can create more opportunities for companies across the industry to gather and share their technical work.
I have been impressed by how consistently this study has continued over the years. I hope the Efficient AI Study keeps growing into a flagship event that helps lead industry trends.
Because so many people showed interest, we hope to see you again in future study sessions and at the next meetup for even deeper conversations. Thanks to everyone who joined us for this study. See you soon! 🙌
This post was written by Sangmin, who led the study operations.
💡The Efficient AI Study continues. Check the channels below for the latest updates!
We are looking for people who want to build the AI community with SqueezeBits. View the job posting