2026 Efficient AI Offline Meetup
Wrapping up eight weeks of online study, and a look at how SqueezeBits is working to help the AI compression community grow!
May 28, 2026
Contents
An Eight-Week Online Study Journey
🔥 Offline Meetup: Bringing Online Energy to the Venue
Qualcomm’s Vision for the On-Device AI Era
Modular’s Next-Generation AI Stack: Mojo and MAX
How to Build Robots That Move More Smoothly and Intelligently
LRAgent: Lighter and Faster Multi-Agent AI Systems
From AI Model Core to System Layer: Full-Stack Optimization in Practice
👀 Networking and Demo Sessions You Could See and Experience Firsthand
🚀 The Efficient AI Community Keeps Growing

Hello!
I'm Goeun Kang, Marketing Manager at SqueezeBits. 🙌
Following last year’s event, we hosted the Efficient AI Offline Meetup again on a warm weekend in May. This year also marked the seventh edition of the study series.
SqueezeBits has run an annual online study series under the theme of “Efficient AI” to grow the AI optimization community, and we have wrapped up each journey with an offline meetup.
New papers come out every day, but digesting all of them alone is harder than it sounds. Everyone also has different criteria for what matters most, so the time to exchange views and ask questions through the online study was especially valuable. Maybe that is why so many people joined us despite the late evening schedule.
An Eight-Week Online Study Journey
As soon as we opened registration for the online study on LinkedIn, applications poured in. More than 200 people joined us for this eight-week journey. From March to April, we met every Tuesday at 9 PM for eight sessions, each led by a different speaker.
This cycle was run a little differently. In past cycles, we dove deep into a single paper each session; this time, we organized each week around one theme and covered several papers more broadly. Because AI advances so quickly, SqueezeBits engineers proposed looking at the overall flow of a technical topic instead of stopping at individual paper reviews. That is why four SqueezeBits engineers joined as speakers this time. 😎

One of the most memorable aspects of this cycle was the abundance of questions that followed each talk. Even in an online format, thoughtful inquiries continued to flow after the speakers concluded, and discussions often extended well beyond the planned one-hour time frame. Within the overarching theme of AI optimization, we explored various topics, including large language models (LLMs), multimodal systems, diffusion models, and world models. By the end of all eight sessions, we had developed an incredibly rich body of discussion.
A common piece of feedback at the end of the online study was that people appreciated getting a broad view of current optimization trends and technologies. Many participants also said that covering diverse papers and multiple optimization axes helped expand their overall perspective.
After each session, we made recordings available for about a week so people could continue studying. Almost every replay passed 100 views, which showed us how strongly people were engaged and made us want to bring that online momentum into an offline event as soon as possible.
🔥 Offline Meetup: Bringing Online Energy to the Venue
In addition to those we had already met online, we invited many people who couldn’t join the study but were interested in AI optimization, and we held the offline meetup on Saturday, May 16. Despite the sunny weekend weather, many people still came.
Since this meetup marked the conclusion of the online study, SqueezeBits CTO Taesu Kim opened by briefly walking through what we had covered over the past eight weeks.

If the online program focused on learning the latest theories and papers, the offline meetup focused on real stories about how Efficient AI is implemented and used in global companies, academia, and production teams.
We were also grateful for strong support from Qualcomm and Modular. In particular, Qualcomm Korea provided snacks and coffee for all attendees and made the venue even more welcoming. 🍩☕
Qualcomm’s Vision for the On-Device AI Era
Kyoung Min Cho of Qualcomm Korea shared Qualcomm’s hardware and software strategy for the on-device AI era. As AI expands from cloud-first deployments to on-device and edge environments, optimization technologies are becoming even more important for running AI efficiently within limited hardware resources.
What stood out most was Qualcomm’s “Dragonwing” infrastructure, which lets organizations run large AI models in-house while addressing security and cost concerns, along with a platform that spans the full path from data collection to deployment. We also saw compelling examples of robots running on onboard AI, and Qualcomm’s support for local universities with development boards showed real commitment to growing the next generation of AI developers.

Modular’s Next-Generation AI Stack: Mojo and MAX
Judy Heflin from Modular, a U.S.-based AI unicorn, introduced Mojo and MAX, Modular’s next-generation AI development environment.
Across the AI industry, a wide range of accelerators and NPUs beyond NVIDIA GPUs are emerging quickly, which increases the need for hardware-flexible development environments. Existing CUDA-based development can deliver high performance, but code complexity and maintainability remain major challenges.
To address this, Modular introduced Mojo, a programming language that lets teams build high-performance AI systems with syntax familiar to Python developers, and MAX, a framework that supports optimized AI inference across diverse hardware environments.
A major highlight of MAX was its integrated support within a single open-source ecosystem, from LLM serving to model optimization and even GPU kernel development. With support for a broad range of AI hardware beyond NVIDIA GPUs, it points toward a future where AI development becomes more open and flexible.

How to Build Robots That Move More Smoothly and Intelligently
After the sponsor intros, the main speaker sessions began.
First, Professor Eunhyeok Park of POSTECH gave an overview of research accepted to NeurIPS 2025. In robotics today, learning by observing and imitating human behavior is widely used. The challenge is that even tiny errors in real environments can make robots behave erratically.
Professor Park proposed a method that allows robots to recognize environmental changes in real time and flexibly revise action plans accordingly. It was especially impressive that this could be applied directly at inference time, without additional retraining.

LRAgent: Lighter and Faster Multi-Agent AI Systems
Next, Hyesung Jeon from Seoul National University introduced LRAgent, a new method for addressing inefficiency when multiple AI agents run simultaneously.
Many recent AI services are no longer powered by a single assistant. Instead, they rely on multiple collaborating agents, such as planning, search, and review agents. As the number of agents increases, memory usage grows and systems slow down.
This research focused on the KV cache, which AI models use to retain prior context. The team observed that different agents often store overlapping information and proposed LRAgent, which shares the overlapping memory across agents while managing only the distinct parts separately.
The result was up to a 2x speedup without any additional model retraining, offering a meaningful glimpse into the future of cost-efficient AI services. As agent-based services become more complex, this session showed clear potential for running more agents stably with fewer resources.

From AI Model Core to System Layer: Full-Stack Optimization in Practice
The final talk was delivered by Sungmin Lee from MOTIF, who introduced full-stack optimization techniques for training generative AI models faster and more efficiently.
As AI models become larger and more complex, GPU cost and memory consumption have become just as critical as raw model quality. To address this, MOTIF shared cases that optimized the entire stack, from model architecture to system-level execution.
They reduced memory usage with a proprietary computation structure (GDLA) that separates important signals from unnecessary ones, and cleanly resolved bottlenecks caused by data movement. The talk showed that real capability comes not just from making models bigger, but from operating them efficiently.

👀 Networking and Demo Sessions You Could See and Experience Firsthand
Because every session, from sponsor segments to technical talks, was packed with practical content, attendees asked questions throughout. They did more than listen to presentations; they held in-depth discussions grounded in real production concerns and hands-on experience. Even after the sessions ended, speakers and attendees naturally continued networking and sharing insights.
The networking time also included a variety of demos. Alongside demo videos of RoBoost, SqueezeBits’ synthetic data augmentation platform for Physical AI, and Yetter, which accelerates generative AI image and video creation, attendees explored diverse on-device AI solutions implemented on Qualcomm chips. It was exciting to see technologies that often stay in papers come alive on actual devices, from AI agents running autonomously on mini computers and boards to AI-powered home hubs.



🚀 The Efficient AI Community Keeps Growing
This community exists because so many people keep showing up, sharing perspectives, and engaging with us. Among them, stories from participants who experienced both the online study and the offline meetup made the value of this community especially clear.
A reflection from Hyochan Chong of Samsung Research, who spoke in the online study and joined the offline meetup, captured the meaning of this event particularly well.
“Preparing for the online study talk gave me a meaningful chance to reorganize the AI compression techniques I had studied into a larger context. At the offline meetup, it was valuable to hear how optimization methods I had long been interested in are applied in real products and production environments. Most of all, I loved being able to meet and exchange ideas directly with people facing similar challenges.”
Running AI efficiently and delivering reliable AI services is a shared challenge across the industry. As technology moves beyond labs into real products used in everyday life, we have to overcome practical constraints such as speed, cost, and power. This is exactly why SqueezeBits focuses on AI compression and optimization and why we are committed to growing this ecosystem together.
SqueezeBits will keep running the Efficient AI study and meetup programs so we can continue sharing these questions and connect the latest trends and real-world use cases more broadly.
Missed this one? Follow SqueezeBits on LinkedIn, where we post updates first! 😊 We look forward to talking about the future of AI with even more people.
Finally, thank you once again for joining this Efficient AI Online Study and Offline Meetup! 🙌