Intel® Gaudi® Hands-on Workshop | A Recap of the Gaudi Workshop with SqueezeBits x Lablup
As AI environments change quickly and the range of hardware options keeps expanding, many companies keep asking the same question.
"How can we run AI models faster and more efficiently?"
To help answer it, SqueezeBits hosts hands-on sessions where people can test different AI hardware firsthand.

The First Hands-on Workshop with Intel® Gaudi®
This workshop gave participants time to optimize and run several practical AI models on Intel® Gaudi® inside Lablup's Backend.AI environment. Because this was the first hands-on workshop built around Intel® Gaudi®, both attendees and the organizers came in with high expectations.
Even though the workshop took place on a weekday afternoon, many people joined. We also prepared light refreshments and snacks so attendees could stay focused and feel comfortable throughout the session.

Workshop Agenda: From Concepts to Practice
The workshop opened with Intel's Minseok Kim, Industry Technical Sales Specialist, introducing a new standard for AI infrastructure through Intel® Gaudi®.

Lablup's GTM Lead, Jongmin Kim, then introduced Backend.AI, which served as the core platform that reliably allocated high-performance Gaudi® resources to each participant. Its "Start from URL" feature let attendees type in a GitHub address, and Backend.AI then set up the lab environment automatically, so each person could start working right away without complex setup.

The next session introduced the familiar developer experience provided by the Intel Gaudi Software Stack. In short, the stack acts like a smart compiler that organizes and optimizes complex AI models for Gaudi® hardware. Developers could feel the performance gains right away with almost no changes to their existing PyTorch code.
Participants also saw that they could choose Eager Mode, which executes operations immediately, or Lazy Mode, which accumulates operations into a graph for an extra round of optimization before execution. Strong hardware, user-friendly software, and Backend.AI's resource management came together to create a smooth lab environment.
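As a rough illustration of how small that migration can be, here is a minimal sketch of moving a PyTorch module to Gaudi and toggling between the two modes. It assumes a container with the Intel Gaudi PyTorch bridge (`habana_frameworks`) installed; the `PT_HPU_LAZY_MODE` flag and `mark_step()` call follow the public Gaudi documentation, though defaults vary by software release.

```python
import os

# Choose the execution mode before importing the Gaudi bridge:
# "1" selects Lazy Mode (graph optimization), "0" selects Eager Mode.
os.environ.setdefault("PT_HPU_LAZY_MODE", "1")

import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

# Existing PyTorch code stays intact; only the device string changes.
model = torch.nn.Linear(1024, 1024).to("hpu")
x = torch.randn(8, 1024, device="hpu")

y = model(x)
htcore.mark_step()  # in Lazy Mode, flushes the accumulated graph for execution

print(y.shape)  # torch.Size([8, 1024])
```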
Hands-on Labs with Specialist Engineers in Each Session
After that, participants joined hands-on labs with specialist engineers for each session based on the preregistration survey. Because the topics reflected technologies that drew strong interest in production, the energy in the room kept building as the day went on.
Led by SqueezeBits, the hands-on program focused on giving participants direct experience with AI model compression and optimization techniques that maximize Intel® Gaudi® hardware performance.

Diffusion: Making Image Generation Models Faster
In the Diffusion session, participants directly tested how image generation models run in an Intel® Gaudi® environment.
The lab used the Qwen-Image model with Hugging Face's Diffusers library. Participants added only a few settings to code they already used on GPUs and ran it on Gaudi®. The session made one point clear: the execution environment can change with just a few lines through the Optimum Habana interface, while the original GPU code stays almost intact.
Because image generation produces an immediate visual result, anyone in the room could quickly understand that hardware optimization leads to real performance differences.
The session went beyond simply getting the model to run. It also covered the fine-grained optimization steps that maximize inference performance. Participants saw concrete methods for speeding up image generation with hardware acceleration on Gaudi® and experienced both easy code migration and optimized inference performance.
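The pattern the session demonstrated, swapping the pipeline class and adding a few Gaudi settings, looks roughly like the sketch below. It is shown with Stable Diffusion, whose Gaudi pipeline class is documented in Optimum Habana; the workshop itself used Qwen-Image, so the exact class name and `gaudi_config` for that model may differ.

```python
# A minimal Optimum Habana sketch: the GPU-era Diffusers call pattern
# survives, with a few Gaudi-specific settings layered on top.
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

pipe = GaudiStableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",     # same model id as in plain Diffusers
    use_habana=True,                        # run on Gaudi (HPU) instead of GPU
    use_hpu_graphs=True,                    # cache compiled graphs across calls
    gaudi_config="Habana/stable-diffusion", # published mixed-precision config
)

image = pipe("a watercolor fox in a misty forest", num_inference_steps=30).images[0]
image.save("fox.png")
```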

Fine-Tuning: Training Only What You Need with PEFT (Parameter-Efficient Fine-Tuning)
In the Fine-Tuning session, participants focused on LoRA training with the Qwen3-0.6B model on top of DeepSpeed. Training a large language model (LLM) from scratch demands major time and cost. In this session, the PEFT approach kept the core model intact and trained only the parts that mattered, which reduced resource use.
Participants also ran lightweight LoRA training themselves in Intel's Gaudi®-optimized DeepSpeed environment. They saw that practical fine-tuning is possible even with limited resources.
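A minimal sketch of that LoRA setup with the Hugging Face `peft` library is shown below, assuming the `Qwen/Qwen3-0.6B` checkpoint from the Hub. The target module names are typical for Qwen-style attention blocks and should be verified against the actual model; the DeepSpeed wiring used in the session is omitted here.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

lora_cfg = LoraConfig(
    r=16,            # rank of the low-rank update matrices
    lora_alpha=32,   # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# The base weights stay frozen; only the small LoRA adapters train.
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

From there, training proceeds with a standard Trainer loop, which the session ran on Intel's Gaudi®-optimized DeepSpeed stack.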
The session also covered GraLoRA, which improves on standard LoRA. Standard LoRA applies the same rank to every layer, but GraLoRA is a PEFT method designed to adjust training intensity by layer or module. That makes training more efficient with fewer parameters and less memory for the same performance target.
Through this session, participants saw firsthand that they could use LoRA and newer PEFT methods, such as GraLoRA, to tune existing models for specific goals. They could also apply those methods effectively on Intel® Gaudi®.

vLLM: Serving Large Language Models (LLMs) Efficiently
The vLLM session drew the most attention. It focused on more than simply running an LLM. Participants learned how vLLM, a representative serving framework, works on Intel® Gaudi® and practiced with it directly. They also confirmed that they could use a serving flow in Gaudi® that stays close to the familiar vLLM setup. That means a hardware change does not force a major change in serving methods. Most importantly, they could try the optimization techniques implemented specifically for Intel® Gaudi®.
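As a point of reference, the familiar offline vLLM flow looks like the sketch below, using the Qwen3-8B model from the lab. It assumes a Gaudi-enabled vLLM build is installed; the Python API is the same one used on GPUs.

```python
from vllm import LLM, SamplingParams

# Same offline-inference API as on GPUs; the Gaudi build handles the hardware.
llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain why paged KV caches help LLM serving."], params)
print(outputs[0].outputs[0].text)
```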
The explanation and lab around quantization also stood out. Quantization is a key optimization technique for efficient LLM serving. The Intel® Gaudi® vLLM environment integrates Intel Neural Compressor (INC), Intel's own compression library, for quantization. Participants applied quantization to the Qwen3-8B model directly inside the vLLM framework and confirmed immediate performance gains through hands-on benchmarks.
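Enabling that INC-backed quantization is, in rough outline, a matter of pointing vLLM at an INC configuration and selecting the quantization backend. The sketch below follows the pattern in the Gaudi vLLM documentation; the config path is hypothetical, the JSON itself comes from a prior calibration run, and flag names can differ across versions.

```python
import os

# Hypothetical path: an INC config JSON produced by a calibration run.
os.environ["QUANT_CONFIG"] = "./inc_measure_and_quant.json"

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-8B",
    quantization="inc",        # route quantization through Intel Neural Compressor
    kv_cache_dtype="fp8_inc",  # store the KV cache in FP8 as well
)
```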

Participant Feedback and Meaningful Outcomes
Survey results showed high satisfaction with the workshop overall. Many participants especially responded well to the hands-on format.
"I liked how clearly the team explained the example code in detail."
"It was my first time using an NPU, and it was easier than I expected."
"I liked being able to work with Gaudi, which is not easy to access directly, in the Backend.AI environment."
Participants worked directly with specialist engineers on the latest technology, and that gave the workshop high marks. Learning through actual labs instead of one-way theory sessions also raised satisfaction.
Many people also said it was meaningful to experience several NPU environments firsthand. That feedback showed that the workshop's purpose came across clearly.
Participants also said they wanted more hands-on workshops where they could work directly with different hardware environments. Through this workshop, we could once again feel the growing interest in AI model optimization and hardware utilization in Korea.

We Will Keep Supporting AI You Can Experience Firsthand!
This workshop brought together Backend.AI's stable platform, Intel® Gaudi®'s high-performance AI hardware, and SqueezeBits' model optimization expertise. As AI environments keep changing quickly, opportunities to test different hardware firsthand and apply it to real work will matter even more.
SqueezeBits plans to keep running hands-on events where engineers can test new AI infrastructure in practical settings.
For updates on upcoming SqueezeBits events, follow us on LinkedIn. It's the fastest way to get news about upcoming workshops and technical events! 😄