Taesu Kim - SqueezeBits

[Intel Gaudi] #6. GEMM, Attention, vLLM on Gaudi

Explore how Intel’s new Gaudi-3 compares to Gaudi-2, NVIDIA A100, and H100. We analyze real-world GEMM efficiency, attention performance, and LLM serving results to uncover what truly matters for AI inference and training workloads.

[Intel Gaudi] #5. FLUX.1 on Gaudi-2

This article discusses inference efficiency when running the FLUX.1 models on Intel Gaudi-2 hardware.

Tech · Intel Gaudi

The Rise and Fall of ONNX (feat. PyTorch 2.0)

This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.

[Intel Gaudi] #3. Performance Evaluation with SynapseAI v1.19

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

Tech · Intel Gaudi

[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching

This article provides a comparative analysis of automatic prefix caching in vLLM and TensorRT-LLM.

Tech · vLLM vs TRT LLM

[Intel Gaudi] #2. Graph Compiler and Overall Performance Evaluation

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

Tech · Intel Gaudi

[Intel Gaudi] #1. Introduction

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

Tech · Intel Gaudi
