Guided Decoding Performance on vLLM and SGLang

A deep-dive benchmark comparing XGrammar and LLGuidance on vLLM and SGLang to help you find the optimal setup for generating structured output in your use case.

Eunik Park · Sep 16, 2025 · Tech Insight
OwLite Meets Qualcomm Neural Network: Unlocking On-Device AI Performance

At SqueezeBits, we have been empowering developers to deploy complex AI models efficiently while minimizing performance trade-offs with the OwLite toolkit. With OwLite v2.5, we're excited to announce official support for Qualcomm Neural Network (QNN) through seamless integration with Qualcomm AI Hub.

Eunik Park · Jul 03, 2025 · Product
[vLLM vs TensorRT-LLM] #7. Weight-Activation Quantization

This article provides a comparative analysis of the effects of weight-activation quantization on the vLLM and TensorRT-LLM frameworks.

Eunik Park · Nov 11, 2024 · Tech Insight

The official SqueezeBits Tech blog
