SqueezeBits
Bringing NPUs into Production: Our Journey with Intel Gaudi
SqueezeBits partnered with Intel to optimize Gaudi-2 for generative AI workloads such as vLLM-based serving and image generation. The result is Yetter: faster, cheaper inference for production AI.
How to Quantize Transformer-based Models for TensorRT Deployment
This article presents experimental results from quantizing the Vision Transformer model and its variants with OwLite.
How to Quantize YOLO models with OwLite
This article presents experimental results from quantizing YOLO models with OwLite.
OwLite: No More Compromising on AI Performance After Quantization
Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.