SqueezeBits
OwLite Meets Qualcomm Neural Network: Unlocking On-Device AI Performance
At SqueezeBits, we have been empowering developers to deploy complex AI models efficiently while minimizing performance trade-offs with the OwLite toolkit. With OwLite v2.5, we're excited to announce official support for Qualcomm Neural Network (QNN) through seamless integration with Qualcomm AI Hub.
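The post itself covers the integration in detail; as a rough, non-authoritative illustration of the Qualcomm AI Hub flow that this QNN support plugs into, the sketch below uses the public qai_hub Python client. The device name, model, input shape, and compile options are assumptions for illustration only and may differ from what the post or your AI Hub version uses.

```python
# Sketch: compile and profile a traced PyTorch model on a Qualcomm device via AI Hub.
# Device name and the QNN target-runtime option string are assumptions; check the
# Qualcomm AI Hub documentation for the values supported by your account and version.
import torch
import torchvision
import qai_hub as hub

# Trace a PyTorch model so it can be uploaded to AI Hub (random weights are fine
# for a latency-only sketch).
model = torchvision.models.mobilenet_v2().eval()
example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

device = hub.Device("Samsung Galaxy S24")  # assumed device name

# Compile for the device; the QNN runtime is requested through compile options.
compile_job = hub.submit_compile_job(
    model=traced,
    device=device,
    input_specs={"image": (1, 3, 224, 224)},
    options="--target_runtime qnn_lib_aarch64_android",  # assumed option string
)

# Profile the compiled model on real hardware to measure on-device latency.
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)
print(profile_job.download_profile())
```

OwLite's role in this pipeline is upstream of the compile step: it produces the quantized model artifact that gets submitted to AI Hub for QNN compilation and on-device benchmarking.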
Bringing NPUs into Production: Our Journey with Intel Gaudi
SqueezeBits has partnered with Intel to make Gaudi NPUs more usable in practice. We optimized LLMs and diffusion models for Gaudi-2 and created yetter, a generative AI API service.
How to Quantize Transformer-based Models for TensorRT Deployment
This article presents experimental results from quantizing a Vision Transformer model and its variants with OwLite.
How to Quantize YOLO models with OwLite
This article presents experimental results from quantizing YOLO models with OwLite.
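Both "How to Quantize" posts above are built on the same OwLite post-training quantization workflow: convert the model, calibrate quantizer ranges on a small dataset, then export and benchmark on the target runtime. The minimal sketch below assumes placeholder project/baseline names, a torchvision model, and random calibration data; the exact API may vary between OwLite versions, so treat it as an outline rather than the definitive recipe from either post.

```python
# Minimal sketch of an OwLite PTQ flow (placeholder names and data; not the exact
# code from the posts above).
import owlite
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

model = torchvision.models.vit_b_16().eval()
calib_loader = DataLoader(TensorDataset(torch.rand(32, 3, 224, 224)), batch_size=8)

# 1. Register the run with the OwLite server (project/baseline names are assumptions).
owl = owlite.init(project="demo_project", baseline="vit_b_16")

# 2. Convert the model into OwLite's quantization-aware graph using an example input.
example_input = torch.rand(1, 3, 224, 224)
model = owl.convert(model, example_input)

# 3. Calibrate quantizer ranges by running a small calibration set through the model.
with owlite.calibrate(model) as calibrated_model:
    for (images,) in calib_loader:
        calibrated_model(images)

# 4. Export the quantized model and benchmark it on the target runtime
#    (TensorRT, or QNN via Qualcomm AI Hub as of OwLite v2.5).
owl.export(model)
owl.benchmark()
```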