SqueezeBits
How to Quantize YOLO models with OwLite
This article presents experimental results from quantizing YOLO models with OwLite.
OwLite: No More Compromising on AI Performance After Quantization
Discover how OwLite simplifies AI model optimization with seamless integration and secure architecture.
[Intel Gaudi] #5. FLUX.1 on Gaudi-2
This article examines inference efficiency when running FLUX.1 models on Intel Gaudi-2 hardware.
TensorRT-LLM Goes Open Source!
With TensorRT-LLM now open source, we can finally take a deep dive into the secret sauce behind its impressive performance.