SqueezeBits
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
A brief review of the research paper from our team, published at ICML 2024.
Feb 17, 2025
The Missing Piece of TensorRT-LLM
This article covers an open-source library that converts PyTorch models directly to TensorRT-LLM.
The Rise and Fall of ONNX (feat. PyTorch 2.0)
This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.
[vLLM vs TensorRT-LLM] #13. Vision-Language Models
This article provides a comparative analysis of serving vision-language models on vLLM and TensorRT-LLM.