SqueezeBits
When Should I Use Fits on Chips?
This article describes when to use the Fits on Chips toolkit, with specific use cases.
Fits on Chips: Saving LLM Costs Became Easier Than Ever
This article introduces Fits on Chips, an LLMOps toolkit for performance evaluation.
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
A brief review of the research paper from our team, published at ICML 2024.
Feb 17, 2025
Tech, Research
The Missing Piece of TensorRT-LLM
This article introduces an open-source library that converts PyTorch models directly to TensorRT-LLM.