Fits on Chips
SqueezeBits
Jiwoong Choi
Disaggregated Inference on Apple Silicon: NPU prefill and GPU decode
This article introduces how to run LLMs efficiently on Apple Silicon using the disaggregated inference technique.
Aug 26, 2025
Tech
The Missing Piece of TensorRT-LLM
This article introduces an open-source library for directly converting PyTorch models to TensorRT-LLM.
Feb 10, 2025
Tech