SqueezeBits
Yeonjoon Jung
[vLLM vs TensorRT-LLM] #11. Speculative Decoding
This article provides a comparative analysis of speculative decoding in the vLLM and TensorRT-LLM frameworks.
Dec 09, 2024
Tech
[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks, focusing on batching configurations and thoroughly examining the effects of maximum batch size and maximum number of tokens.
Oct 11, 2024
Tech
[vLLM vs TensorRT-LLM] #1. An Overall Evaluation
This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks for serving LLMs, evaluating their performance based on key metrics like throughput, TTFT, and TPOT to offer insights for practitioners in optimizing LLM deployment strategies.
Oct 01, 2024
Tech