
Unlock the Potential of AI

Deploy your AI with Maximal Efficiency
Reliable & Scalable Synthetic Data for Physical AI (Part 2): Making Cosmos 3.1x Faster for Production

Explore why Physical AI deployment needs synthetic data at scale with SqueezeBits' research, and discover how to overcome inference bottlenecks to accelerate RoBoost Agent.
Jongho Lee · Daehyun Ahn · Yeonjoon Jung · Semin Kim · Seungryeol Kim
Mar 11, 2026
Research
Reliable & Scalable Synthetic Data for Physical AI (Part 1): Taming NVIDIA Cosmos with RoBoost Agent

Scaling Physical AI requires reliable synthetic data. Learn how RoBoost Agent integrates NVIDIA Cosmos to transform world models into trustworthy data engines for robotics and autonomous driving.
Daehyun Ahn · Jongho Lee · Yeonjoon Jung · Semin Kim · Seungryeol Kim
Feb 25, 2026
Research
Vocabulary Trimming: An Easy and Effective Method for SLM Acceleration

Trimming large multilingual vocabularies in Small Language Models (SLMs) is a simple, low-risk way to boost efficiency: it significantly accelerates inference while leaving accuracy nearly unchanged. A rough sketch of the idea follows this entry.
Semin Kim
Aug 04, 2025
Research
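
As a loose illustration of the idea (not the post's code), the PyTorch sketch below keeps only the token ids observed in a deployment corpus and slices the embedding and output-head rows down to them. The helper name `trim_vocabulary` and the `keep_ids` input are hypothetical, and the sketch assumes a decoder-only model with a standard embedding plus linear LM head.

```python
# Illustrative sketch of vocabulary trimming: keep only token ids that
# actually appear in the deployment corpus, then slice the embedding and
# output-projection rows down to the reduced vocabulary.
import torch
import torch.nn as nn

def trim_vocabulary(embedding: nn.Embedding, lm_head: nn.Linear, keep_ids: list[int]):
    """Return new embedding/head layers restricted to `keep_ids` (hypothetical helper).

    `keep_ids` is assumed to be the token ids observed when tokenizing the
    target corpus; every other vocabulary row is dropped.
    """
    idx = torch.tensor(sorted(set(keep_ids)), dtype=torch.long)

    new_emb = nn.Embedding(len(idx), embedding.embedding_dim)
    new_emb.weight.data.copy_(embedding.weight.data[idx])

    new_head = nn.Linear(lm_head.in_features, len(idx), bias=lm_head.bias is not None)
    new_head.weight.data.copy_(lm_head.weight.data[idx])
    if lm_head.bias is not None:
        new_head.bias.data.copy_(lm_head.bias.data[idx])

    # Old-id -> new-id mapping so tokenizer output can be remapped.
    remap = {int(old): new for new, old in enumerate(idx.tolist())}
    return new_emb, new_head, remap
```

In this sketch, the original tokenizer's output would be remapped through `remap` before the forward pass, and any token outside `keep_ids` would need a fallback such as an unknown-token id.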
GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost

LoRA excels at efficient fine-tuning but suffers at higher ranks due to gradient entanglement. We introduce GraLoRA, which addresses this through finer-grained, block-wise updates, significantly enhancing performance and expressivity without overhead. GraLoRA outperforms LoRA across tasks, achieving up to a +8.5% improvement in HumanEval+ Pass@1. A conceptual sketch of the block-wise idea follows this entry.
Yeonjoon Jung
Jul 21, 2025
Research
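
As a loose illustration of what "finer-grained, block-wise updates" can look like, here is a conceptual PyTorch sketch. The class name, block count `k`, per-block rank, scaling, and initialization are assumptions for illustration, not GraLoRA's exact formulation; the post describes the actual method and how its parameter budget matches standard LoRA.

```python
# Conceptual sketch: split the frozen weight into a k x k grid and give each
# cell its own small low-rank adapter, instead of one global low-rank update.
import torch
import torch.nn as nn

class BlockwiseLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, k: int = 2, alpha: float = 16.0):
        super().__init__()
        assert base.in_features % k == 0 and base.out_features % k == 0
        self.base = base                          # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.k, self.scale = k, alpha / rank
        d_in, d_out = base.in_features // k, base.out_features // k
        # One (A, B) pair per (output-block, input-block) cell of the grid.
        self.A = nn.Parameter(torch.randn(k, k, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(k, k, rank, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        # Route each input chunk through its column of adapters and sum
        # the contributions into the matching output chunk.
        x_chunks = x.chunk(self.k, dim=-1)
        out_chunks = []
        for i in range(self.k):                   # output block i
            acc = 0
            for j in range(self.k):               # input block j
                acc = acc + (x_chunks[j] @ self.A[i, j]) @ self.B[i, j]
            out_chunks.append(acc * self.scale)
        return y + torch.cat(out_chunks, dim=-1)
```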
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

A brief review of our team's research paper, published at ICML 2024. A rough sketch of the block-redundancy idea follows this entry.
Feb 17, 2025
Research
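
As a loose illustration of the redundancy idea, the sketch below scores each transformer block by how similar its output is to its input on a small calibration batch; blocks that barely change the representation are candidates for removal. The similarity metric and the simplified block interface are assumptions here, and the paper describes SLEB's actual verification and elimination criterion.

```python
# Illustrative sketch: score transformer blocks by input/output similarity
# on a calibration batch; high similarity suggests a redundant block.
import torch

@torch.no_grad()
def block_redundancy_scores(blocks, hidden: torch.Tensor):
    """`blocks` is a list of callables mapping hidden states (batch, seq, dim)
    to hidden states; `hidden` is a calibration batch of activations."""
    scores = []
    for block in blocks:
        out = block(hidden)
        # High cosine similarity between a block's input and output means it
        # barely transforms the representation, so it is a removal candidate.
        sim = torch.nn.functional.cosine_similarity(
            hidden.flatten(1), out.flatten(1), dim=-1
        ).mean()
        scores.append(sim.item())
        hidden = out   # propagate through the stack before scoring the next block
    return scores      # higher score = more redundant
```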

The official SqueezeBits Tech blog
