https://arxiv.org/abs/2208.07339

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer)

retraining이나 finetuning 없이 llm 체크포인트를 그대로 불러와 8 bit quantization을 수행. 메모리 소모를 절반으로 줄이면서 성능은 그대로 유지합니다. 속도는 13B 이하에서는 오히려 감소하는데 그 이상에서는 향상되기도 하네요. 전용 커널이 들어가면 좀 더 나을 수 있을 것 같습니다.

#quantization

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

220815 LLM.int8().md

220815 LLM.int8().md

Files

220815 LLM.int8().md

Latest commit

History

220815 LLM.int8().md

File metadata and controls