https://arxiv.org/abs/2211.05102

Efficiently Scaling Transformer Inference (Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean)

It looks like they have been working on deploying PaLM 540B. This is optimization work for inference efficiency.

#llm #efficiency