https://arxiv.org/abs/2211.05102
Efficiently Scaling Transformer Inference (Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean)
Looks like this comes out of the effort to deploy PaLM 540B. It's optimization work aimed at inference efficiency.
#llm #efficiency