Awesome Long-Context LLM

Survey

Papers

  • KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, arXiv, 2401.18079, arxiv, pdf, citations: -1

    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

  • Long-Context-Data-Engineering - FranxYao

    Implementation of the paper Data Engineering for Scaling Language Models to 128K Context

  • LongAlign: A Recipe for Long Context Alignment of Large Language Models, arXiv, 2401.18058, arxiv, pdf, citations: -1

    Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li · (LongAlign - THUDM)

  • With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation, arXiv, 2401.11504, arxiv, pdf, citations: -1

    Y. Wang, D. Ma, D. Cai · (zhuanlan.zhihu)

  • E^2-LLM: Efficient and Extreme Length Extension of Large Language Models, arXiv, 2401.06951, arxiv, pdf, citations: -1

    Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su

  • Extending LLMs' Context Window with 100 Samples, arXiv, 2401.07004, arxiv, pdf, citations: -1

    Yikai Zhang, Junlong Li, Pengfei Liu · (Entropy-ABF - GAIR-NLP)

  • Transformers are Multi-State RNNs, arXiv, 2401.06104, arxiv, pdf, citations: -1

    Matanel Oren, Michael Hassid, Yossi Adi, Roy Schwartz · (TOVA - schwartz-lab-NLP)

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models, arXiv, 2401.04658, arxiv, pdf, citations: -1

    Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong · (lightning-attention - OpenNLPLab)

  • Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache, arXiv, 2401.02669, arxiv, pdf, citations: -1

    Bin Lin, Tao Peng, Chen Zhang, Minmin Sun, Lanbo Li, Hanyu Zhao, Wencong Xiao, Qi Xu, Xiafei Qiu, Shen Li

  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, arXiv, 2401.01325, arxiv, pdf, citations: -1

    Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu · (qbitai)

  • Cached Transformers: Improving Transformers with Differentiable Memory Cache, arXiv, 2312.12742, arxiv, pdf, citations: -1

    Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

  • Extending Context Window of Large Language Models via Semantic Compression, arXiv, 2312.09571, arxiv, pdf, citations: -1

    Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

  • Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention, arXiv, 2312.08618, arxiv, pdf, citations: -1

    Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

  • Ultra-Long Sequence Distributed Transformer, arXiv, 2311.02382, arxiv, pdf, citations: -1

    Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gounley

  • HyperAttention: Long-context Attention in Near-Linear Time, arXiv, 2310.05869, arxiv, pdf, citations: 2

    Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

  • CLEX: Continuous Length Extrapolation for Large Language Models, arXiv, 2310.16450, arxiv, pdf, citations: -1

    Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

  • TRAMS: Training-free Memory Selection for Long-range Language Modeling, arXiv, 2310.15494, arxiv, pdf, citations: -1

    Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi

  • Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading, arXiv, 2310.05029, arxiv, pdf, citations: -1

    Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz · (mp.weixin.qq)

  • Scaling Laws of RoPE-based Extrapolation, arXiv, 2310.05209, arxiv, pdf, citations: -1

    Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin · (qbitai)

  • Ring Attention with Blockwise Transformers for Near-Infinite Context, arXiv, 2310.01889, arxiv, pdf, citations: -1

    Hao Liu, Matei Zaharia, Pieter Abbeel

  • EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation, arXiv, 2310.08185, arxiv, pdf, citations: -1

    Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei

  • CoCA: Fusing position embedding with Collinear Constrained Attention for fine-tuning free context window extending, arXiv, 2309.08646, arxiv, pdf, citations: -1

    Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li · (Collinear-Constrained-Attention - codefuse-ai) · (jiqizhixin)

  • Effective Long-Context Scaling of Foundation Models, arXiv, 2309.16039, arxiv, pdf, citations: 1

    Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz · (qbitai)

  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models, arXiv, 2308.16137, arxiv, pdf, citations: 3

    Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang

  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models, arXiv, 2309.14509, arxiv, pdf, citations: -1

    Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

  • YaRN: Efficient Context Window Extension of Large Language Models, arXiv, 2309.00071, arxiv, pdf, citations: 9

    Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole · (yarn - jquesnelle) · (jiqizhixin)

  • In-context Autoencoder for Context Compression in a Large Language Model, arXiv, 2307.06945, arxiv, pdf, citations: 4

    Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

  • Focused Transformer: Contrastive Training for Context Scaling, arXiv, 2307.03170, arxiv, pdf, citations: 12

    Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

  • Lost in the Middle: How Language Models Use Long Contexts, arXiv, 2307.03172, arxiv, pdf, citations: 64

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, arXiv, 2307.02486, arxiv, pdf, citations: 15

    Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

  • Extending Context Window of Large Language Models via Positional Interpolation, arXiv, 2306.15595, arxiv, pdf, citations: 36

    Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian · (qbitai) · (a minimal interpolation sketch follows this paper list)

  • The Impact of Positional Encoding on Length Generalization in Transformers, arXiv, 2305.19466, arxiv, pdf, citations: 5

    Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy

  • Long-range Language Modeling with Self-retrieval, arXiv, 2306.13421, arxiv, pdf, citations: 3

    Ohad Rubin, Jonathan Berant

  • Block-State Transformers, arXiv, 2306.09539, arxiv, pdf, citations: 2

    Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, arXiv, 2306.15626, arxiv, pdf, citations: 14

    Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

  • GLIMMER: generalized late-interaction memory reranker, arXiv, 2306.10231, arxiv, pdf, citations: 1

    Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie

  • Augmenting Language Models with Long-Term Memory, arXiv, 2306.07174, arxiv, pdf, citations: 7

    Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei · (aka)

  • Sequence Parallelism: Long Sequence Training from System Perspective, arXiv, 2105.13120, arxiv, pdf, citations: 2

    Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
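
Several entries above (notably Extending Context Window of Large Language Models via Positional Interpolation and YaRN) extend the context window by rescaling RoPE positions so that a model trained on short sequences can be run on much longer ones. The sketch below is not taken from any of the listed codebases; it is a minimal NumPy illustration of the linear-interpolation idea, with the function names and the 2k-to-8k lengths chosen purely for the example.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Standard RoPE: one rotation frequency per pair of head dimensions."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)          # shape (seq_len, head_dim // 2)

def interpolated_positions(seq_len, trained_len):
    """Positional Interpolation: squeeze positions back into the trained range
    instead of extrapolating past it (scale <= 1, so positions only shrink)."""
    scale = min(1.0, trained_len / seq_len)
    return np.arange(seq_len) * scale

# Hypothetical example: a model trained with 2,048 positions queried on 8,192 tokens.
angles = rope_angles(interpolated_positions(8192, 2048), head_dim=128)
print(angles.shape)  # (8192, 64); the largest position index stays below the trained 2,048
```

YaRN refines the same idea by scaling different frequency bands by different amounts rather than applying one uniform factor.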

Projects

  • LLMLingua - microsoft

    To speed up LLM inference and help the model focus on key information, LLMLingua compresses the prompt and KV cache, achieving up to 20x compression with minimal performance loss.

  • long-context - abacusai

    Code and tooling for the Abacus.AI LLM Context Expansion project, along with evaluation scripts and benchmark tasks that measure a model's information-retrieval capabilities under context expansion, plus key experimental results and instructions for reproducing and building on them.

  • LLaMA rope_scaling · (see the configuration sketch after this list)

  • long_llama - cstankonrad

    LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
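
The LLaMA rope_scaling entry above refers to the rope_scaling option that Hugging Face transformers exposes for LLaMA-family models, which applies the positional-interpolation-style rescaling discussed in the paper list. A minimal sketch, assuming a recent transformers release; the checkpoint name and scaling factor are placeholders, and newer versions spell the key "rope_type" instead of "type".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Linear positional interpolation: a factor of 4 stretches the trained 4k window
# toward ~16k positions. "dynamic" (NTK-aware) scaling is the other built-in option.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},
)
```

As the Positional Interpolation and Extending LLMs' Context Window with 100 Samples entries above suggest, a short fine-tuning run after enabling scaling is usually needed to recover most of the model's quality on the extended window.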

Other

Extra reference