Awesome Long-Context LLM

Survey

Papers

  • KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, arXiv, 2401.18079, arxiv, pdf, citations: -1

    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

  • Long-Context-Data-Engineering - FranxYao

    Implementation of the paper Data Engineering for Scaling Language Models to 128K Context

  • LongAlign: A Recipe for Long Context Alignment of Large Language Models, arXiv, 2401.18058, arxiv, pdf, citations: -1

    Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li · (LongAlign - THUDM)

  • With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation, arXiv, 2401.11504, arxiv, pdf, citations: -1

    Y. Wang, D. Ma, D. Cai · (zhuanlan.zhihu)

  • E^2-LLM: Efficient and Extreme Length Extension of Large Language Models, arXiv, 2401.06951, arxiv, pdf, citations: -1

    Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su

  • Extending LLMs' Context Window with 100 Samples, arXiv, 2401.07004, arxiv, pdf, citations: -1

    Yikai Zhang, Junlong Li, Pengfei Liu · (Entropy-ABF - GAIR-NLP)

  • Transformers are Multi-State RNNs, arXiv, 2401.06104, arxiv, pdf, citations: -1

    Matanel Oren, Michael Hassid, Yossi Adi, Roy Schwartz · (TOVA - schwartz-lab-NLP)

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models, arXiv, 2401.04658, arxiv, pdf, citations: -1

    Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong · (lightning-attention - OpenNLPLab)

  • Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache, arXiv, 2401.02669, arxiv, pdf, citations: -1

    Bin Lin, Tao Peng, Chen Zhang, Minmin Sun, Lanbo Li, Hanyu Zhao, Wencong Xiao, Qi Xu, Xiafei Qiu, Shen Li

  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, arXiv, 2401.01325, arxiv, pdf, citations: -1

    Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu · (qbitai)

  • Cached Transformers: Improving Transformers with Differentiable Memory Cache, arXiv, 2312.12742, arxiv, pdf, citations: -1

    Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

  • Extending Context Window of Large Language Models via Semantic Compression, arXiv, 2312.09571, arxiv, pdf, citations: -1

    Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

  • Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention, arXiv, 2312.08618, arxiv, pdf, citations: -1

    Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

  • Ultra-Long Sequence Distributed Transformer, arXiv, 2311.02382, arxiv, pdf, citations: -1

    Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gounley

  • HyperAttention: Long-context Attention in Near-Linear Time, arXiv, 2310.05869, arxiv, pdf, citations: 2

    Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

  • CLEX: Continuous Length Extrapolation for Large Language Models, arXiv, 2310.16450, arxiv, pdf, citations: -1

    Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

  • TRAMS: Training-free Memory Selection for Long-range Language Modeling, arXiv, 2310.15494, arxiv, pdf, citations: -1

    Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi

  • Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading, arXiv, 2310.05029, arxiv, pdf, citations: -1

    Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz · (mp.weixin.qq)

  • Scaling Laws of RoPE-based Extrapolation, arXiv, 2310.05209, arxiv, pdf, citations: -1

    Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin · (qbitai)

  • Ring Attention with Blockwise Transformers for Near-Infinite Context, arXiv, 2310.01889, arxiv, pdf, citations: -1

    Hao Liu, Matei Zaharia, Pieter Abbeel

  • EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation, arXiv, 2310.08185, arxiv, pdf, citations: -1

    Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei

  • CoCA: Fusing position embedding with Collinear Constrained Attention for fine-tuning free context window extending, arXiv, 2309.08646, arxiv, pdf, citations: -1

    Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li · (Collinear-Constrained-Attention - codefuse-ai) · (jiqizhixin)

  • Effective Long-Context Scaling of Foundation Models, arXiv, 2309.16039, arxiv, pdf, citations: 1

    Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz · (qbitai)

  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models, arXiv, 2308.16137, arxiv, pdf, citations: 3

    Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang

  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models, arXiv, 2309.14509, arxiv, pdf, citations: -1

    Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

  • YaRN: Efficient Context Window Extension of Large Language Models, arXiv, 2309.00071, arxiv, pdf, citations: 9

    Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole · (yarn - jquesnelle) · (jiqizhixin)

  • In-context Autoencoder for Context Compression in a Large Language Model, arXiv, 2307.06945, arxiv, pdf, citations: 4

    Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

  • Focused Transformer: Contrastive Training for Context Scaling, arXiv, 2307.03170, arxiv, pdf, citations: 12

    Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

  • Lost in the Middle: How Language Models Use Long Contexts, arXiv, 2307.03172, arxiv, pdf, citations: 64

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, arXiv, 2307.02486, arxiv, pdf, citations: 15

    Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

  • Extending Context Window of Large Language Models via Positional Interpolation, arXiv, 2306.15595, arxiv, pdf, citations: 36

    Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian · (qbitai) · (a minimal interpolation sketch follows this paper list)

  • The Impact of Positional Encoding on Length Generalization in Transformers, arXiv, 2305.19466, arxiv, pdf, citations: 5

    Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy

  • Long-range Language Modeling with Self-retrieval, arXiv, 2306.13421, arxiv, pdf, citations: 3

    Ohad Rubin, Jonathan Berant

  • Block-State Transformers, arXiv, 2306.09539, arxiv, pdf, citations: 2

    Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, arXiv, 2306.15626, arxiv, pdf, citations: 14

    Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

  • GLIMMER: generalized late-interaction memory reranker, arXiv, 2306.10231, arxiv, pdf, citations: 1

    Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie

  • Augmenting Language Models with Long-Term Memory, arXiv, 2306.07174, arxiv, pdf, citations: 7

    Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei · (aka)

  • Sequence Parallelism: Long Sequence Training from System Perspective, arXiv, 2105.13120, arxiv, pdf, citations: 2

    Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
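
Several entries above (notably Extending Context Window of Large Language Models via Positional Interpolation and YaRN) extend the context window by rescaling RoPE positions so that a model trained on short sequences can be run on much longer ones. The sketch below is not taken from any of the listed codebases; it is a minimal NumPy illustration of the linear-interpolation idea, with the function names and the 2k-to-8k lengths chosen purely for the example.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Standard RoPE: one rotation frequency per pair of head dimensions."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)          # shape (seq_len, head_dim // 2)

def interpolated_positions(seq_len, trained_len):
    """Positional Interpolation: squeeze positions back into the trained range
    instead of extrapolating past it (scale <= 1, so positions only shrink)."""
    scale = min(1.0, trained_len / seq_len)
    return np.arange(seq_len) * scale

# Hypothetical example: a model trained with 2,048 positions queried on 8,192 tokens.
angles = rope_angles(interpolated_positions(8192, 2048), head_dim=128)
print(angles.shape)  # (8192, 64); the largest position index stays below the trained 2,048
```

YaRN refines the same idea by scaling different frequency bands by different amounts rather than applying one uniform factor.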

Projects

  • LLMLingua - microsoft

    To speed up LLM inference and help the model focus on key information, LLMLingua compresses the prompt and KV cache, achieving up to 20x compression with minimal performance loss.

  • long-context - abacusai

    Code and tooling for the Abacus.AI LLM Context Expansion project, along with evaluation scripts and benchmark tasks that measure a model's information-retrieval capabilities under context expansion, plus key experimental results and instructions for reproducing and building on them.

  • LLaMA rope_scaling · (see the configuration sketch after this list)

  • long_llama - cstankonrad

    LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
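
The LLaMA rope_scaling entry above refers to the rope_scaling option that Hugging Face transformers exposes for LLaMA-family models, which applies the positional-interpolation-style rescaling discussed in the paper list. A minimal sketch, assuming a recent transformers release; the checkpoint name and scaling factor are placeholders, and newer versions spell the key "rope_type" instead of "type".

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Linear positional interpolation: a factor of 4 stretches the trained 4k window
# toward ~16k positions. "dynamic" (NTK-aware) scaling is the other built-in option.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},
)
```

As the Positional Interpolation and Extending LLMs' Context Window with 100 Samples entries above suggest, a short fine-tuning run after enabling scaling is usually needed to recover most of the model's quality on the extended window.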

Other

Extra reference