Long-Context Language Modeling with Parallel Context Encoding

Type: Preprint

Publication Date: 2024-02-26

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2402.16617

Abstract

Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with limited generalization of positional encoding, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE adopts a small encoder to process long inputs chunk by chunk and enables the frozen decoder to leverage additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, CEPE extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models with only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long context on downstream tasks.
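To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of parallel context encoding with cross-attention: a small encoder processes the long input chunk by chunk, and an added cross-attention block lets the (frozen) decoder read the concatenated chunk states. All names here (ParallelContextEncoder, CrossAttentionBlock, encode_long_context), layer sizes, and the chunking scheme are illustrative assumptions rather than the released CEPE implementation, and it assumes a recent PyTorch with batch_first support.

    # Illustrative sketch only; NOT the authors' CEPE code.
    import torch
    import torch.nn as nn

    class ParallelContextEncoder(nn.Module):
        # Small encoder applied to each context chunk independently (assumed sizes).
        def __init__(self, d_model=512, n_heads=8, n_layers=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, chunk_embeds):
            # chunk_embeds: (batch * n_chunks, chunk_len, d_model)
            return self.encoder(chunk_embeds)

    class CrossAttentionBlock(nn.Module):
        # Cross-attention that lets a (frozen) decoder layer read the encoded chunks.
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, decoder_hidden, context_states):
            # decoder_hidden: (batch, tgt_len, d_model)
            # context_states: (batch, ctx_len, d_model)
            attn_out, _ = self.cross_attn(decoder_hidden, context_states, context_states)
            return self.norm(decoder_hidden + attn_out)

    def encode_long_context(encoder, embeds, chunk_len=256):
        # Split the long context into fixed-size chunks, encode each chunk
        # independently (cost grows linearly with context length), then
        # concatenate the states for the decoder to cross-attend over.
        batch, total_len, d_model = embeds.shape
        usable = (total_len // chunk_len) * chunk_len  # drop a ragged tail for simplicity
        chunks = embeds[:, :usable].reshape(-1, chunk_len, d_model)
        states = encoder(chunks)
        return states.reshape(batch, usable, d_model)

    if __name__ == "__main__":
        enc = ParallelContextEncoder()
        xattn = CrossAttentionBlock()
        long_context = torch.randn(1, 1024, 512)   # embeddings of a long retrieved context
        decoder_hidden = torch.randn(1, 16, 512)   # hidden states from the frozen decoder
        ctx_states = encode_long_context(enc, long_context)
        fused = xattn(decoder_hidden, ctx_states)
        print(fused.shape)                          # torch.Size([1, 16, 512])

Chunking keeps every encoder forward pass at a fixed, short length, which is where the throughput and memory savings claimed in the abstract would come from; in this setup only the encoder and the cross-attention parameters are trained, while the decoder stays frozen.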

Locations

  • arXiv (Cornell University)

Similar Works

  • Parallel Context Windows for Large Language Models (2022). Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton‐Brown, Yoav Shoham
  • FocusLLM: Scaling LLM's Context by Parallel Decoding (2024). Zhenyu Li, Yike Zhang, Tao Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
  • Two are better than one: Context window extension with multi-grained self-injection (2024). Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan
  • Training-Free Long-Context Scaling of Large Language Models (2024). Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
  • InfiniPot: Infinite Context Processing on Memory-Constrained LLMs (2024). Min‐Soo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
  • Adapting Language Models to Compress Contexts (2023). Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen
  • Effective Long-Context Scaling of Foundation Models (2023). Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oğuz
  • How to Train Long-Context Language Models (Effectively) (2024). Tianyu Gao, Alexander Wettig, H. W. Yen, Danqi Chen
  • PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training (2023). Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
  • LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models (2024). Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Luu Anh Tuan, See-Kiong Ng, Zhiwei Jiang
  • Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models (2024). Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang
  • Why Does the Effective Context Length of LLMs Fall Short? (2024). Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
  • LLoCO: Learning Long Contexts Offline (2024). Sijun Tan, Xiuyu Li, Shishir G. Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
  • LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning (2024). Yansheng Mao, Jiaqi Li, Fanxu Meng, Jing Xiong, Zilong Zheng, Muhan Zhang
  • LongAlign: A Recipe for Long Context Alignment of Large Language Models (2024). Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
  • Recycled Attention: Efficient inference for long-context language models (2024). Fangyuan Xu, Tanya Goyal, Eunsol Choi
  • Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (2023). Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
  • E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning (2024). Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (2023). Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

Works That Cite This (0)

Works Cited by This (0)
