Long-Context Language Modeling with Parallel Context Encoding

Type: Preprint

Publication Date: 2024-02-26

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2402.16617

Abstract

Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with limited generalization of positional encoding, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE adopts a small encoder to process long inputs chunk by chunk and enables the frozen decoder to leverage additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, CEPE extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models with only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long context on downstream tasks.
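To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of parallel context encoding with cross-attention: a small encoder processes the long input chunk by chunk, and an added cross-attention block lets the (frozen) decoder read the concatenated chunk states. All names here (ParallelContextEncoder, CrossAttentionBlock, encode_long_context), layer sizes, and the chunking scheme are illustrative assumptions rather than the released CEPE implementation, and it assumes a recent PyTorch with batch_first support.

    # Illustrative sketch only; NOT the authors' CEPE code.
    import torch
    import torch.nn as nn

    class ParallelContextEncoder(nn.Module):
        # Small encoder applied to each context chunk independently (assumed sizes).
        def __init__(self, d_model=512, n_heads=8, n_layers=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, chunk_embeds):
            # chunk_embeds: (batch * n_chunks, chunk_len, d_model)
            return self.encoder(chunk_embeds)

    class CrossAttentionBlock(nn.Module):
        # Cross-attention that lets a (frozen) decoder layer read the encoded chunks.
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, decoder_hidden, context_states):
            # decoder_hidden: (batch, tgt_len, d_model)
            # context_states: (batch, ctx_len, d_model)
            attn_out, _ = self.cross_attn(decoder_hidden, context_states, context_states)
            return self.norm(decoder_hidden + attn_out)

    def encode_long_context(encoder, embeds, chunk_len=256):
        # Split the long context into fixed-size chunks, encode each chunk
        # independently (cost grows linearly with context length), then
        # concatenate the states for the decoder to cross-attend over.
        batch, total_len, d_model = embeds.shape
        usable = (total_len // chunk_len) * chunk_len  # drop a ragged tail for simplicity
        chunks = embeds[:, :usable].reshape(-1, chunk_len, d_model)
        states = encoder(chunks)
        return states.reshape(batch, usable, d_model)

    if __name__ == "__main__":
        enc = ParallelContextEncoder()
        xattn = CrossAttentionBlock()
        long_context = torch.randn(1, 1024, 512)   # embeddings of a long retrieved context
        decoder_hidden = torch.randn(1, 16, 512)   # hidden states from the frozen decoder
        ctx_states = encode_long_context(enc, long_context)
        fused = xattn(decoder_hidden, ctx_states)
        print(fused.shape)                          # torch.Size([1, 16, 512])

Chunking keeps every encoder forward pass at a fixed, short length, which is where the throughput and memory savings claimed in the abstract would come from; in this setup only the encoder and the cross-attention parameters are trained, while the decoder stays frozen.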

Locations

  • arXiv (Cornell University)

Similar Works

  • Parallel Context Windows for Large Language Models (2022). Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton‐Brown, Yoav Shoham
  • FocusLLM: Scaling LLM's Context by Parallel Decoding (2024). Zhenyu Li, Yike Zhang, Tao Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, Zixuan Wang, Jianyong Wang
  • Two are better than one: Context window extension with multi-grained self-injection (2024). Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan
  • Training-Free Long-Context Scaling of Large Language Models (2024). Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
  • InfiniPot: Infinite Context Processing on Memory-Constrained LLMs (2024). Min‐Soo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
  • Adapting Language Models to Compress Contexts (2023). Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen
  • Effective Long-Context Scaling of Foundation Models (2023). Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oğuz
  • How to Train Long-Context Language Models (Effectively) (2024). Tianyu Gao, Alexander Wettig, H. W. Yen, Danqi Chen
  • PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training (2023). Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
  • LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models (2024). Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Luu Anh Tuan, See-Kiong Ng, Zhiwei Jiang
  • Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models (2024). Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang
  • Why Does the Effective Context Length of LLMs Fall Short? (2024). Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong
  • LLoCO: Learning Long Contexts Offline (2024). Sijun Tan, Xiuyu Li, Shishir G. Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
  • LIFT: Improving Long Context Understanding Through Long Input Fine-Tuning (2024). Yansheng Mao, Jiaqi Li, Fanxu Meng, Jing Xiong, Zilong Zheng, Muhan Zhang
  • LongAlign: A Recipe for Long Context Alignment of Large Language Models (2024). Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
  • Recycled Attention: Efficient inference for long-context language models (2024). Fangyuan Xu, Tanya Goyal, Eunsol Choi
  • Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (2023). Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
  • E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning (2024). Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
  • LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (2023). Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

Works That Cite This (0)

Works Cited by This (0)
