A Little Goes a Long Way: Efficient Long Context Training and Inference
with Partial Contexts
Training and serving long-context large language models (LLMs) incurs substantial overhead. Addressing it typically requires two separate steps: a pretrained LLM first undergoes a context length extension stage, trained on long-context data, and is then architecturally modified to reduce the overhead of the KV cache during serving. …