A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Training and serving long-context large language models (LLMs) incurs substantial overhead. To address this, two steps are typically required: a pretrained LLM first undergoes a separate context-length-extension stage, trained on long-context data, and is then modified architecturally to reduce the overhead of the KV cache during serving. …