Characterizing Prompt Compression Methods for Long Context Inference

Long context inference poses challenges both at the system level, through increased compute and memory requirements, and at the accuracy level, in reasoning reliably over long contexts. Recently, several methods have been proposed to compress the prompt and thereby reduce the context length. However, there has been little …