Min P Sampling: Balancing Creativity and Coherence at High Temperature

Large Language Models (LLMs) generate long-form text by sampling the next token, one decoding step at a time, from a probability distribution over the token vocabulary. Popular truncation sampling methods such as top-$p$ sampling (also known as nucleus sampling) often struggle to balance coherence and creativity in generated text, …
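To make the decoding step described above concrete, here is a minimal sketch of temperature scaling followed by top-$p$ (nucleus) truncation, written in plain Python. It illustrates only the standard top-$p$ baseline mentioned in the abstract, not the method this paper proposes. The function names (`softmax`, `top_p_filter`, `sample_next_token`) and the toy logits are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most likely tokens whose total probability
    reaches p, then renormalize the kept probabilities to sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, temperature=1.0, p=0.9, rng=random):
    """Sample one token index from the temperature-scaled, top-p-truncated
    distribution. `rng` may be the random module or a random.Random instance."""
    nucleus = top_p_filter(softmax(logits, temperature), p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
toy_logits = [2.0, 1.0, 0.0, -1.0]
low_temp = top_p_filter(softmax(toy_logits, temperature=1.0), p=0.9)
high_temp = top_p_filter(softmax(toy_logits, temperature=3.0), p=0.9)
print(sorted(low_temp), sorted(high_temp))
```

In this toy example, raising the temperature flattens the distribution, so more low-probability tokens are needed to reach the same cumulative mass $p$ and the nucleus grows from three tokens to all four.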