Min P Sampling: Balancing Creativity and Coherence at High Temperature
Min P Sampling: Balancing Creativity and Coherence at High Temperature
Large Language Models (LLMs) generate longform text by successively sampling the next token based on the probability distribution of the token vocabulary at each decoding step. Current popular truncation sampling methods such as top-$p$ sampling, also known as nucleus sampling, often struggle to balance coherence and creativity in generating text, …