Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark

Abstract

Knowledge-grounded dialogue systems powered by large language models often generate responses that, while fluent, are not attributable to a relevant source of information. Progress towards models that do not exhibit this issue requires evaluation metrics that can quantify its prevalence. To this end, we introduce the Benchmark for Evaluation …