Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Ditch the Gold Standard: Re-evaluating Conversational Question Answering
Conversational question answering aims to provide natural-language answers to users in information-seeking conversations. Existing conversational QA benchmarks compare models with pre-collected human-human conversations, using ground-truth answers provided in conversational history. It remains unclear whether we can rely on this static evaluation for model development and whether current systems can well …