STORYSUMM: Evaluating Faithfulness in Story Summarization

Human evaluation has been the gold standard for checking faithfulness in abstractive summarization. However, with a challenging source domain like narrative, multiple annotators can agree that a summary is faithful while missing details that are obvious errors only once they are pointed out. We therefore introduce a new dataset, STORYSUMM, comprising LLM summaries …