STORYSUMM: Evaluating Faithfulness in Story Summarization

Type: Preprint

Publication Date: 2024-07-08

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2407.06501

Abstract

Human evaluation has been the gold standard for checking faithfulness in abstractive summarization. However, with a challenging source domain like narrative, multiple annotators can agree a summary is faithful, while missing details that are obvious errors only once pointed out. We therefore introduce a new dataset, STORYSUMM, comprising LLM summaries of short stories with localized faithfulness labels and error explanations. This benchmark is for evaluation methods, testing whether a given method can detect challenging inconsistencies. Using this dataset, we first show that any one human annotation protocol is likely to miss inconsistencies, and we advocate for pursuing a range of methods when establishing ground truth for a summarization dataset. We finally test recent automatic metrics and find that none of them achieve more than 70% balanced accuracy on this task, demonstrating that it is a challenging benchmark for future work in faithfulness evaluation.
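The 70% figure above is a balanced accuracy. As a minimal sketch, assuming the standard definition of balanced accuracy for binary labels (the mean of the recall on each class), the metric can be computed as shown below. The function name `balanced_accuracy` and the toy labels are hypothetical, chosen only for illustration, and are not taken from the STORYSUMM data or code.

```python
def balanced_accuracy(labels, predictions):
    """Mean of recall on the faithful (True) and unfaithful (False) classes."""
    pairs = list(zip(labels, predictions))
    pos = sum(1 for y, _ in pairs if y)   # gold faithful summaries
    neg = len(pairs) - pos                # gold unfaithful summaries
    if pos == 0 or neg == 0:
        raise ValueError("need at least one example of each class")
    tp = sum(1 for y, p in pairs if y and p)          # faithful, predicted faithful
    tn = sum(1 for y, p in pairs if not y and not p)  # unfaithful, predicted unfaithful
    return 0.5 * (tp / pos + tn / neg)


# Toy labels invented for illustration (not STORYSUMM data):
# four faithful summaries and two unfaithful ones.
gold = [True, True, True, True, False, False]
always_faithful = [True] * len(gold)

# A metric that labels every summary faithful is right on 4 of 6 items,
# but its balanced accuracy is only 0.5.
print(balanced_accuracy(gold, always_faithful))
```

Because each class contributes equally, a method that never flags an inconsistency scores 0.5 however many faithful summaries the data contains, which is why balanced accuracy is a common choice when the two labels are unevenly represented.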

Locations

  • arXiv (Cornell University)

Similar Works

  • SNaC: Coherence Error Detection for Narrative Summarization (2022). Tanya Goyal, Junyi Jessy Li, Greg Durrett
  • Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics (2021). Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov
  • BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics (2022). Liang Ma, Shuyang Cao, Robert L. Logan, Lu Di, Shihao Ran, Ke Zhang, Joel Tetreault, Aoife Cahill, Alejandro Jaimes
  • FABLES: Evaluating faithfulness and content selection in book-length summarization (2024). Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer
  • Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency (2022). Yanzhu Guo, Chloé Clavel, Moussa Kamal Eddine, Michalis Vazirgiannis
  • LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization (2023). Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo
  • NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization (2022). Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi
  • Asking and Answering Questions to Evaluate the Factual Consistency of Summaries (2020). Alex Wang, Kyunghyun Cho, Mike Lewis
  • On Faithfulness and Factuality in Abstractive Summarization (2020). Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald
  • Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization (2022). Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
  • Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation (2022). Yixin Liu, Alexander R. Fabbri, Pengfei Liu, Yilun Zhao, Linyong Nan, Ruilin Han, Simeng Han, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong
  • Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization (2021). Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
  • Annotating and Modeling Fine-grained Factuality in Summarization (2021). Tanya Goyal, Greg Durrett
