On the Reliability of Test Collections for Evaluating Systems of Different Types

Emine Yılmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos

Type: Preprint

Publication Date: 2020-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2004.13486


Locations

arXiv (Cornell University)
DataCite API

Similar Works

Title	Year	Authors
On the Reliability of Test Collections for Evaluating Systems of Different Types	2020	Emine Yılmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos
Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?	2022	Ellen M. Voorhees, Ian Soboroff, Jimmy Lin
Understanding and Predicting Characteristics of Test Collections in Information Retrieval	2020	M.M. Rahman, Mücahid Kutlu, Matthew Lease
Synthetic Test Collections for Retrieval Evaluation	2024	Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos
TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime	2021	Nick Craswell, Bhaskar Mitra, Emine Yılmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff
Consistency and Variation in Kernel Neural Ranking Model	2018	Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu
Understanding and Predicting the Characteristics of Test Collections.	2020	M.M. Rahman, Mücahid Kutlu, Matthew Lease
Designing Test Collections for Comparing Many Systems	2014	Tetsuya Sakai
A comparison of pooled and sampled relevance judgments	2007	Ian Soboroff
Evaluation-as-a-Service: Overview and Outlook	2015	Allan Hanbury, Henning Müller, Krisztian Balog, Torben Brodt, Gordon V. Cormack, Ivan Eggel, Tim Gollub, Frank Hopfgartner, Jayashree Kalpathy‐Cramer, Noriko Kando
How Discriminative Are Your Qrels? How To Study the Statistical Significance of Document Adjudication Methods	2023	David Otero, Javier Parapar, Nicola Ferro
Distributed Evaluations: Ending Neural Point Metrics	2018	Daniel Cohen, Scott M. Jordan, W. Bruce Croft
A Comparison of Methods for Evaluating Generative IR	2024	Negar Arabzadeh, Charles L. A. Clarke
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)	2022	Tetsuya Sakai, Sijie Tao, Zhaohao Zeng
LLMs Can Patch Up Missing Relevance Judgments in Evaluation	2024	Shivani Upadhyay, Ehsan Kamalloo, Jimmy Lin
New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches	2023	Mehmet Deniz Türkmen, Matthew Lease, Mücahid Kutlu
On the Statistical Significance with Relevance Assessments of Large Language Models	2024	David Otero, Javier Parapar, Álvaro Barreiro
Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations	2023	Yan Xiao, Yixing Fan, Ruqing Zhang, Jiafeng Guo
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark	2024	Junming Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu

Works That Cite This (0)


Works Cited by This (0)
