On the Reliability of Test Collections for Evaluating Systems of Different Types

Type: Preprint

Publication Date: 2020-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2004.13486

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • On the Reliability of Test Collections for Evaluating Systems of Different Types (2020). Emine Yılmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos
  • Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models? (2022). Ellen M. Voorhees, Ian Soboroff, Jimmy Lin
  • Understanding and Predicting Characteristics of Test Collections in Information Retrieval (2020). M.M. Rahman, Mücahid Kutlu, Matthew Lease
  • Synthetic Test Collections for Retrieval Evaluation (2024). Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos
  • TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime (2021). Nick Craswell, Bhaskar Mitra, Emine Yılmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff
  • Consistency and Variation in Kernel Neural Ranking Model (2018). Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu
  • Understanding and Predicting the Characteristics of Test Collections (2020). M.M. Rahman, Mücahid Kutlu, Matthew Lease
  • Designing Test Collections for Comparing Many Systems (2014). Tetsuya Sakai
  • A comparison of pooled and sampled relevance judgments (2007). Ian Soboroff
  • Evaluation-as-a-Service: Overview and Outlook (2015). Allan Hanbury, Henning Müller, Krisztian Balog, Torben Brodt, Gordon V. Cormack, Ivan Eggel, Tim Gollub, Frank Hopfgartner, Jayashree Kalpathy‐Cramer, Noriko Kando
  • How Discriminative Are Your Qrels? How To Study the Statistical Significance of Document Adjudication Methods (2023). David Otero, Javier Parapar, Nicola Ferro
  • Distributed Evaluations: Ending Neural Point Metrics (2018). Daniel Cohen, Scott M. Jordan, W. Bruce Croft
  • A Comparison of Methods for Evaluating Generative IR (2024). Negar Arabzadeh, Charles L. A. Clarke
  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION) (2022). Tetsuya Sakai, Sijie Tao, Zhaohao Zeng
  • LLMs Can Patch Up Missing Relevance Judgments in Evaluation (2024). Shivani Upadhyay, Ehsan Kamalloo, Jimmy Lin
  • New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches (2023). Mehmet Deniz Türkmen, Matthew Lease, Mücahid Kutlu
  • On the Statistical Significance with Relevance Assessments of Large Language Models (2024). David Otero, Javier Parapar, Álvaro Barreiro
  • Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations (2023). Yan Xiao, Yixing Fan, Ruqing Zhang, Jiafeng Guo
  • AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark (2024). Junming Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu

Works That Cite This (0)

Works Cited by This (0)