Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Existing analysis work in machine reading comprehension (MRC) is largely concerned with evaluating the capabilities of systems. However, the capability of datasets to benchmark language understanding precisely has not been assessed. We propose a semi-automated, ablation-based methodology for this challenge: by checking whether questions can be solved even after removing features …
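The core ablation check can be illustrated with a small sketch. It compares a model's accuracy on the original questions with its accuracy after a feature has been removed from the input. If accuracy barely drops, the questions do not require that feature. Several parts are illustrative assumptions and are not taken from the method described above: the `(context, question, answer)` example format, the `toy_model` reader, and the word-shuffling ablation `shuffle_context_words`. The excerpt does not list which features are actually removed.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical example format: (context, question, gold_answer).
Example = Tuple[str, str, str]
Model = Callable[[str, str], str]


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples whose predicted answer exactly matches the gold answer."""
    if not examples:
        return 0.0
    correct = sum(model(c, q) == a for c, q, a in examples)
    return correct / len(examples)


def shuffle_context_words(example: Example, seed: int = 0) -> Example:
    """Illustrative ablation (an assumption): destroy word order in the
    context while keeping the same words."""
    context, question, answer = example
    words = context.split()
    random.Random(seed).shuffle(words)
    return " ".join(words), question, answer


def ablation_gap(model: Model, examples: List[Example],
                 ablate: Callable[[Example], Example]) -> Tuple[float, float]:
    """Return (original accuracy, accuracy after ablation).
    A small gap suggests the removed feature is not needed to solve the questions."""
    original = accuracy(model, examples)
    ablated = accuracy(model, [ablate(ex) for ex in examples])
    return original, ablated


def toy_model(context: str, question: str) -> str:
    """Hypothetical reader that ignores word order: it returns the
    alphabetically first context word that does not appear in the question."""
    q_words = {w.strip("?.,").lower() for w in question.split()}
    candidates = sorted(w for w in context.split() if w.lower() not in q_words)
    return candidates[0] if candidates else ""


examples: List[Example] = [
    ("the capital of France is Paris", "What is the capital of France?", "Paris"),
    ("water boils at 100 degrees", "At what temperature does water boil?", "100"),
]

print(ablation_gap(toy_model, examples, shuffle_context_words))
```

Because `toy_model` ignores word order, shuffling the context leaves its accuracy unchanged. That is the kind of signal such an ablation is meant to surface: these toy questions can be answered without the ablated feature.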