CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Type: Article

Publication Date: 2017-07-01

Citations: 1700

DOI: https://doi.org/10.1109/cvpr.2017.215


Abstract

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
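The per-question annotations mentioned in the abstract are functional programs that spell out the chain of reasoning each question requires. As a minimal sketch (not taken from the paper's code), the snippet below shows how one might inspect those annotations; it assumes the publicly released CLEVR v1.0 questions JSON with a top-level "questions" list and a "program" field per entry, and the file path and field names are assumptions to verify against the actual download.

```python
# Minimal sketch: inspect CLEVR's per-question reasoning annotations.
# Assumes the CLEVR v1.0 questions JSON layout (a top-level "questions" list,
# each entry carrying a "program" of functional reasoning steps). The path and
# field names below are illustrative; check them against the downloaded files.
import json
from collections import Counter

with open("CLEVR_v1.0/questions/CLEVR_val_questions.json") as f:
    questions = json.load(f)["questions"]

# Look at one question, its answer, and the reasoning steps it is annotated with.
q = questions[0]
print(q["question"], "->", q["answer"])
steps = [node["function"] for node in q.get("program", [])]  # e.g. filter_color, relate, count
print("reasoning steps:", steps)

# Tally which reasoning primitives appear across the split, which is the kind of
# breakdown the diagnostic analysis relies on.
usage = Counter(
    node["function"]
    for item in questions
    for node in item.get("program", [])
)
print(usage.most_common(10))
```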

Locations

  • arXiv (Cornell University)

Similar Works

  • CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (2016). Justin C. Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
  • QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning (2022). Zechen Li, Anders Søgaard
  • REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering (2020). Siwen Luo, Soyeon Caren Han, Kaiyuan Sun, Josiah Poon
  • PropTest: Automatic Property Testing for Improved Visual Programming (2024). Jaywon Koo, Ziyan Yang, Paola Cascante-Bonilla, Baishakhi Ray, Vicente Ordóñez
  • VISREAS: Complex Visual Reasoning with Unanswerable Questions (2024). Syeda Nahida Akter, Sangwu Lee, Yingshan Chang, Yonatan Bisk, Eric Nyberg
  • How Transferable are Reasoning Patterns in VQA? (2021). Corentin Kervadec, Théo Jaunet, Grigory Antipov, Moez Baccouche, Romain Vuillemot, Christian Wolf
  • Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" (2020). Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
  • Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task (2025). Mohit Vaishnav, Tanel Tammet
  • Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning (2018). David Mascharka, Philip Tran, Ryan Soklaski, Arjun Majumdar
  • FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts (2024). Shubhankar Singh, Purvi Chaurasia, Yerram Varun, Pranshu Pandya, Vatsal Gupta, Vivek Gupta, Dan Roth
  • Take A Step Back: Rethinking the Two Stages in Visual Reasoning (2024). Mingyu Zhang, Jiting Cai, Mingyu Liu, Yue Xu, Cewu Lu, Yong-Lu Li
  • Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers (2024). Aleksandar Stanić, Sergi Caelles, Michael Tschannen
  • Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data (2023). Nathan Vaska, Victoria Helus
  • Can you even tell left from right? Presenting a new challenge for VQA (2024). Sai Raam Venkataraman, Rishi Sridhar Rao, S. Balasubramanian, R. Raghunatha Sarma, Chandra Sekhar Vorugunti
  • Can you even tell left from right? Presenting a new challenge for VQA (2022). Sai Raam Venkatraman, Rishi Rao, S. Balasubramanian, Chandra Sekhar Vorugunti, R. Raghunatha Sarma
  • From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis (2024). Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan
  • Inferring and Executing Programs for Visual Reasoning (2017). Justin C. Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

Works That Cite This (1047)

  • IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning (2021). Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Yu Zhou, Xiaodan Liang, Song-Chun Zhu
  • Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis (2021). Yu Hsuan Li, Tzu-Yin Chao, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu
  • DPPMask: Masked Image Modeling with Determinantal Point Processes (2024). Junde Xu, Zikai Lin, Donghao Zhou, Yaodong Yang, Xiangyun Liao, Qiong Wang, Bian Wu, Guangyong Chen, Pheng-Ann Heng
  • Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks (2024). Tianwei Chen, Noa García, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara
  • Weakly Supervised Temporal Adjacent Network for Language Grounding (2021). Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li
  • Deconfounded Image Captioning: A Causal Retrospect (2020). Xu Yang, Hanwang Zhang, Jianfei Cai
  • Generative Compositional Augmentations for Scene Graph Prediction (2021). B. A. Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky
  • PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation (2023). Qihao Liu, Adam Kortylewski, Alan Yuille
  • Evaluating the Progress of Deep Learning for Visual Relational Concepts (2020). Sebastian Stabinger, David Peer, Justus Piater, Antonio Rodríguez-Sánchez
  • Interpretable Neural Computation for Real-World Compositional Visual Question Answering (2020). Ruixue Tang, Chao Ma