VISREAS: Complex Visual Reasoning with Unanswerable Questions

Type: Preprint

Publication Date: 2024-02-22

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2403.10534

Abstract

Verifying a question's validity before answering is crucial in real-world applications, where users may provide imperfect instructions. In this scenario, an ideal model should identify the discrepancies in the query and convey them to the user rather than generate a best-guess answer. To address this requirement, we introduce a new compositional visual question-answering dataset, VISREAS, that consists of answerable and unanswerable visual queries formulated by traversing and perturbing commonalities and differences among objects, attributes, and relations. VISREAS contains 2.07M semantically diverse queries generated automatically using Visual Genome scene graphs. The unique feature of this task, validating question answerability with respect to an image before answering, together with the poor performance of state-of-the-art models, inspired the design of a new modular baseline, LOGIC2VISION, which reasons by producing and executing pseudocode without any external modules to generate the answer. LOGIC2VISION outperforms generative models on VISREAS (+4.82% over LLaVA-1.5; +12.23% over InstructBLIP) and achieves a significant performance gain over the classification models.
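The idea of validating a query against a scene graph before answering it can be illustrated with a toy example. The sketch below is not the VISREAS generation pipeline or the LOGIC2VISION model. The names `SceneObject`, `SceneGraph`, `find_objects`, `answer_count_query`, and `perturb_attribute`, the tuple query format, and the tiny hand-written graph are all assumptions made for illustration. It shows only the general pattern the abstract describes: check that every referent in the question exists in the image, report the discrepancy if one does not, and turn an answerable query into an unanswerable one by perturbing an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    attributes: set = field(default_factory=set)


@dataclass
class SceneGraph:
    objects: dict    # hypothetical layout: object id -> SceneObject
    relations: set   # (subject id, predicate, object id) triples


def find_objects(graph, name, attribute=None):
    """Return ids of objects with the given name and, optionally, attribute."""
    return [
        oid for oid, obj in graph.objects.items()
        if obj.name == name and (attribute is None or attribute in obj.attributes)
    ]


def answer_count_query(graph, query):
    """Validate the query's referents, then count matching subjects.

    query is (name, attribute, predicate, target_name).
    Returns (True, count) if answerable, else (False, reason).
    """
    name, attribute, predicate, target_name = query
    subjects = find_objects(graph, name, attribute)
    if not subjects:
        return False, f"no {attribute} {name} in the image"
    targets = find_objects(graph, target_name)
    if not targets:
        return False, f"no {target_name} in the image"
    count = sum(
        1 for s in subjects
        if any((s, predicate, t) in graph.relations for t in targets)
    )
    return True, count


def perturb_attribute(graph, query, candidates):
    """Swap in the first candidate attribute that no matching object has."""
    name, _, predicate, target_name = query
    for attr in candidates:
        if not find_objects(graph, name, attr):
            return (name, attr, predicate, target_name)
    return query  # no perturbation found; query is left unchanged


# Toy scene: a red cup and a blue cup, both on a wooden table.
graph = SceneGraph(
    objects={
        1: SceneObject("cup", {"red"}),
        2: SceneObject("cup", {"blue"}),
        3: SceneObject("table", {"wooden"}),
    },
    relations={(1, "on", 3), (2, "on", 3)},
)

query = ("cup", "red", "on", "table")  # "How many red cups are on the table?"
perturbed = perturb_attribute(graph, query, ["blue", "green"])

answerable_result = answer_count_query(graph, query)
unanswerable_result = answer_count_query(graph, perturbed)
print(answerable_result)
print(unanswerable_result)
```

In this toy setting the original query is answered with a count, while the perturbed query is rejected with a reason naming the missing referent, which mirrors the behavior the abstract asks of an ideal model.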

Locations

  • arXiv (Cornell University)

Similar Works

  • GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering (2019). Drew A. Hudson, Christopher D. Manning
  • LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering (2020). Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai, Gökhan Tür
  • Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs (2021). Daniel Reich, Felix Putze, Tanja Schultz
  • CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations (2022). Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
  • 3D-Aware Visual Question Answering about Parts, Poses and Occlusions (2023). Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille
  • CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense (2022). Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
  • Interpretable Neural Computation for Real-World Compositional Visual Question Answering (2020). Ruixue Tang, Chao Ma
  • CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (2017). Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
  • Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network (2019). Qingxing Cao, Bailin Li, Xiaodan Liang, Liang Lin
  • IQ-VQA: Intelligent Visual Question Answering (2020). Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha
  • Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions (2020). Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian
  • Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" (2020). Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
  • CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning (2016). Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
  • From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis (2024). Chuanqi Cheng, Jian Guan, Wei Wu, Rui Yan

Works That Cite This (0)


Works Cited by This (0)
