Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo

Type: Article

Publication Date: 2018-01-01

Citations: 54

DOI: https://doi.org/10.18653/v1/d18-1164

Abstract

In Visual Question Answering, most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer. Although such end-to-end models might report promising performance, they rarely provide any insight, apart from the answer, into the VQA process. In this work, we propose to break up the end-to-end VQA into two steps: explaining and reasoning, in an attempt towards a more explainable VQA by shedding light on the intermediate results between these two steps. To that end, we first extract attributes and generate descriptions as explanations for an image. Next, a reasoning module utilizes these explanations in place of the image to infer an answer. The advantages of such a breakdown include: (1) the attributes and captions reflect what the system extracts from the image, and thus can provide some insight into the predicted answer; (2) these intermediate results can help identify whether the image understanding part or the answer inference part is at fault when the predicted answer is wrong. We conduct extensive experiments on a popular VQA dataset, and our system achieves performance comparable to the baselines, with the added benefits of explainability and the inherent ability to further improve with higher-quality explanations.
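The two-step breakdown described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Explanation`, `answer_question`, `toy_explain`, and `toy_reason` are hypothetical, and the hard-coded attributes and keyword lookup merely stand in for the learned attribute extractor, captioner, and reasoning module. The only point it shows is the interface: the reasoner receives the textual explanation in place of the image, and that explanation is returned alongside the answer so it can be inspected.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Explanation:
    """Human-readable intermediate result passed between the two steps."""
    attributes: List[str]
    caption: str


def answer_question(
    image: object,
    question: str,
    explain: Callable[[object], Explanation],
    reason: Callable[[Explanation, str], str],
) -> Tuple[str, Explanation]:
    """Two-step VQA: the reasoner sees only the explanation, never the image."""
    explanation = explain(image)            # step 1: explaining
    answer = reason(explanation, question)  # step 2: reasoning
    return answer, explanation


# Hypothetical stand-ins for the learned modules (assumptions, not the paper's models).
def toy_explain(image: object) -> Explanation:
    # A real system would predict attributes and generate a caption from the image.
    return Explanation(
        attributes=["dog", "frisbee", "grass"],
        caption="a dog catching a frisbee on the grass",
    )


def toy_reason(explanation: Explanation, question: str) -> str:
    # Naive keyword lookup over the explanation; a real reasoner would be learned.
    words = set(explanation.caption.split()) | set(explanation.attributes)
    if question.lower().startswith("is there"):
        target = question.rstrip("?").split()[-1].lower()
        return "yes" if target in words else "no"
    return "unknown"


if __name__ == "__main__":
    answer, explanation = answer_question(None, "Is there a dog?", toy_explain, toy_reason)
    print(answer, "|", explanation.caption)
```

Because the explanation is returned with the answer, a wrong prediction can be traced to one of the two steps: if the caption and attributes miss the relevant object, the explaining step failed; if they contain it and the answer is still wrong, the reasoning step failed.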

Locations

arXiv (Cornell University)
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Similar Works

Title	Year	Authors
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions	2018	Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo
Coarse-to-Fine Reasoning for Visual Question Answering	2022	Binh X. Nguyen, Tuong Do, Huy Dat Tran, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Coarse-to-Fine Reasoning for Visual Question Answering	2021	Binh X. Nguyen, Tuong Do, Huy Tran, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Answer Them All! Toward Universal Visual Question Answering Models	2019	Robik Shrestha, Kushal Kafle, Christopher Kanan
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering	2017	Vahid Kazemi, Ali Elqursh
Convincing Rationales for Visual Question Answering Reasoning	2024	Kun Li, George Vosselman, Michael Ying Yang
Image Captioning and Visual Question Answering Based on Attributes and Their Related External Knowledge.	2016	Qi Wu, Chunhua Shen, Anton van den Hengel, Peng Wang, Anthony Dick
Faithful Multimodal Explanation for Visual Question Answering	2019	Jialin Wu, Raymond J. Mooney
Visual Question Answering: A Survey of Methods and Datasets	2016	Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions	2020	Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge	2016	Qi Wu, Chunhua Shen, Anton van den Hengel, Peng Wang, Anthony Dick
Faithful Multimodal Explanation for Visual Question Answering	2018	Jialin Wu, Raymond J. Mooney
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions.	2020	Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge	2017	Qi Wu, Chunhua Shen, Peng Wang, Anthony Dick, Anton van den Hengel
Interpretable Visual Question Answering by Visual Grounding From Attention Supervision Mining	2019	Yundong Zhang, Juan Carlos Niebles, Álvaro Soto
Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining	2018	Yundong Zhang, Juan Carlos Niebles, Álvaro Soto
VQA: Visual Question Answering	2015	Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Self-Critical Reasoning for Robust Visual Question Answering	2019	Jialin Wu, Raymond J. Mooney
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering	2020	Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai, Gökhan Tür

Works That Cite This (27)

Title	Year	Authors
Visual question answering based on local-scene-aware referring expression generation	2021	Jung-Jun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong‐Whan Lee
A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge	2022	Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
Visual Question Answering based on Local-Scene-Aware Referring Expression Generation	2021	Jungjun Kim, Dong-Gyu Lee, Jialin Wu, Hong-Gyu Jung, Seong‐Whan Lee
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering	2020	Weixin Liang, Feiyang Niu, Aishwarya Reganti, Govind Thattai, Gökhan Tür
QA2Explanation: Generating and Evaluating Explanations for Question Answering Systems over Knowledge Graph	2020	Saeedeh Shekarpour, Abhishek Nadgeri, Kuldeep Singh
SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions	2020	Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Túlio Ribeiro, Besmira Nushi, Ece Kamar
Relation-Aware Graph Attention Network for Visual Question Answering	2019	Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu
SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions	2020	Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning	2021	Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

Works Cited by This (25)

Title	Year	Authors
Microsoft COCO Captions: Data Collection and Evaluation Server	2015	Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick
Show and tell: A neural image caption generator	2015	Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
From captions to visual concepts and back	2015	Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K. Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt
VQA: Visual Question Answering	2015	Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Neural Machine Translation by Jointly Learning to Align and Translate	2014	Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
Deep Residual Learning for Image Recognition	2016	Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Image Captioning with Semantic Attention	2016	Quanzeng You, Hailin Jin, Zhaowen Wang, Fang Chen, Jiebo Luo
Hierarchical Question-Image Co-Attention for Visual Question Answering	2016	Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh
Dual Attention Networks for Multimodal Reasoning and Matching	2017	Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering	2017	Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh