Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

Type: Article

Publication Date: 2017-10-01

Citations: 14682

DOI: https://doi.org/10.1109/iccv.2017.74

Abstract

We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, the logits for 'dog' or even a caption) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insight into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are more faithful to the underlying model, and (d) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models can localize inputs. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a 'stronger' deep network from a 'weaker' one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV [2] and a video at youtu.be/COjUB9Izk6E.
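
As an illustration of the computation the abstract describes, here is a minimal PyTorch sketch of Grad-CAM. It assumes a pretrained torchvision VGG16 and a preprocessed input batch `x`; the hook bookkeeping, the layer index (`features[28]`, VGG16's last convolutional layer), and the `grad_cam` helper are illustrative choices, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative setup (an assumption, not the paper's released code):
# a pretrained torchvision VGG16 in inference mode.
model = models.vgg16(weights="IMAGENET1K_V1").eval()

# Capture the forward activations A^k of the last conv layer and the
# gradients dy^c/dA^k flowing back into it.
store = {}
layer = model.features[28]  # VGG16's final convolutional layer
layer.register_forward_hook(lambda m, i, o: store.update(acts=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grads=go[0]))

def grad_cam(x, class_idx=None):
    """Hypothetical helper: x is a preprocessed [1, 3, H, W] image batch."""
    scores = model(x)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()  # explain the top prediction
    model.zero_grad()
    scores[0, class_idx].backward()              # gradients of the class score

    A, dA = store["acts"], store["grads"]        # both [1, K, h, w]
    alpha = dA.mean(dim=(2, 3), keepdim=True)    # global-average-pooled weights
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))  # ReLU(sum_k alpha_k A^k)
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                        align_corners=False)     # upsample to input resolution
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]
    return cam, class_idx
```

Overlaying the returned `cam` on the input image reproduces the coarse, class-discriminative heatmaps described above; multiplying it elementwise with a fine-grained visualization such as Guided Backpropagation yields the high-resolution Guided Grad-CAM variant.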

Locations

  • arXiv (Cornell University)

Similar Works

  • Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (2016). Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
  • Grad-CAM: Why did you say that? (2016). Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
  • U-CAM: Visual Explanation using Uncertainty based Class Activation Maps (2019). Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization (2019). Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
  • Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded (2019). Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
  • Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification (2024). Matteo Bianchi, Antonio de Santis, Andrea Tocchetti, Marco Brambilla
  • Uncertainty based Class Activation Maps for Visual Question Answering (2020). Badri N. Patro, Mayank Lunayach, Vinay P. Namboodiri
  • Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks (2020). Rachel Lea Draelos, Lawrence Carin
  • Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks (2021). Rachel Lea Draelos, Lawrence Carin
  • CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations (2021). Leila Arras, Ahmed Osman, Wojciech Samek
  • Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models (2020). Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Alexander Binder
  • Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks (2017). José Oramas, Kaili Wang, Tinne Tuytelaars
  • Explain and improve: LRP-inference fine-tuning for image captioning models (2021). Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Alexander Binder
  • Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring (2021). Sam Sattarzadeh, Mahesh Sudhakar, Konstantinos N. Plataniotis, Jongseong Jang, Yeonjeong Jeong, Hyunwoo Kim