Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

Type: Article

Publication Date: 2017-10-01

Citations: 14682

DOI: https://doi.org/10.1109/iccv.2017.74

Abstract

We propose a technique for producing 'visual explanations' for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, the logits for 'dog' or even a caption) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. Unlike previous approaches, Grad-CAM is applicable to a wide variety of CNN model families: (1) CNNs with fully-connected layers (e.g. VGG), (2) CNNs used for structured outputs (e.g. captioning), (3) CNNs used in tasks with multi-modal inputs (e.g. visual question answering) or reinforcement learning, all without architectural changes or re-training. We combine Grad-CAM with existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and apply it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures. In the context of image classification models, our visualizations (a) lend insight into failure modes of these models (showing that seemingly unreasonable predictions have reasonable explanations), (b) outperform previous methods on the ILSVRC-15 weakly-supervised localization task, (c) are more faithful to the underlying model, and (d) help achieve model generalization by identifying dataset bias. For image captioning and VQA, our visualizations show that even non-attention-based models can localize inputs. Finally, we design and conduct human studies to measure whether Grad-CAM explanations help users establish appropriate trust in predictions from deep networks, and show that Grad-CAM helps untrained users successfully discern a 'stronger' deep network from a 'weaker' one even when both make identical predictions. Our code is available at https://github.com/ramprs/grad-cam/, along with a demo on CloudCV [2] and a video at youtu.be/COjUB9Izk6E.
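
As an illustration of the computation the abstract describes, here is a minimal PyTorch sketch of Grad-CAM. It assumes a pretrained torchvision VGG16 and a preprocessed input batch `x`; the hook bookkeeping, the layer index (`features[28]`, VGG16's last convolutional layer), and the `grad_cam` helper are illustrative choices, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Illustrative setup (an assumption, not the paper's released code):
# a pretrained torchvision VGG16 in inference mode.
model = models.vgg16(weights="IMAGENET1K_V1").eval()

# Capture the forward activations A^k of the last conv layer and the
# gradients dy^c/dA^k flowing back into it.
store = {}
layer = model.features[28]  # VGG16's final convolutional layer
layer.register_forward_hook(lambda m, i, o: store.update(acts=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grads=go[0]))

def grad_cam(x, class_idx=None):
    """Hypothetical helper: x is a preprocessed [1, 3, H, W] image batch."""
    scores = model(x)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()  # explain the top prediction
    model.zero_grad()
    scores[0, class_idx].backward()              # gradients of the class score

    A, dA = store["acts"], store["grads"]        # both [1, K, h, w]
    alpha = dA.mean(dim=(2, 3), keepdim=True)    # global-average-pooled weights
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))  # ReLU(sum_k alpha_k A^k)
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear",
                        align_corners=False)     # upsample to input resolution
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]
    return cam, class_idx
```

Overlaying the returned `cam` on the input image reproduces the coarse, class-discriminative heatmaps described above; multiplying it elementwise with a fine-grained visualization such as Guided Backpropagation yields the high-resolution Guided Grad-CAM variant.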

Locations

  • arXiv (Cornell University)

Similar Works

  • Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (2016). Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
  • Grad-CAM: Why did you say that? (2016). Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
  • U-CAM: Visual Explanation using Uncertainty based Class Activation Maps (2019). Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization (2019). Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
  • Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded (2019). Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
  • Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification (2024). Matteo Bianchi, Antonio de Santis, Andrea Tocchetti, Marco Brambilla
  • Uncertainty based Class Activation Maps for Visual Question Answering (2020). Badri N. Patro, Mayank Lunayach, Vinay P. Namboodiri
  • Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks (2020). Rachel Lea Draelos, Lawrence Carin
  • Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks (2021). Rachel Lea Draelos, Lawrence Carin
  • CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations (2021). Leila Arras, Ahmed Osman, Wojciech Samek
  • Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models (2020). Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Alexander Binder
  • Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks (2017). José Oramas, Kaili Wang, Tinne Tuytelaars
  • Explain and improve: LRP-inference fine-tuning for image captioning models (2021). Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Alexander Binder
  • Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring (2021). Sam Sattarzadeh, Mahesh Sudhakar, Konstantinos N. Plataniotis, Jongseong Jang, Yeonjeong Jeong, Hyunwoo Kim