https://github.com/ramprs/grad-cam

[ICCV 2017] Torch code for Grad-CAM
https://github.com/ramprs/grad-cam

convolutional-neural-networks deep-learning grad-cam heatmap iccv17 interpretability visual-explanation

Last synced: about 2 months ago
JSON representation

[ICCV 2017] Torch code for Grad-CAM

Host: GitHub
URL: https://github.com/ramprs/grad-cam
Owner: ramprs
Created: 2016-05-27T19:37:36.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2022-09-17T02:42:00.000Z (almost 3 years ago)
Last Synced: 2025-04-08T11:15:22.710Z (3 months ago)
Topics: convolutional-neural-networks, deep-learning, grad-cam, heatmap, iccv17, interpretability, visual-explanation
Language: Lua
Homepage: https://arxiv.org/abs/1610.02391
Size: 1.54 MB
Stars: 1,552
Watchers: 17
Forks: 228
Open Issues: 11
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

        
## Grad-CAM: Gradient-weighted Class Activation Mapping

Code for the paper

**[Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization][7]**  

Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra  

[https://arxiv.org/abs/1610.02391][7]

Demo: [gradcam.cloudcv.org][8]

![Overview](http://i.imgur.com/JaGbdZ5.png)

### Usage

Download Caffe model(s) and prototxt for VGG-16/VGG-19/AlexNet using `sh models/download_models.sh`.

#### Classification

```

th classification.lua -input_image_path images/cat_dog.jpg -label 243 -gpuid 0

th classification.lua -input_image_path images/cat_dog.jpg -label 283 -gpuid 0

```

##### Options

- `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers_deploy.prototxt`

- `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_16_layers.caffemodel`

- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`

- `input_sz`: Input image size. Default is 224 (Change to 227 if using AlexNet)

- `layer_name`: Layer to use for Grad-CAM. Default is `relu5_3` (use `relu5_4` for VGG-19 and `relu5` for AlexNet)

- `label`: Class label to generate grad-CAM for (-1 = use predicted class, 283 = Tiger cat, 243 = Boxer). Default is -1. These correspond to ILSVRC synset IDs

- `out_path`: Path to save images in. Default is `output/`

- `gpuid`: 0-indexed id of GPU to use. Default is -1 = CPU

- `backend`: Backend to use with [loadcaffe][3]. Default is `nn`

- `save_as_heatmap`: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1

##### Examples

'border collie' (233)

![](http://i.imgur.com/nTaVH57.png)

![](http://i.imgur.com/fjZ8E3Z.png)

![](http://i.imgur.com/RzPhOYo.png)

'tabby cat' (282)

![](http://i.imgur.com/nTaVH57.png)

![](http://i.imgur.com/94ZMSNI.png)

![](http://i.imgur.com/wmtUTgj.png)

'boxer' (243)

![](http://i.imgur.com/OAoSQYT.png)

![](http://i.imgur.com/iZuijZy.png)

![](http://i.imgur.com/o7RStQm.png)

'tiger cat' (283)

![](http://i.imgur.com/OAoSQYT.png)

![](http://i.imgur.com/NzXRy5E.png)

![](http://i.imgur.com/fP0Dd87.png)

#### Visual Question Answering

Clone the [VQA][5] ([http://arxiv.org/abs/1505.00468][4]) sub-repository (`git submodule init && git submodule update`), and download and unzip the provided extracted features and pretrained model.

```

th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'dog' -gpuid 0

th visual_question_answering.lua -input_image_path images/cat_dog.jpg -question 'What animal?' -answer 'cat' -gpuid 0

```

##### Options

- `proto_file`: Path to the `deploy.prototxt` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers_deploy.prototxt`

- `model_file`: Path to the `.caffemodel` file for the CNN Caffe model. Default is `models/VGG_ILSVRC_19_layers.caffemodel`

- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`

- `input_sz`: Input image size. Default is 224 (Change to 227 if using AlexNet)

- `layer_name`: Layer to use for Grad-CAM. Default is `relu5_4` (use `relu5_3` for VGG-16 and `relu5` for AlexNet)

- `question`: Input question. Default is `What animal?`

- `answer`: Optional answer (For eg. "cat") to generate Grad-CAM for ('' = use predicted answer). Default is ''

- `out_path`: Path to save images in. Default is `output/`

- `model_path`: Path to VQA model checkpoint. Default is `VQA_LSTM_CNN/lstm.t7`

- `gpuid`: 0-indexed id of GPU to use. Default is -1 = CPU

- `backend`: Backend to use with [loadcaffe][3]. Default is `cudnn`

- `save_as_heatmap`: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1

##### Examples

What animal? Dog

![](http://i.imgur.com/OAoSQYT.png)

![](http://i.imgur.com/QBTstax.png)

![](http://i.imgur.com/NRyhfdL.png)

What animal? Cat

![](http://i.imgur.com/OAoSQYT.png)

![](http://i.imgur.com/hqBWRAm.png)

![](http://i.imgur.com/lwj5oAX.png)

What color is the fire hydrant? Green

![](http://i.imgur.com/Zak2NZW.png)

![](http://i.imgur.com/GbhRhkg.png)

![](http://i.imgur.com/lrAgGj0.png)

What color is the fire hydrant? Yellow

![](http://i.imgur.com/Zak2NZW.png)

![](http://i.imgur.com/cHzOo7k.png)

![](http://i.imgur.com/CJ6QiGD.png)

What color is the fire hydrant? Green and Yellow

![](http://i.imgur.com/Zak2NZW.png)

![](http://i.imgur.com/i7AwHXx.png)

![](http://i.imgur.com/7N6BVgq.png)

What color is the fire hydrant? Red and Yellow

![](http://i.imgur.com/Zak2NZW.png)

![](http://i.imgur.com/uISYeOR.png)

![](http://i.imgur.com/ebZVlTI.png)

#### Image Captioning

Clone the [neuraltalk2][6] sub-repository. Running `sh models/download_models.sh` will download the pretrained model and place it in the neuraltalk2 folder.

Change lines 2-4 of `neuraltalk2/misc/LanguageModel.lua` to the following:

```

local utils = require 'neuraltalk2.misc.utils'

local net_utils = require 'neuraltalk2.misc.net_utils'

local LSTM = require 'neuraltalk2.misc.LSTM'

```

```

th captioning.lua -input_image_path images/cat_dog.jpg -caption 'a dog and cat posing for a picture' -gpuid 0

th captioning.lua -input_image_path images/cat_dog.jpg -caption '' -gpuid 0

```

##### Options

- `input_image_path`: Path to the input image. Default is `images/cat_dog.jpg`

- `input_sz`: Input image size. Default is 224 (Change to 227 if using AlexNet)

- `layer`: Layer to use for Grad-CAM. Default is 30 (relu5_3 for vgg16)

- `caption`: Optional input caption. No input will use the generated caption as default

- `out_path`: Path to save images in. Default is `output/`

- `model_path`: Path to captioning model checkpoint. Default is `neuraltalk2/model_id1-501-1448236541.t7`

- `gpuid`: 0-indexed id of GPU to use. Default is -1 = CPU

- `backend`: Backend to use with [loadcaffe][3]. Default is `cudnn`

- `save_as_heatmap`: Whether to save heatmap or raw Grad-CAM. 1 = save heatmap, 0 = save raw Grad-CAM. Default is 1

##### Examples

a dog and cat posing for a picture

![](http://i.imgur.com/OAoSQYT.png)

![](http://i.imgur.com/TiKdMMw.png)

![](http://i.imgur.com/GSQeR2M.png)

a bathroom with a toilet and a sink

![](http://i.imgur.com/gE6VXql.png)

![](http://i.imgur.com/K3E9TWS.png)

![](http://i.imgur.com/em2oHRy.png)

### License

BSD

#### 3rd-party

- [VQA_LSTM_CNN][5], BSD

- [neuraltalk2][6], BSD

[3]: https://github.com/szagoruyko/loadcaffe

[4]: http://arxiv.org/abs/1505.00468

[5]: https://github.com/VT-vision-lab/VQA_LSTM_CNN

[6]: https://github.com/karpathy/neuraltalk2 

[7]: https://arxiv.org/abs/1610.02391

[8]: http://gradcam.cloudcv.org/

[9]: https://ramprs.github.io/

[10]: http://abhishekdas.com/

[11]: http://vrama91.github.io/

[12]: http://mcogswell.io/

[13]: https://computing.ece.vt.edu/~parikh/

[14]: https://computing.ece.vt.edu/~dbatra/

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ramprs/grad-cam

Awesome Lists containing this project

README