https://github.com/szagoruyko/attention-transfer

Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Attention Transfer
==============

PyTorch code for "Paying More Attention to Attention: Improving the Performance of
Convolutional Neural Networks via Attention Transfer"

Conference paper at ICLR2017: https://openreview.net/forum?id=Sks9_ajex

What's in this repo so far:
* Activation-based AT code for CIFAR-10 experiments
* Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
* Jupyter notebook to visualize attention maps of ResNet-34 [visualize-attention.ipynb](visualize-attention.ipynb)

Coming:
* grad-based AT
* Scenes and CUB activation-based AT code

The code uses PyTorch. Note that the original experiments were done
using [torch-autograd](https://github.com/twitter/torch-autograd). We have so far validated that the CIFAR-10 experiments are
*exactly* reproducible in PyTorch, and are in the process of doing so for ImageNet (results are
very slightly worse in PyTorch, due to hyperparameters).

bibtex:

```
@inproceedings{Zagoruyko2017AT,
author = {Sergey Zagoruyko and Nikos Komodakis},
title = {Paying More Attention to Attention: Improving the Performance of
Convolutional Neural Networks via Attention Transfer},
booktitle = {ICLR},
url = {https://arxiv.org/abs/1612.03928},
year = {2017}}
```

## Requirements

First install [PyTorch](https://pytorch.org), then install [torchnet](https://github.com/pytorch/tnt):

```
pip install git+https://github.com/pytorch/tnt.git@master
```

then install other Python packages:

```
pip install -r requirements.txt
```

## Experiments

### CIFAR-10

This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

```
python cifar.py --save logs/resnet_40_1_teacher --depth 40 --width 1
python cifar.py --save logs/resnet_16_2_teacher --depth 16 --width 2
python cifar.py --save logs/resnet_40_2_teacher --depth 40 --width 2
```
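The `--depth` and `--width` flags follow the Wide ResNet naming, so `resnet_16_2_teacher` is a depth-16 network with widening factor 2. As a rough sketch, assuming the standard Wide ResNet layout (three groups of basic blocks, depth = 6n + 4, base channel widths 16/32/64 scaled by the widening factor), the configuration implied by each command can be worked out as follows. The helper name `wrn_config` is hypothetical and is not part of this repo.

```python
def wrn_config(depth, width):
    """Return (blocks_per_group, channel_widths) for a WRN-depth-width net.

    Assumes the standard Wide ResNet layout, where depth = 6 * n + 4
    and n is the number of basic blocks in each of the three groups.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth should be of the form 6n + 4")
    blocks_per_group = (depth - 4) // 6
    # Base widths of the three groups, scaled by the widening factor.
    channel_widths = [int(base * width) for base in (16, 32, 64)]
    return blocks_per_group, channel_widths


# The three teacher configurations trained above.
for depth, width in [(40, 1), (16, 2), (40, 2)]:
    print(depth, width, wrn_config(depth, width))
```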

To train with activation-based AT, run:

```
python cifar.py --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
```
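For intuition about what `--beta` weights, here is a minimal sketch of an activation-based attention transfer loss. It is written with plain Python lists so it runs without dependencies; the repo itself works on PyTorch tensors, and the function names below are hypothetical. It assumes the attention map is the channel-wise mean of squared activations, flattened and L2-normalized, and that the total loss adds `beta / 2` times the summed attention differences to the usual cross-entropy, as in the paper's loss.

```python
import math


def attention_map(activation):
    """Collapse a C x N activation (C channels, N spatial positions)
    into an L2-normalized spatial attention vector of length N."""
    num_channels = len(activation)
    num_positions = len(activation[0])
    # Mean of squared activations over the channel dimension.
    energy = [
        sum(channel[i] ** 2 for channel in activation) / num_channels
        for i in range(num_positions)
    ]
    # Guard against an all-zero activation to avoid division by zero.
    norm = math.sqrt(sum(v * v for v in energy)) or 1.0
    return [v / norm for v in energy]


def at_loss(student_act, teacher_act):
    """Mean squared difference between normalized attention maps."""
    q_student = attention_map(student_act)
    q_teacher = attention_map(teacher_act)
    return sum((s - t) ** 2 for s, t in zip(q_student, q_teacher)) / len(q_student)


# Toy example: 2-channel student and 3-channel teacher, both with a
# 2x2 spatial grid flattened to 4 positions. Channel counts may differ,
# spatial sizes must match.
student = [[0.1, 0.9, 0.2, 0.0], [0.3, 0.8, 0.1, 0.1]]
teacher = [[0.0, 1.0, 0.0, 0.1], [0.2, 0.7, 0.3, 0.0], [0.1, 0.9, 0.1, 0.0]]

beta = 1e3               # value passed via --beta above
cross_entropy = 0.5      # placeholder for the student's classification loss
total = cross_entropy + beta / 2 * at_loss(student, teacher)
print(total)
```

Because the maps are normalized, rescaling a teacher activation by a constant leaves the loss unchanged; only the spatial distribution of attention matters.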

To train with KD:

```
python cifar.py --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
```
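As an illustration of how `--alpha` balances the two terms, the following is a minimal sketch of a knowledge distillation loss in plain Python. It assumes the common formulation: a KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature and weighted by `alpha`, plus ordinary cross-entropy weighted by `1 - alpha`. The temperature value and the function names are hypothetical, and the exact scaling used in `cifar.py` may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha, temperature):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Ordinary cross-entropy against the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Toy 3-class example with alpha as passed via --alpha above.
loss = kd_loss(
    student_logits=[1.0, 2.0, 0.5],
    teacher_logits=[0.5, 3.0, 0.2],
    label=1,
    alpha=0.9,
    temperature=4.0,  # assumed value for illustration
)
print(loss)
```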

We plan to add AT+KD with decaying `beta` soon, to get the best knowledge transfer results.

### ImageNet

#### Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

| Model | val error (top-1, top-5), % |
|:------|:---------:|
|ResNet-18 | 30.4, 10.8 |
|ResNet-18-ResNet-34-AT | 29.3, 10.0 |
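
Assuming the two numbers in each row are top-1 and top-5 validation error in percent (the usual ImageNet convention), the metric can be sketched as below. The function name `topk_error` and the toy scores are hypothetical and only illustrate the definition.

```python
def topk_error(scores, labels, k):
    """Percentage of samples whose true label is not among the k
    highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return 100.0 * misses / len(labels)


# Toy example: 3 samples, 3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_error(scores, labels, k=1))
print(topk_error(scores, labels, k=2))
```

Under that reading, the AT-trained ResNet-18 improves on the baseline by 1.1 points top-1 and 0.8 points top-5.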

Download link:

Model definition:

Convergence plot:

#### Train from scratch

Download pretrained weights for ResNet-34
(see also [functional-zoo](https://github.com/szagoruyko/functional-zoo) for more
information):

```
wget https://s3.amazonaws.com/modelzoo-networks/resnet-34-export.pth
```

Prepare the data following [fb.resnet.torch](https://github.com/facebook/fb.resnet.torch)
and run training (e.g. using 2 GPUs):

```
python imagenet.py --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
    --teacher_params resnet-34-export.pth --gpu_id 0,1 --ngpu 2 \
    --beta 1e+3
```