https://github.com/szagoruyko/attention-transfer
Improving Convolutional Networks via Attention Transfer (ICLR 2017)
- Host: GitHub
- URL: https://github.com/szagoruyko/attention-transfer
- Owner: szagoruyko
- Created: 2017-01-17T15:38:09.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-07-11T11:49:59.000Z (over 6 years ago)
- Last Synced: 2025-01-12T19:04:34.480Z (11 days ago)
- Topics: attention, deep-learning, knowledge-distillation, pytorch
- Language: Jupyter Notebook
- Homepage: https://arxiv.org/abs/1612.03928
- Size: 443 KB
- Stars: 1,450
- Watchers: 51
- Forks: 276
- Open Issues: 13
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-pytorch - Attention Transfer
README
Attention Transfer
==============

PyTorch code for "Paying More Attention to Attention: Improving the Performance of
Convolutional Neural Networks via Attention Transfer"

Conference paper at ICLR2017: https://openreview.net/forum?id=Sks9_ajex

What's in this repo so far:
* Activation-based AT code for CIFAR-10 experiments
* Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
* Jupyter notebook to visualize attention maps of ResNet-34 [visualize-attention.ipynb](visualize-attention.ipynb)

Coming:
* grad-based AT
* Scenes and CUB activation-based AT code

The code uses PyTorch. Note that the original experiments were done
using [torch-autograd](https://github.com/twitter/torch-autograd). We have so far validated that CIFAR-10 experiments are
*exactly* reproducible in PyTorch, and are in the process of doing so for ImageNet (results are
very slightly worse in PyTorch, due to hyperparameters).

bibtex:
```
@inproceedings{Zagoruyko2017AT,
author = {Sergey Zagoruyko and Nikos Komodakis},
title = {Paying More Attention to Attention: Improving the Performance of
Convolutional Neural Networks via Attention Transfer},
booktitle = {ICLR},
url = {https://arxiv.org/abs/1612.03928},
year = {2017}}
```

## Requirements
First install [PyTorch](https://pytorch.org), then install [torchnet](https://github.com/pytorch/tnt):
```
pip install git+https://github.com/pytorch/tnt.git@master
```

then install other Python packages:
```
pip install -r requirements.txt
```

## Experiments
### CIFAR-10
This section describes how to reproduce the results in Table 1 of the paper.
First, train teachers:
```
python cifar.py --save logs/resnet_40_1_teacher --depth 40 --width 1
python cifar.py --save logs/resnet_16_2_teacher --depth 16 --width 2
python cifar.py --save logs/resnet_40_2_teacher --depth 40 --width 2
```

To train with activation-based AT do:
```
python cifar.py --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
```

To train with KD:
```
python cifar.py --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
```

We plan to add AT+KD with decaying `beta` to get the best knowledge transfer results soon.
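
The two transfer objectives selected by `--beta` (activation-based AT) and `--alpha` (KD) in the commands above can be illustrated with a small sketch. It is written in dependency-free Python on nested lists so the arithmetic is explicit; the repository itself works on PyTorch tensors, and this is not its implementation. The function names, the temperature `T=4.0`, and the weighting of the terms are assumptions made for illustration, following the usual formulation of attention transfer (channel-wise mean of squared activations, L2-normalised) and of distillation with softened logits.

```python
import math


def attention_map(activations):
    """Spatial attention for one sample.

    activations: list of C channels, each a flat list of H*W values.
    Returns the channel-wise mean of squared activations, L2-normalised.
    """
    n_channels = len(activations)
    n_positions = len(activations[0])
    att = [sum(ch[i] ** 2 for ch in activations) / n_channels
           for i in range(n_positions)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0  # avoid dividing by zero
    return [a / norm for a in att]


def at_loss(student_act, teacher_act):
    """Mean squared difference between student and teacher attention maps."""
    s = attention_map(student_act)
    t = attention_map(teacher_act)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.9, T=4.0):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    kl = sum(q * math.log(q / p) for q, p in zip(p_t, p_s) if q > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * T * T * kl + (1 - alpha) * ce


# Hypothetical activations: a 2-channel student and a 3-channel teacher,
# both over the same 4 spatial positions.
student = [[0.1, 0.9, 0.2, 0.0], [0.0, 1.1, 0.3, 0.1]]
teacher = [[0.0, 2.0, 0.1, 0.0], [0.1, 1.5, 0.0, 0.0], [0.0, 1.8, 0.2, 0.1]]
print("AT loss:", at_loss(student, teacher))
print("KD loss:", kd_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], label=0))
```

Because the attention map averages over channels, the student and teacher may have different widths as long as the spatial size matches. In training, AT terms of this kind would presumably be summed over the chosen layers, scaled by `beta`, and added to the student's cross-entropy loss; the exact scaling used by `cifar.py` is not reproduced here.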
### ImageNet
#### Pretrained model
We provide a ResNet-18 model pretrained with activation-based AT:

| Model | val error (top-1, top-5), % |
|:------|:---------:|
| ResNet-18 | 30.4, 10.8 |
| ResNet-18-ResNet-34-AT | 29.3, 10.0 |

Download link:
Model definition:
Convergence plot:
#### Train from scratch
Download pretrained weights for ResNet-34
(see also [functional-zoo](https://github.com/szagoruyko/functional-zoo) for more
information):

```
wget https://s3.amazonaws.com/modelzoo-networks/resnet-34-export.pth
```

Prepare the data following [fb.resnet.torch](https://github.com/facebook/fb.resnet.torch)
and run training (e.g. using 2 GPUs):

```
python imagenet.py --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
--teacher_params resnet-34-export.pth --gpu_id 0,1 --ngpu 2 \
--beta 1e+3
```