https://github.com/zju-vipa/Fast-Datafree

[AAAI-2022] Up to 100x Faster Data-free Knowledge Distillation
Host: GitHub
URL: https://github.com/zju-vipa/Fast-Datafree
Owner: zju-vipa
Created: 2022-01-04T10:01:19.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-10-24T17:09:55.000Z (about 2 years ago)
Last Synced: 2024-08-02T15:30:43.781Z (3 months ago)
Language: Python
Homepage:
Size: 723 KB
Stars: 64
Watchers: 3
Forks: 10
Open Issues: 7
Metadata Files:
- Readme: README.md

# Fast-Datafree

This repo implements the efficient data-free distillation algorithm from the AAAI-22 paper ["Up to 100x Faster Data-free Knowledge Distillation"](https://arxiv.org/pdf/2112.06253.pdf).

## TODO

* ImageNet 

* Distributed training

## Results











### CIFAR-10

| Method               | ResNet-34 <br> ResNet-18 | VGG-11 <br> ResNet-18 | WRN40-2 <br> WRN16-1 | WRN40-2 <br> WRN40-1 | WRN40-2 <br> WRN16-2 | Speed-up |
|----------------------|--------------------------|-----------------------|----------------------|----------------------|----------------------|----------|
| Teacher              | 95.70                    | 92.25                 | 94.87                | 94.87                | 94.87                | -        |
| Student              | 95.20                    | 95.20                 | 91.12                | 93.94                | 93.95                | -        |
| KD                   | 95.20                    | 95.20                 | 95.20                | 95.20                | 95.20                | -        |
| DeepInv-2k [[7]](#7) | 93.26 (42.1h)            | 90.36 (20.2h)         | 83.04 (16.9h)        | 86.85 (21.9h)        | 89.72 (18.2h)        | 1.0x     |
| CMI-500 [[4]](#4)    | 94.84 (19.0h)            | 91.13 (11.6h)         | 90.01 (13.3h)        | 92.78 (14.1h)        | 92.52 (13.6h)        | 1.6x     |
| DAFL [[1]](#1)       | 92.22 (2.73h)            | 81.10 (0.73h)         | 65.71 (1.73h)        | 81.33 (1.53h)        | 81.55 (1.60h)        | 15.7x    |
| ZSKT [[6]](#6)       | 93.32 (1.67h)            | 89.46 (0.33h)         | 83.74 (0.87h)        | 86.07 (0.87h)        | 89.66 (0.87h)        | 30.4x    |
| DFQ [[3]](#3)        | 94.61 (8.79h)            | 90.84 (1.50h)         | 86.14 (0.75h)        | 91.69 (0.75h)        | 92.01 (0.75h)        | 18.9x    |
| Fast-2               | 92.62 (0.06h)            | 84.67 (0.03h)         | 88.36 (0.03h)        | 89.56 (0.03h)        | 89.68 (0.03h)        | 655.0x   |
| Fast-5               | 93.63 (0.14h)            | 89.94 (0.08h)         | 88.90 (0.08h)        | 92.04 (0.09h)        | 91.96 (0.08h)        | 247.1x   |
| Fast-10              | 94.05 (0.28h)            | 90.53 (0.15h)         | 89.29 (0.15h)        | 92.51 (0.17h)        | 92.45 (0.17h)        | 126.7x   |
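
The README does not say how the speed-up column is derived. The reported CIFAR-10 figures are consistent with averaging, over the five teacher/student pairs, the ratio of the DeepInv-2k time to the method's time. The sketch below checks that reading against the table; the averaging rule is inferred from the numbers, and the function and variable names are illustrative only.

```python
from statistics import mean

# Times in hours, copied from the parenthesised values in the CIFAR-10 table.
DEEPINV_2K = [42.1, 20.2, 16.9, 21.9, 18.2]  # the 1.0x baseline row

TIMES = {
    "CMI-500": [19.0, 11.6, 13.3, 14.1, 13.6],
    "Fast-2": [0.06, 0.03, 0.03, 0.03, 0.03],
    "Fast-5": [0.14, 0.08, 0.08, 0.09, 0.08],
    "Fast-10": [0.28, 0.15, 0.15, 0.17, 0.17],
}


def speed_up(baseline, method):
    """Mean over teacher/student pairs of baseline_time / method_time.

    Assumption: this averaging rule is inferred from the table, not
    taken from the repository or the paper.
    """
    return mean(b / m for b, m in zip(baseline, method))


for name, hours in TIMES.items():
    print(f"{name}: {speed_up(DEEPINV_2K, hours):.1f}x")
```

Rounded to one decimal place, this gives 1.6x, 655.0x, 247.1x and 126.7x, matching the corresponding rows of the table.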

### CIFAR-100

| Method     | ResNet-34 <br> ResNet-18 | VGG-11 <br> ResNet-18 | WRN40-2 <br> WRN16-1 | WRN40-2 <br> WRN40-1 | WRN40-2 <br> WRN16-2 | Speed-up |
|------------|--------------------------|-----------------------|----------------------|----------------------|----------------------|----------|
| Teacher    | 78.05                    | 71.32                 | 75.83                | 75.83                | 75.83                | -        |
| Student    | 77.10                    | 77.10                 | 65.31                | 72.19                | 73.56                | -        |
| KD         | 77.87                    | 75.07                 | 64.06                | 68.58                | 70.79                | -        |
| DeepInv-2k | 61.32 (42.1h)            | 54.13 (20.1h)         | 53.77 (17.0h)        | 61.33 (21.9h)        | 61.34 (18.2h)        | 1.0x     |
| CMI-500    | 77.04 (19.2h)            | 70.56 (11.6h)         | 57.91 (13.3h)        | 68.88 (14.2h)        | 68.75 (13.9h)        | 1.6x     |
| DAFL       | 74.47 (2.73h)            | 54.16 (0.73h)         | 20.88 (1.67h)        | 42.83 (1.80h)        | 43.70 (1.73h)        | 15.2x    |
| ZSKT       | 67.74 (1.67h)            | 54.31 (0.33h)         | 36.66 (0.87h)        | 53.60 (0.87h)        | 54.59 (0.87h)        | 30.4x    |
| DFQ        | 77.01 (8.79h)            | 66.21 (1.54h)         | 51.27 (0.75h)        | 54.43 (0.75h)        | 64.79 (0.75h)        | 18.8x    |
| Fast-2     | 69.76 (0.06h)            | 62.83 (0.03h)         | 41.77 (0.03h)        | 53.15 (0.04h)        | 57.08 (0.04h)        | 588.2x   |
| Fast-5     | 72.82 (0.14h)            | 65.28 (0.08h)         | 52.90 (0.07h)        | 61.80 (0.09h)        | 63.83 (0.08h)        | 253.1x   |
| Fast-10    | 74.34 (0.27h)            | 67.44 (0.16h)         | 54.02 (0.16h)        | 63.91 (0.17h)        | 65.12 (0.17h)        | 124.7x   |

### ImageNet

| Method                   | Data Amount | Syn. Time | Speed-up | ResNet-50 <br> ResNet-50 | ResNet-50 <br> ResNet-18 | ResNet-50 <br> MobileNetv2 |
|--------------------------|-------------|-----------|----------|--------------------------|--------------------------|----------------------------|
| Scratch                  | 1.3M        | -         | -        | 75.45                    | 68.45                    | 70.01                      |
| Places365+KD             | 1.8M        | -         | -        | 55.74                    | 45.53                    | 39.89                      |
| Generative DFD [[5]](#5) | -           | ~300h     | 1x       | 69.75                    | 54.66                    | 43.15                      |
| DeepInv-2k               | 140k        | 166h      | 1.8x     | 68.00                    | -                        | -                          |
| Fast-50                  | 140k        | 6.28h     | 47.8x    | 68.61                    | 53.45                    | 43.02                      |

## Quick Start

### 1. Prepare the files

To reproduce our results, download the pre-trained teacher models from [Dropbox-Models (266 MB)](https://www.dropbox.com/sh/w8xehuk7debnka3/AABhoazFReE_5mMeyvb4iUWoa?dl=0) and extract them to `checkpoints/pretrained`.

Alternatively, you can train a model from scratch as follows.

```bash
python train_scratch.py --model wrn40_2 --dataset cifar10 --batch-size 256 --lr 0.1 --epoch 200 --gpu 0
```

   

### 2. Reproduce our results

* To obtain results similar to ours on the CIFAR datasets, run the script `scripts/fast_cifar.sh` (a sample is shown below).

  Synthesized images and logs will be saved in `checkpoints/datafree-fast_meta`.

    ```bash
    # g-steps is the number of iterations in synthesizing
    python datafree_kd.py \
    --gpu 0 \
    --seed 0 \
    --dataset cifar100 \
    --warmup 20 --epochs 220 \
    --batch_size 256 \
    --lr 0.2 \
    --kd_steps 400 --ep_steps 400 \
    --adv 1.1 --bn 10.0 --oh 0.4 \
    --act 0 --balance 0 \
    --T 20 \
    --method fast_meta --is_maml 1 \
    --g_steps 2 \
    --lr_z 0.015 --lr_g 5e-3 \
    --bn_mmt 0.9 \
    --reset_l0 1 --reset_bn 0 \
    --save_dir run/wrn-2 --log_tag wrn-2
    ```

* To compute the generator's gradient with REPTILE, set `--is_maml 0`.

  To disable meta-learning, use the `--method fast` option, which updates the generator in the normal (sequential) way.

* To get the result of Vanilla KD using original training data, run the following command.

    ```bash
    # beta>0 to use hard targets
    python vanilla_kd.py --teacher wrn40_2 --student wrn16_1 --dataset cifar10 --transfer_set cifar10 --beta 0.1 --batch-size 128 --lr 0.1 --epoch 200 --gpu 0
    ```

* ImageNet: please refer to the [logs](https://github.com/zju-vipa/Fast-Datafree/tree/main/logs) and the [training script](https://github.com/zju-vipa/Fast-Datafree/blob/main/datafree_kd_imagenet.py) for more details.
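
The `--is_maml` switch above chooses how a few inner synthesis steps are turned into a single generator update. The toy sketch below illustrates the two generic rules, Reptile and first-order MAML, on a scalar parameter with a quadratic loss. It is not the repository's code: reading `--is_maml 1` as a first-order MAML-style gradient is an assumption, and every function name here is hypothetical.

```python
def loss_grad(theta, target):
    """Gradient of 0.5 * (theta - target) ** 2 with respect to theta."""
    return theta - target


def inner_adapt(theta, target, steps, inner_lr):
    """Run a few plain gradient steps starting from theta (the 'fast' copy)."""
    fast = theta
    for _ in range(steps):
        fast -= inner_lr * loss_grad(fast, target)
    return fast


def reptile_grad(theta, fast):
    """Reptile pseudo-gradient: stepping against it moves theta toward fast."""
    return theta - fast


def fomaml_grad(fast, target):
    """First-order MAML: loss gradient at the adapted weights, applied to theta."""
    return loss_grad(fast, target)


theta, target = 0.0, 1.0
outer_lr = 0.1

# Two inner steps, loosely analogous to a small number of synthesis steps.
fast = inner_adapt(theta, target, steps=2, inner_lr=0.5)

theta_reptile = theta - outer_lr * reptile_grad(theta, fast)
theta_fomaml = theta - outer_lr * fomaml_grad(fast, target)

print("adapted weights:", fast)
print("after Reptile step:", theta_reptile)
print("after FOMAML step:", theta_fomaml)
```

Both rules move the initial parameter toward the task optimum. Reptile needs only the start and end points of the inner loop, while first-order MAML needs one extra gradient evaluated at the adapted weights.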

### 3. Train your own models

You can register your own models and datasets in `registry.py` by modifying `NORMALIZE_DICT`, `MODEL_DICT` and `get_dataset`. You can then run the commands above to train them.
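
As a rough illustration of the registry pattern described here, the sketch below keeps a normalization dictionary and a model dictionary keyed by name. The layout, the helper functions and the `TinyNet` placeholder are all hypothetical; the real `registry.py` may be organised differently, so check the file before copying anything. `get_dataset` is not covered.

```python
# Hypothetical stand-ins for the dictionaries in registry.py.
NORMALIZE_DICT = {}  # dataset name -> per-channel mean/std
MODEL_DICT = {}      # model name -> constructor taking num_classes


def register_dataset(name, mean, std):
    """Record the normalization statistics for a new dataset."""
    NORMALIZE_DICT[name] = dict(mean=tuple(mean), std=tuple(std))


def register_model(name, constructor):
    """Make a model constructor available under a string name."""
    MODEL_DICT[name] = constructor


def get_model(name, num_classes):
    """Look up a registered constructor and build the model."""
    if name not in MODEL_DICT:
        raise KeyError(f"unknown model {name!r}; registered: {sorted(MODEL_DICT)}")
    return MODEL_DICT[name](num_classes=num_classes)


class TinyNet:
    """Placeholder for a real network class (e.g. a torch.nn.Module)."""

    def __init__(self, num_classes):
        self.num_classes = num_classes


# Placeholder statistics; replace with values computed on your own data.
register_dataset("my_dataset", mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
register_model("tiny_net", TinyNet)

student = get_model("tiny_net", num_classes=10)
print(type(student).__name__, student.num_classes)
```

Keeping names as plain dictionary keys means the command-line scripts can select a model or dataset from a string argument without further code changes.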

## Bibtex

If you find this work useful for your research, please cite our paper:

```
@inproceedings{fang2022up,
  title={Up to 100x faster data-free knowledge distillation},
  author={Fang, Gongfan and Mo, Kanya and Wang, Xinchao and Song, Jie and Bei, Shitao and Zhang, Haofei and Song, Mingli},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={6},
  pages={6597--6604},
  year={2022}
}
```

## References

[1] Chen, H., Wang, Y., Xu, C., Yang, Z., Liu, C., Shi, B., ... & Tian, Q. (2019). Data-free learning of student networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3514-3522.

[2] Chen, H., Guo, T., Xu, C., Li, W., Xu, C., Xu, C., & Wang, Y. (2021). Learning Student Networks in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6428-6437.

[3] Choi, Y., Choi, J., El-Khamy, M., & Lee, J. (2020). Data-free network quantization with adversarial knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 710-711.

[4] Fang, G., Song, J., Wang, X., Shen, C., Wang, X., & Song, M. (2021). Contrastive Model Inversion for Data-Free Knowledge Distillation. arXiv preprint arXiv:2105.08584.

[5] Luo, L., Sandler, M., Lin, Z., Zhmoginov, A., & Howard, A. (2020). Large-scale generative data-free distillation. arXiv preprint arXiv:2012.05578.

[6] Micaelli, P., & Storkey, A. J. (2019). Zero-shot knowledge transfer via adversarial belief matching. Advances in Neural Information Processing Systems, 32.

[7] Yin, H., Molchanov, P., Alvarez, J. M., Li, Z., Mallya, A., Hoiem, D., ... & Kautz, J. (2020). Dreaming to distill: Data-free knowledge transfer via deepinversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8715-8724.