{"id":13717528,"url":"https://github.com/HobbitLong/RepDistiller","last_synced_at":"2025-05-07T07:31:36.377Z","repository":{"id":37458086,"uuid":"216665070","full_name":"HobbitLong/RepDistiller","owner":"HobbitLong","description":"[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods","archived":false,"fork":false,"pushed_at":"2023-10-16T19:21:54.000Z","size":62,"stargazers_count":2308,"open_issues_count":34,"forks_count":402,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-14T16:57:55.712Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HobbitLong.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-10-21T21:03:12.000Z","updated_at":"2025-04-14T08:53:20.000Z","dependencies_parsed_at":"2022-07-12T14:03:43.680Z","dependency_job_id":"4ca190fb-1bc4-426f-b904-d3318df8724f","html_url":"https://github.com/HobbitLong/RepDistiller","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HobbitLong%2FRepDistiller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HobbitLong%2FRepDistiller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HobbitLong%2FRepDistiller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HobbitLong%2FRepDistiller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HobbitLong","download_
url":"https://codeload.github.com/HobbitLong/RepDistiller/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252833570,"owners_count":21811210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T00:01:23.600Z","updated_at":"2025-05-07T07:31:36.016Z","avatar_url":"https://github.com/HobbitLong.png","language":"Python","readme":"# RepDistiller\n\nThis repo:\n\n**(1) covers the implementation of the following ICLR 2020 paper:**\n\n\"Contrastive Representation Distillation\" (CRD). [Paper](http://arxiv.org/abs/1910.10699), [Project Page](http://hobbitlong.github.io/CRD/).\n\n\u003cdiv style=\"text-align:center\"\u003e\u003cimg src=\"http://hobbitlong.github.io/CRD/CRD_files/teaser.jpg\" width=\"85%\" height=\"85%\"\u003e\u003c/div\u003e  \n\n\u003cp\u003e\u003c/p\u003e\n\n**(2) benchmarks 12 state-of-the-art knowledge distillation methods in PyTorch, including:**\n\n(KD) - Distilling the Knowledge in a Neural Network  \n(FitNet) - Fitnets: hints for thin deep nets  \n(AT) - Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks\n    via Attention Transfer  \n(SP) - Similarity-Preserving Knowledge Distillation  \n(CC) - Correlation Congruence for Knowledge Distillation  \n(VID) - Variational Information Distillation for Knowledge Transfer  \n(RKD) - Relational Knowledge Distillation  \n(PKT) - Probabilistic Knowledge Transfer for deep representation learning  \n(AB) - Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons  
\n(FT) - Paraphrasing Complex Network: Network Compression via Factor Transfer  \n(FSP) - A Gift from Knowledge Distillation:\n    Fast Optimization, Network Minimization and Transfer Learning  \n(NST) - Like what you like: knowledge distill via neuron selectivity transfer \n\n## Installation\n\nThis repo was tested with Ubuntu 16.04.5 LTS, Python 3.5, PyTorch 0.4.0, and CUDA 9.0. But it should be runnable with recent PyTorch versions \u003e=0.4.0\n\n## Running\n\n1. Fetch the pretrained teacher models by:\n\n    ```\n    sh scripts/fetch_pretrained_teachers.sh\n    ```\n   which will download and save the models to `save/models`\n   \n2. Run distillation by following commands in `scripts/run_cifar_distill.sh`. An example of running Geoffrey's original Knowledge Distillation (KD) is given by:\n\n    ```\n    python train_student.py --path_t ./save/models/resnet32x4_vanilla/ckpt_epoch_240.pth --distill kd --model_s resnet8x4 -r 0.1 -a 0.9 -b 0 --trial 1\n    ```\n    where the flags are explained as:\n    - `--path_t`: specify the path of the teacher model\n    - `--model_s`: specify the student model, see 'models/\\_\\_init\\_\\_.py' to check the available model types.\n    - `--distill`: specify the distillation method\n    - `-r`: the weight of the cross-entropy loss between logit and ground truth, default: `1`\n    - `-a`: the weight of the KD loss, default: `None`\n    - `-b`: the weight of other distillation losses, default: `None`\n    - `--trial`: specify the experimental id to differentiate between multiple runs.\n    \n    Therefore, the command for running CRD is something like:\n    ```\n    python train_student.py --path_t ./save/models/resnet32x4_vanilla/ckpt_epoch_240.pth --distill crd --model_s resnet8x4 -a 0 -b 0.8 --trial 1\n    ```\n    \n3. 
Combining a distillation objective with KD is simply done by setting `-a` as a non-zero value, which results in the following example (combining CRD with KD)\n    ```\n    python train_student.py --path_t ./save/models/resnet32x4_vanilla/ckpt_epoch_240.pth --distill crd --model_s resnet8x4 -a 1 -b 0.8 --trial 1     \n    ```\n\n4. (optional) Train teacher networks from scratch. Example commands are in `scripts/run_cifar_vanilla.sh`\n\nNote: the default setting is for a single-GPU training. If you would like to play this repo with multiple GPUs, you might need to tune the learning rate, which empirically needs to be scaled up linearly with the batch size, see [this paper](https://arxiv.org/abs/1706.02677)\n\n## Benchmark Results on CIFAR-100:\n\nPerformance is measured by classification accuracy (%)\n\n1. Teacher and student are of the **same** architectural type.\n\n| Teacher \u003cbr\u003e Student | wrn-40-2 \u003cbr\u003e wrn-16-2 | wrn-40-2 \u003cbr\u003e wrn-40-1 | resnet56 \u003cbr\u003e resnet20 | resnet110 \u003cbr\u003e resnet20 | resnet110 \u003cbr\u003e resnet32 | resnet32x4 \u003cbr\u003e resnet8x4 |  vgg13 \u003cbr\u003e vgg8 |\n|:---------------:|:-----------------:|:-----------------:|:-----------------:|:------------------:|:------------------:|:--------------------:|:-----------:|\n| Teacher \u003cbr\u003e Student |    75.61 \u003cbr\u003e 73.26    |    75.61 \u003cbr\u003e 71.98    |    72.34 \u003cbr\u003e 69.06    |     74.31 \u003cbr\u003e 69.06    |     74.31 \u003cbr\u003e 71.14    |      79.42 \u003cbr\u003e 72.50     | 74.64 \u003cbr\u003e 70.36 |\n| KD | 74.92 | 73.54 | 70.66 | 70.67 | 73.08 | 73.33 | 72.98 |\n| FitNet | 73.58 | 72.24 | 69.21 | 68.99 | 71.06 | 73.50 | 71.02 |\n| AT | 74.08 | 72.77 | 70.55 | 70.22 | 72.31 | 73.44 | 71.43 |\n| SP | 73.83 | 72.43 | 69.67 | 70.04 | 72.69 | 72.94 | 72.68 |\n| CC | 73.56 | 72.21 | 69.63 | 69.48 | 71.48 | 72.97 | 70.71 |\n| VID | 74.11 | 73.30 | 70.38 | 70.16 | 72.61 | 73.09 | 71.23 |\n| RKD | 
73.35 | 72.22 | 69.61 | 69.25 | 71.82 | 71.90 | 71.48 |\n| PKT | 74.54 | 73.45 | 70.34 | 70.25 | 72.61 | 73.64 | 72.88 |\n| AB | 72.50 | 72.38 | 69.47 | 69.53 | 70.98 | 73.17 | 70.94 |\n| FT | 73.25 | 71.59 | 69.84 | 70.22 | 72.37 | 72.86 | 70.58 |\n| FSP | 72.91 | N/A | 69.95 | 70.11 | 71.89 | 72.62 | 70.23 |\n| NST | 73.68 | 72.24 | 69.60 | 69.53 | 71.96 | 73.30 | 71.53 |\n| **CRD** | **75.48** | **74.14** | **71.16** | **71.46** | **73.48** | **75.51** | **73.94** |\n\n2. Teacher and student are of **different** architectural type.\n\n| Teacher \u003cbr\u003e Student | vgg13 \u003cbr\u003e MobileNetV2 | ResNet50 \u003cbr\u003e MobileNetV2 | ResNet50 \u003cbr\u003e vgg8 | resnet32x4 \u003cbr\u003e ShuffleNetV1 | resnet32x4 \u003cbr\u003e ShuffleNetV2 | wrn-40-2 \u003cbr\u003e ShuffleNetV1 |\n|:---------------:|:-----------------:|:--------------------:|:-------------:|:-----------------------:|:-----------------------:|:---------------------:|\n| Teacher \u003cbr\u003e Student |    74.64 \u003cbr\u003e 64.60    |      79.34 \u003cbr\u003e 64.60     |  79.34 \u003cbr\u003e 70.36  |       79.42 \u003cbr\u003e 70.50       |       79.42 \u003cbr\u003e 71.82       |      75.61 \u003cbr\u003e 70.50      |\n| KD | 67.37 | 67.35 | 73.81 | 74.07 | 74.45 | 74.83 |\n| FitNet | 64.14 | 63.16 | 70.69 | 73.59 | 73.54 | 73.73 |\n| AT | 59.40 | 58.58 | 71.84 | 71.73 | 72.73 | 73.32 |\n| SP | 66.30 | 68.08 | 73.34 | 73.48 | 74.56 | 74.52 |\n| CC | 64.86 | 65.43 | 70.25 | 71.14 | 71.29 | 71.38 |\n| VID | 65.56 | 67.57 | 70.30 | 73.38 | 73.40 | 73.61 |\n| RKD | 64.52 | 64.43 | 71.50 | 72.28 | 73.21 | 72.21 |\n| PKT | 67.13 | 66.52 | 73.01 | 74.10 | 74.69 | 73.89 |\n| AB | 66.06 | 67.20 | 70.65 | 73.55 | 74.31 | 73.34 |\n| FT | 61.78 | 60.99 | 70.29 | 71.75 | 72.50 | 72.03 |\n| NST | 58.16 | 64.96 | 71.28 | 74.12 | 74.68 | 74.89 |\n| **CRD** | **69.73** | **69.11** | **74.30** | **75.11** | **75.65** | **76.05** |\n\n## Citation\n\nIf you find this repo useful for your research, 
please consider citing the paper\n\n```\n@inproceedings{tian2019crd,\n  title={Contrastive Representation Distillation},\n  author={Yonglong Tian and Dilip Krishnan and Phillip Isola},\n  booktitle={International Conference on Learning Representations},\n  year={2020}\n}\n```\nFor any questions, please contact Yonglong Tian (yonglong@mit.edu).\n\n## Acknowledgement\n\nThanks to Baoyun Peng for providing the code of CC and to Frederick Tung for verifying our reimplementation of SP. Thanks also go to authors of other papers who make their code publicly available.\n","funding_links":[],"categories":["Pytorch \u0026 related libraries｜Pytorch \u0026 相关库","Pytorch \u0026 related libraries","PyTorch","Knowledge Distillation"],"sub_categories":["Other libraries｜其他库:","Other libraries:"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHobbitLong%2FRepDistiller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHobbitLong%2FRepDistiller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHobbitLong%2FRepDistiller/lists"}
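
The `-r` / `-a` / `-b` flags described above weight a sum of three loss terms: cross-entropy with the ground truth, the KD loss, and any extra distillation loss selected by `--distill`. The following is a minimal PyTorch sketch of that weighted objective, assuming the classic temperature-scaled KD term of Hinton et al.; the function name, signature, and `T` default are illustrative, not the repo's exact API:

```python
import torch
import torch.nn.functional as F

def distillation_objective(logits_s, logits_t, target,
                           r=0.1, a=0.9, b=0.0, other_loss=0.0, T=4.0):
    """Weighted sum of the three loss terms controlled by -r / -a / -b.

    logits_s, logits_t: student and teacher logits; target: ground-truth labels.
    T is the softmax temperature of the classic KD term.
    """
    # -r: cross-entropy between student logits and the ground truth
    loss_ce = F.cross_entropy(logits_s, target)
    # -a: KL divergence to the teacher's temperature-softened distribution,
    #     rescaled by T^2 to keep gradient magnitudes comparable
    loss_kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                       F.softmax(logits_t / T, dim=1),
                       reduction="batchmean") * (T * T)
    # -b: whatever additional distillation loss --distill selected (e.g. CRD)
    return r * loss_ce + a * loss_kd + b * other_loss

# toy usage with random logits for a batch of 8 over 100 classes
logits_s = torch.randn(8, 100)
logits_t = torch.randn(8, 100)
target = torch.randint(0, 100, (8,))
loss = distillation_objective(logits_s, logits_t, target)
```

Under this reading, the KD example above (`-r 0.1 -a 0.9 -b 0`) trades most of the cross-entropy weight for the soft-target term, while the pure-CRD example (`-a 0 -b 0.8`) drops the KD term entirely.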