{"id":43212451,"url":"https://github.com/megvii-research/mdistiller","last_synced_at":"2026-02-01T07:34:10.749Z","repository":{"id":37785174,"uuid":"470505142","full_name":"megvii-research/mdistiller","owner":"megvii-research","description":"The official implementation of [CVPR2022] Decoupled Knowledge Distillation https://arxiv.org/abs/2203.08679 and [ICCV2023] DOT: A Distillation-Oriented Trainer  https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf","archived":false,"fork":false,"pushed_at":"2023-11-05T09:23:59.000Z","size":1075,"stargazers_count":634,"open_issues_count":15,"forks_count":100,"subscribers_count":6,"default_branch":"master","last_synced_at":"2023-11-06T10:23:18.769Z","etag":null,"topics":["cifar","coco","computer-vision","cvpr2022","deep-learning","iccv2023","imagenet","knowledge-distillation","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/megvii-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-16T09:03:42.000Z","updated_at":"2023-11-06T07:32:05.000Z","dependencies_parsed_at":"2023-02-18T03:46:02.014Z","dependency_job_id":null,"html_url":"https://github.com/megvii-research/mdistiller","commit_stats":null,"previous_names":[],"tags_count":1,"template":null,"template_full_name":null,"purl":"pkg:github/megvii-research/mdistiller","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2Fmdistiller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2Fmdistiller/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2Fmdistiller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2Fmdistiller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/megvii-research","download_url":"https://codeload.github.com/megvii-research/mdistiller/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/megvii-research%2Fmdistiller/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28972584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T06:46:42.625Z","status":"ssl_error","status_checked_at":"2026-02-01T06:44:56.173Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cifar","coco","computer-vision","cvpr2022","deep-learning","iccv2023","imagenet","knowledge-distillation","pytorch"],"created_at":"2026-02-01T07:34:10.182Z","updated_at":"2026-02-01T07:34:10.743Z","avatar_url":"https://github.com/megvii-research.png","language":"Python","funding_links":[],"categories":["Knowledge Distillation"],"sub_categories":[],"readme":"\u003cdiv align=center\u003e\u003cimg src=\".github/mdistiller.png\" width=\"40%\" \u003e\u003cdiv align=left\u003e\n\nThis repo is\n\n(1) a PyTorch library that provides classical knowledge distillation algorithms on mainstream CV benchmarks,\n\n(2) the 
official implementation of the CVPR-2022 paper: [Decoupled Knowledge Distillation](https://arxiv.org/abs/2203.08679).\n\n(3) the official implementation of the ICCV-2023 paper: [DOT: A Distillation-Oriented Trainer](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_DOT_A_Distillation-Oriented_Trainer_ICCV_2023_paper.pdf).\n\n\n# DOT: A Distillation-Oriented Trainer\n\n### Framework\n\n\u003cdiv style=\"text-align:center\"\u003e\u003cimg src=\".github/dot.png\" width=\"80%\" \u003e\u003c/div\u003e\n\n### Main Benchmark Results\n\nOn CIFAR-100:\n\n| Teacher \u003cbr\u003e Student | ResNet32x4 \u003cbr\u003e ResNet8x4| VGG13 \u003cbr\u003e VGG8| ResNet32x4 \u003cbr\u003e  ShuffleNet-V2|\n|:---------------:|:-----------------:|:-----------------:|:-----------------:|\n| KD | 73.33 | 72.98 | 74.45 |\n| **KD+DOT** | **75.12** | **73.77** | **75.55** |\n\nOn Tiny-ImageNet:\n\n| Teacher \u003cbr\u003e Student |ResNet18 \u003cbr\u003e MobileNet-V2|ResNet18 \u003cbr\u003e ShuffleNet-V2|\n|:---------------:|:-----------------:|:-----------------:|\n| KD | 58.35 | 62.26 | \n| **KD+DOT** | **64.01** | **65.75** |\n\nOn ImageNet:\n\n| Teacher \u003cbr\u003e Student |ResNet34 \u003cbr\u003e ResNet18|ResNet50 \u003cbr\u003e MobileNet-V1|\n|:---------------:|:-----------------:|:-----------------:|\n| KD | 71.03 | 70.50 | \n| **KD+DOT** | **71.72** | **73.09** |\n\n# Decoupled Knowledge Distillation\n\n### Framework \u0026 Performance\n\n\u003cdiv style=\"text-align:center\"\u003e\u003cimg src=\".github/dkd.png\" width=\"80%\" \u003e\u003c/div\u003e\n\n### Main Benchmark Results\n\nOn CIFAR-100:\n\n\n| Teacher \u003cbr\u003e Student |ResNet56 \u003cbr\u003e ResNet20|ResNet110 \u003cbr\u003e ResNet32| ResNet32x4 \u003cbr\u003e ResNet8x4| WRN-40-2 \u003cbr\u003e WRN-16-2| WRN-40-2 \u003cbr\u003e WRN-40-1 | VGG13 \u003cbr\u003e 
VGG8|\n|:---------------:|:-----------------:|:-----------------:|:-----------------:|:------------------:|:------------------:|:--------------------:|\n| KD | 70.66 | 73.08 | 73.33 | 74.92 | 73.54 | 72.98 |\n| **DKD** | **71.97** | **74.11** | **76.32** | **76.23** | **74.81** | **74.68** |\n\n\n| Teacher \u003cbr\u003e Student |ResNet32x4 \u003cbr\u003e ShuffleNet-V1|WRN-40-2 \u003cbr\u003e ShuffleNet-V1| VGG13 \u003cbr\u003e MobileNet-V2| ResNet50 \u003cbr\u003e MobileNet-V2| ResNet32x4 \u003cbr\u003e MobileNet-V2|\n|:---------------:|:-----------------:|:-----------------:|:-----------------:|:------------------:|:------------------:|\n| KD | 74.07 | 74.83 | 67.37 | 67.35 | 74.45 |\n| **DKD** | **76.45** | **76.70** | **69.71** | **70.35** | **77.07** |\n\n\nOn ImageNet:\n\n| Teacher \u003cbr\u003e Student |ResNet34 \u003cbr\u003e ResNet18|ResNet50 \u003cbr\u003e MobileNet-V1|\n|:---------------:|:-----------------:|:-----------------:|\n| KD | 71.03 | 70.50 | \n| **DKD** | **71.70** | **72.05** |\n\n# MDistiller\n\n### Introduction\n\nMDistiller supports the following distillation methods on CIFAR-100, ImageNet and MS-COCO:\n|Method|Paper Link|CIFAR-100|ImageNet|MS-COCO|\n|:---:|:---:|:---:|:---:|:---:|\n|KD| \u003chttps://arxiv.org/abs/1503.02531\u003e |\u0026check;|\u0026check;| |\n|FitNet| \u003chttps://arxiv.org/abs/1412.6550\u003e |\u0026check;| | |\n|AT| \u003chttps://arxiv.org/abs/1612.03928\u003e |\u0026check;|\u0026check;| |\n|NST| \u003chttps://arxiv.org/abs/1707.01219\u003e |\u0026check;| | |\n|PKT| \u003chttps://arxiv.org/abs/1803.10837\u003e |\u0026check;| | |\n|KDSVD| \u003chttps://arxiv.org/abs/1807.06819\u003e |\u0026check;| | |\n|OFD| \u003chttps://arxiv.org/abs/1904.01866\u003e |\u0026check;|\u0026check;| |\n|RKD| \u003chttps://arxiv.org/abs/1904.05068\u003e |\u0026check;| | |\n|VID| \u003chttps://arxiv.org/abs/1904.05835\u003e |\u0026check;| | |\n|SP| \u003chttps://arxiv.org/abs/1907.09682\u003e |\u0026check;| | |\n|CRD| 
\u003chttps://arxiv.org/abs/1910.10699\u003e |\u0026check;|\u0026check;| |\n|ReviewKD| \u003chttps://arxiv.org/abs/2104.09044\u003e |\u0026check;|\u0026check;|\u0026check;|\n|DKD| \u003chttps://arxiv.org/abs/2203.08679\u003e |\u0026check;|\u0026check;|\u0026check;|\n\n\n### Installation\n\nEnvironments:\n\n- Python 3.6\n- PyTorch 1.9.0\n- torchvision 0.10.0\n\nInstall the package:\n\n```\nsudo pip3 install -r requirements.txt\nsudo python3 setup.py develop\n```\n\n### Getting started\n\n0. Wandb as the logger\n\n- Registration: \u003chttps://wandb.ai/home\u003e.\n- If you don't want wandb as your logger, set `CFG.LOG.WANDB` to `False` in `mdistiller/engine/cfg.py`.\n\n1. Evaluation\n\n- You can evaluate the performance of our models or of models you trained yourself.\n\n- Our models are at \u003chttps://github.com/megvii-research/mdistiller/releases/tag/checkpoints\u003e; please download the checkpoints to `./download_ckpts`\n\n- To test the models on ImageNet, please download the dataset at \u003chttps://image-net.org/\u003e and put it in `./data/imagenet`\n\n  ```bash\n  # evaluate teachers\n  python3 tools/eval.py -m resnet32x4 # resnet32x4 on cifar100\n  python3 tools/eval.py -m ResNet34 -d imagenet # ResNet34 on imagenet\n  \n  # evaluate students\n  python3 tools/eval.py -m resnet8x4 -c download_ckpts/dkd_resnet8x4 # dkd-resnet8x4 on cifar100\n  python3 tools/eval.py -m MobileNetV1 -c download_ckpts/imgnet_dkd_mv1 -d imagenet # dkd-mv1 on imagenet\n  python3 tools/eval.py -m model_name -c output/your_exp/student_best # your checkpoints\n  ```\n\n\n2. 
Training on CIFAR-100\n\n- Download the `cifar_teachers.tar` at \u003chttps://github.com/megvii-research/mdistiller/releases/tag/checkpoints\u003e and untar it to `./download_ckpts` via `tar xvf cifar_teachers.tar`.\n\n  ```bash\n  # for instance, our DKD method.\n  python3 tools/train.py --cfg configs/cifar100/dkd/res32x4_res8x4.yaml\n\n  # you can also change settings on the command line\n  python3 tools/train.py --cfg configs/cifar100/dkd/res32x4_res8x4.yaml SOLVER.BATCH_SIZE 128 SOLVER.LR 0.1\n  ```\n\n3. Training on ImageNet\n\n- Download the dataset at \u003chttps://image-net.org/\u003e and put it in `./data/imagenet`\n\n  ```bash\n  # for instance, our DKD method.\n  python3 tools/train.py --cfg configs/imagenet/r34_r18/dkd.yaml\n  ```\n\n4. Training on MS-COCO\n\n- See [detection.md](detection/README.md)\n\n\n5. Extension: Visualizations\n\n- Jupyter notebooks: [tsne](tools/visualizations/tsne.ipynb) and [correlation_matrices](tools/visualizations/correlation.ipynb)\n\n\n### Custom Distillation Method\n\n1. Create a Python file in `mdistiller/distillers/` and define the distiller\n  \n  ```python\n  from ._base import Distiller\n\n  class MyDistiller(Distiller):\n      def __init__(self, student, teacher, cfg):\n          super(MyDistiller, self).__init__(student, teacher)\n          self.hyper1 = cfg.MyDistiller.hyper1\n          ...\n\n      def forward_train(self, image, target, **kwargs):\n          # return the output logits and a Dict of losses\n          ...\n      # override the get_learnable_parameters function if there are more nn modules for distillation.\n      # override the get_extra_parameters function if you want to obtain the extra cost.\n    ...\n  ```\n\n2. Register the distiller in `distiller_dict` in `mdistiller/distillers/__init__.py`\n\n3. Register the corresponding hyper-parameters in `mdistiller/engine/cfg.py`\n\n4. 
Create a new config file and test it.\n\n# Citation\n\nIf this repo is helpful for your research, please consider citing the papers:\n\n```BibTeX\n@article{zhao2022dkd,\n  title={Decoupled Knowledge Distillation},\n  author={Zhao, Borui and Cui, Quan and Song, Renjie and Qiu, Yiyu and Liang, Jiajun},\n  journal={arXiv preprint arXiv:2203.08679},\n  year={2022}\n}\n@article{zhao2023dot,\n  title={DOT: A Distillation-Oriented Trainer},\n  author={Zhao, Borui and Cui, Quan and Song, Renjie and Liang, Jiajun},\n  journal={arXiv preprint arXiv:2307.08436},\n  year={2023}\n}\n```\n\n# License\n\nMDistiller is released under the MIT license. See [LICENSE](LICENSE) for details.\n\n# Acknowledgement\n\n- Thanks to CRD and ReviewKD. We built this library on the [CRD codebase](https://github.com/HobbitLong/RepDistiller) and the [ReviewKD codebase](https://github.com/dvlab-research/ReviewKD).\n\n- Thanks to Yiyu Qiu and Yi Shi for their code contributions during their internships at MEGVII Technology.\n\n- Thanks to Xin Jin for the discussion about DKD.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegvii-research%2Fmdistiller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmegvii-research%2Fmdistiller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmegvii-research%2Fmdistiller/lists"}