{"id":13408563,"url":"https://github.com/dkozlov/awesome-knowledge-distillation","last_synced_at":"2026-02-17T19:02:15.984Z","repository":{"id":37432228,"uuid":"83434695","full_name":"dkozlov/awesome-knowledge-distillation","owner":"dkozlov","description":"Awesome Knowledge Distillation","archived":false,"fork":false,"pushed_at":"2025-12-25T11:16:29.000Z","size":251,"stargazers_count":3812,"open_issues_count":0,"forks_count":516,"subscribers_count":112,"default_branch":"master","last_synced_at":"2026-02-12T12:41:39.415Z","etag":null,"topics":["co-training","deep-learning","distillation","distillation-model","kd","knowldge-distillation","knowledge-distillation","knowledge-transfer","model-compression","model-distillation","teacher-student"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dkozlov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-02-28T13:18:38.000Z","updated_at":"2026-02-12T09:10:29.000Z","dependencies_parsed_at":"2025-11-15T02:00:43.947Z","dependency_job_id":null,"html_url":"https://github.com/dkozlov/awesome-knowledge-distillation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dkozlov/awesome-knowledge-distillation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkozlov%2Fawesome-knowledge-distillation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/dkozlov%2Fawesome-knowledge-distillation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkozlov%2Fawesome-knowledge-distillation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkozlov%2Fawesome-knowledge-distillation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dkozlov","download_url":"https://codeload.github.com/dkozlov/awesome-knowledge-distillation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dkozlov%2Fawesome-knowledge-distillation/sbom","scorecard":{"id":346815,"data":{"date":"2025-08-11","repo":{"name":"github.com/dkozlov/awesome-knowledge-distillation","commit":"deb1557f67c67be8f9962e647d1a79143cafc3d2"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.3,"checks":[{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and 
uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":5,"reason":"Found 9/17 approved changesets -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":5,"reason":"6 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, 
known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 22 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T07:15:09.547Z","repository_id":37432228,"created_at":"2025-08-18T07:15:09.547Z","updated_at":"2025-08-18T07:15:09.547Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29554389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T18:16:07.221Z","status":"ssl_error","status_checked_at":"2026-02-17T18:16:04.782Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["co-training","deep-learning","distillation","distillation-model","kd","knowldge-distillation","knowledge-distillation","knowledge-transfer","model-compression","model-distillation","teacher-student"],"created_at":"2024-07-30T20:00:53.684Z","updated_at":"2026-02-17T19:02:15.977Z","avatar_url":"https://github.com/dkozlov.png","language":null,"funding_links":[],"categories":["Others","Uncategorized","REFERENCE","Other 
Lists","A01_文本生成_文本对话","Core Machine Learning Research","🤖 AI \u0026 Machine Learning"],"sub_categories":["Uncategorized","2023","TeX Lists","大语言对话模型及数据","Robustness, Interpretability, and Learning Paradigms"],"readme":"Awesome Knowledge Distillation\n================================\n\n### Papers\n- [Neural Network Ensembles](https://www.researchgate.net/publication/3191841_Neural_Network_Ensembles), L.K. Hansen, P. Salamon, 1990\n- [Neural Network Ensembles, Cross Validation, and Active Learning](https://papers.nips.cc/paper/1001-neural-network-ensembles-cross-validation-and-active-learning.pdf), Anders Krogh, Jesper Vedelsby, 1995\n- [Combining labeled and unlabeled data with co-training](https://www.cs.cmu.edu/~avrim/Papers/cotrain.pdf), A. Blum, T. Mitchell, 1998\n- [Ensemble Methods in Machine Learning](http://web.engr.oregonstate.edu/~tgd/publications/mcs-ensembles.pdf), Thomas G. Dietterich, 2000\n- [Model Compression](http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf), Rich Caruana, 2006\n- [Dark knowledge](http://www.ttic.edu/dl/dark14.pdf), Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2014\n- [Learning with Pseudo-Ensembles](https://arxiv.org/pdf/1412.4864.pdf), Philip Bachman, Ouais Alsharif, Doina Precup, 2014\n- [Distilling the Knowledge in a Neural Network](https://arxiv.org/pdf/1503.02531.pdf), Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015\n- [Cross Modal Distillation for Supervision Transfer](https://arxiv.org/pdf/1507.00448), Saurabh Gupta, Judy Hoffman, Jitendra Malik, 2015\n- [Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization](https://arxiv.org/pdf/1511.04798), Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal, 2015\n- [Distilling Model Knowledge](https://arxiv.org/pdf/1510.02437.pdf), George Papamakarios, 2015\n- [Unifying distillation and privileged information](https://arxiv.org/pdf/1511.03643), David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015\n- [Learning 
Using Privileged Information: Similarity Control and Knowledge Transfer](http://www.jmlr.org/papers/volume16/vapnik15b/vapnik15b.pdf), Vladimir Vapnik, Rauf Izmailov, 2015\n- [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/pdf/1511.04508.pdf), Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, 2016\n- [Do deep convolutional nets really need to be deep and convolutional?](https://arxiv.org/pdf/1603.05691.pdf), Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson, 2016\n- [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928), Sergey Zagoruyko, Nikos Komodakis, 2016\n- [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550), Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015\n- [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650), Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016\n- [Knowledge Distillation for Small-footprint Highway Networks](https://arxiv.org/pdf/1608.00892), Liang Lu, Michelle Guo, Steve Renals, 2016\n- [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947), [deeplearning-papernotes](https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/seq-knowledge-distillation.md), Yoon Kim, Alexander M. 
Rush, 2016\n- [MobileID: Face Model Compression by Distilling Knowledge from Neurons](http://personal.ie.cuhk.edu.hk/~pluo/pdf/aaai16-face-model-compression.pdf), Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang and Xiaoou Tang, 2016\n- [Recurrent Neural Network Training with Dark Knowledge Transfer](https://arxiv.org/pdf/1505.04630.pdf), Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016\n- [Adapting Models to Signal Degradation using Distillation](https://arxiv.org/abs/1604.00433), Jong-Chyi Su, Subhransu Maji, 2016\n- [Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results](https://arxiv.org/pdf/1703.01780), Antti Tarvainen, Harri Valpola, 2017\n- [Data-Free Knowledge Distillation For Deep Neural Networks](http://raphagl.com/research/replayed-distillation/), Raphael Gontijo Lopes, Stefano Fenu, 2017\n- [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219), Zehao Huang, Naiyan Wang, 2017\n- [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513), Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017\n- [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220), Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017\n- [Knowledge Projection for Deep Neural Networks](https://arxiv.org/pdf/1710.09505), Zhi Zhang, Guanghan Ning, Zhihai He, 2017\n- [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613), Elliot J. 
Crowley, Gavin Gray, Amos Storkey, 2017\n- [Local Affine Approximators for Improving Knowledge Transfer](https://lld-workshop.github.io/papers/LLD_2017_paper_28.pdf), Suraj Srinivas and Francois Fleuret, 2017\n- [Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model](http://papers.nips.cc/paper/6635-best-of-both-worlds-transferring-knowledge-from-discriminative-learning-to-a-generative-visual-dialog-model.pdf), Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra, 2017\n- [Learning Efficient Object Detection Models with Knowledge Distillation](http://papers.nips.cc/paper/6676-learning-efficient-object-detection-models-with-knowledge-distillation.pdf), Guobin Chen, Wongun Choi, Xiang Yu, Tony Han, Manmohan Chandraker, 2017\n- [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/pdf/1707.07012.pdf), Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le, 2017\n- [Revisiting knowledge transfer for training object class detectors](https://arxiv.org/pdf/1708.06128.pdf), Jasper Uijlings, Stefan Popov, Vittorio Ferrari, 2017\n- [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf), Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, 2017\n- [Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net](https://arxiv.org/pdf/1708.04106.pdf), Zihao Liu, Qi Liu, Tao Liu, Yanzhi Wang, Wujie Wen, 2017\n- [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/pdf/1712.04440.pdf), Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, 2017\n- [Parallel WaveNet: Fast High-Fidelity Speech Synthesis](https://arxiv.org/pdf/1711.10433.pdf), Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, 2017\n- [Learning from Noisy Labels with 
Distillation](https://arxiv.org/abs/1703.02391), Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li, 2017\n- [Deep Mutual Learning](https://arxiv.org/abs/1706.00384), Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, 2017\n- [Distilling a Neural Network Into a Soft Decision Tree](https://arxiv.org/abs/1711.09784), Nicholas Frosst, Geoffrey Hinton, 2017\n- [Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge](https://arxiv.org/pdf/1803.04042.pdf), Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton, 2018\n- [Efficient Neural Architecture Search via Parameters Sharing](https://arxiv.org/pdf/1802.03268), Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, 2018\n- [Defensive Collaborative Multi-task Training - Defending against Adversarial Attack towards Deep Neural Networks](https://arxiv.org/pdf/1803.05123), Derek Wang, Chaoran Li, Sheng Wen, Yang Xiang, Wanlei Zhou, Surya Nepal, 2018\n- [Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation](https://arxiv.org/pdf/1710.06169), Sarah Tan, Rich Caruana, Giles Hooker, Yin Lou, 2018\n- [Deep Co-Training for Semi-Supervised Image Recognition](https://arxiv.org/pdf/1803.05984), Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, Alan Yuille, 2018\n- [Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples](https://arxiv.org/pdf/1803.05787), Zihao Liu, Qi Liu, Tao Liu, Yanzhi Wang, Wujie Wen, 2018\n- [Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling](https://arxiv.org/pdf/1803.04687.pdf), Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang, 2018\n- [Born Again Neural Networks](https://arxiv.org/abs/1805.04770), Tommaso Furlanello, Zachary C. 
Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018\n- [YASENN: Explaining Neural Networks via Partitioning Activation Sequences](https://arxiv.org/abs/1811.02783), Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018\n- [Knowledge Distillation with Adversarial Samples Supporting Decision Boundary](https://arxiv.org/abs/1805.05532), Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018\n- [Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons](https://arxiv.org/abs/1811.03233), Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018\n- [Self-supervised knowledge distillation using singular value decomposition](http://openaccess.thecvf.com/content_ECCV_2018/html/SEUNG_HYUN_LEE_Self-supervised_Knowledge_Distillation_ECCV_2018_paper.html), Seung Hyun Lee, Dae Ha Kim, Byung Cheol Song, 2018\n- [Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection](https://arxiv.org/abs/1809.05884), Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan, 2018\n- [Learning to Steer by Mimicking Features from Heterogeneous Auxiliary Networks](https://arxiv.org/abs/1811.02759), Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy, 2018\n- [A Generalized Meta-loss function for regression and classification using privileged information](https://arxiv.org/abs/1811.06885), Amina Asif, Muhammad Dawood, Fayyaz ul Amir Afsar Minhas, 2018\n- [Large scale distributed neural network training through online distillation](https://arxiv.org/abs/1804.03235), Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. 
Hinton, 2018\n- [KDGAN: Knowledge Distillation with Generative Adversarial Networks](http://papers.nips.cc/paper/7358-kdgan-knowledge-distillation-with-generative-adversarial-networks), Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi, 2018\n- [Deep Face Recognition Model Compression via Knowledge Transfer and Distillation](https://arxiv.org/abs/1906.00619), Jayashree Karlekar, Jiashi Feng, Zi Sian Wong, Sugiri Pranata, 2019\n- [Relational Knowledge Distillation](https://arxiv.org/abs/1904.05068), Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, 2019\n- [Graph-based Knowledge Distillation by Multi-head Attention Network](https://arxiv.org/abs/1907.02226), Seunghyun Lee, Byung Cheol Song, 2019\n- [Knowledge Adaptation for Efficient Semantic Segmentation](https://arxiv.org/abs/1903.04688), Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019\n- [Structured Knowledge Distillation for Semantic Segmentation](https://arxiv.org/abs/1903.04197), Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang, 2019\n- [Fast Human Pose Estimation](https://arxiv.org/abs/1811.05419), Feng Zhang, Xiatian Zhu, Mao Ye, 2019\n- [MEAL: Multi-Model Ensemble via Adversarial Learning](https://arxiv.org/abs/1812.02425), Zhiqiang Shen, Zhankui He, Xiangyang Xue, 2019\n- [Learning Lightweight Lane Detection CNNs by Self Attention Distillation](https://arxiv.org/abs/1908.00821), Yuenan Hou, Zheng Ma, Chunxiao Liu, Chen Change Loy, 2019\n- [Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher](https://arxiv.org/abs/1902.03393), Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh, 2019\n- [A Comprehensive Overhaul of Feature Distillation](https://arxiv.org/abs/1904.01866), Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi, 2019\n- [Contrastive Representation Distillation](https://arxiv.org/pdf/1910.10699.pdf), Yonglong Tian, Dilip Krishnan, Phillip Isola, 2019\n- 
[Distillation-Based Training for Multi-Exit Architectures](http://openaccess.thecvf.com/content_ICCV_2019/papers/Phuong_Distillation-Based_Training_for_Multi-Exit_Architectures_ICCV_2019_paper.pdf), Mary Phuong, Christoph H. Lampert, 2019\n- [Learning Metrics from Teachers: Compact Networks for Image Embedding](https://arxiv.org/abs/1904.03624), Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa, 2019\n- [On the Efficacy of Knowledge Distillation](https://arxiv.org/abs/1910.01348), Jang Hyun Cho, Bharath Hariharan, 2019\n- [Revisit Knowledge Distillation: a Teacher-free Framework](https://arxiv.org/abs/1909.11723), Li Yuan, Francis E.H. Tay, Guilin Li, Tao Wang, Jiashi Feng, 2019\n- [Ensemble Distribution Distillation](https://arxiv.org/abs/1905.00076), Andrey Malinin, Bruno Mlodozeniec, Mark Gales, 2019\n- [Improving Generalization and Robustness with Noisy Collaboration in Knowledge Distillation](https://arxiv.org/abs/1910.05057), Elahe Arani, Fahad Sarfraz, Bahram Zonooz, 2019\n- [Self-training with Noisy Student improves ImageNet classification](https://arxiv.org/abs/1911.04252), Qizhe Xie, Eduard Hovy, Minh-Thang Luong, Quoc V. Le, 2019\n- [Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework](https://arxiv.org/abs/1910.12061), Srinidhi Hegde, Ranjitha Prasad, Ramya Hebbalaguppe, Vishwajith Kumar, 2019\n- [Preparing Lessons: Improve Knowledge Distillation with Better Supervision](https://arxiv.org/abs/1911.07471), Tiancheng Wen, Shenqi Lai, Xueming Qian, 2019\n- [Positive-Unlabeled Compression on the Cloud](https://arxiv.org/abs/1909.09757), Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, Dacheng Tao, Chang Xu, 2019\n- [Variational Information Distillation for Knowledge Transfer](https://arxiv.org/abs/1904.05835), Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. 
Lawrence, Zhenwen Dai, 2019\n- [Knowledge Distillation via Instance Relationship Graph](http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Knowledge_Distillation_via_Instance_Relationship_Graph_CVPR_2019_paper.pdf), Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li and Yunqiang Duan, 2019\n- [Knowledge Distillation via Route Constrained Optimization](https://arxiv.org/abs/1904.09149), Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu, 2019\n- [Similarity-Preserving Knowledge Distillation](https://arxiv.org/abs/1907.09682), Frederick Tung, Greg Mori, 2019\n- [Distilling Object Detectors with Fine-grained Feature Imitation](https://arxiv.org/abs/1906.03609), Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng, 2019\n- [Knowledge Squeezed Adversarial Network Compression](https://arxiv.org/abs/1904.05100), Shu Changyong, Li Peng, Xie Yuan, Qu Yanyun, Dai Longquan, Ma Lizhuang, 2019\n- [Stagewise Knowledge Distillation](https://arxiv.org/abs/1911.06786), Akshay Kulkarni, Navid Panchi, Shital Chiddarwar, 2019\n- [Knowledge Distillation from Internal Representations](https://arxiv.org/abs/1910.03723), Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo, 2019\n- [Knowledge Flow: Improve Upon Your Teachers](https://arxiv.org/abs/1904.05878), Iou-Jen Liu, Jian Peng, Alexander G. 
Schwing, 2019\n- [Graph Representation Learning via Multi-task Knowledge Distillation](https://arxiv.org/abs/1911.05700), Jiaqi Ma, Qiaozhu Mei, 2019\n- [Deep geometric knowledge distillation with graphs](https://arxiv.org/abs/1911.03080), Carlos Lassance, Myriam Bontonou, Ghouthi Boukli Hacene, Vincent Gripon, Jian Tang, Antonio Ortega, 2019\n- [Correlation Congruence for Knowledge Distillation](https://arxiv.org/abs/1904.01802), Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang, 2019\n- [Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation](https://arxiv.org/abs/1905.08094), Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma, 2019 \n- [BAM! Born-Again Multi-Task Networks for Natural Language Understanding](https://arxiv.org/abs/1907.04829), Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le, 2019\n- [Self-Knowledge Distillation in Natural Language Processing](https://arxiv.org/abs/1908.01851), Sangchul Hahn, Heeyoul Choi, 2019\n- [Rethinking Data Augmentation: Self-Supervision and Self-Distillation](https://arxiv.org/abs/1910.05872), Hankook Lee, Sung Ju Hwang, Jinwoo Shin, 2019 \n- [MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks](https://arxiv.org/abs/1911.09418), Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai, 2019\n- [Efficient Video Classification Using Fewer Frames](https://arxiv.org/abs/1902.10640), Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. 
Khapra, 2019\n- [Retaining Privileged Information for Multi-Task Learning](http://web.mit.edu/lilehman/www/paper/LUPI_KDD2019.pdf), Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou, Li-Wei Lehman\n- [Data-Free Learning of Student Networks](https://arxiv.org/pdf/1904.01186.pdf), Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian, 2019\n- [Positive-Unlabeled Compression on the Cloud](https://arxiv.org/pdf/1909.09757.pdf), Yixing Xu, Yunhe Wang, Hanting Chen, Kai Han, Chunjing Xu, Dacheng Tao, Chang Xu, 2019\n- [When Does Label Smoothing Help?](https://arxiv.org/pdf/1906.02629.pdf), Rafael Müller, Simon Kornblith, Geoffrey Hinton, 2019\n- [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/pdf/1909.10351.pdf), Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu, 2019\n- [The State of Knowledge Distillation for Classification](https://arxiv.org/abs/1912.10850), Fabian Ruffy, Karanbir Chahal, 2019\n- [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](https://arxiv.org/abs/1903.12136), Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, 2019\n- [Channel Distillation: Channel-Wise Attention for Knowledge Distillation](https://arxiv.org/abs/2006.01683), Zaida Zhou, Chaoran Zhuge, Xinwei Guan, Wen Liu, 2020\n- [Residual Knowledge Distillation](https://arxiv.org/abs/2002.09168), Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy, 2020\n- [ResKD: Residual-Guided Knowledge Distillation](https://arxiv.org/abs/2006.04719), Xuewei Li, Songyuan Li, Bourahla Omar, Fei Wu, Xi Li, 2020\n- [Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion](https://arxiv.org/abs/1912.08795), Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. 
Jha, Jan Kautz, 2020\n- [MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks](https://arxiv.org/abs/2009.08453), Zhiqiang Shen, Marios Savvides, 2020\n- [MGD: Matching Guided Distillation](https://arxiv.org/abs/2008.09958), Kaiyu Yue, Jiangfan Deng, Feng Zhou, 2020\n- [Reducing the Teacher-Student Gap via Spherical Knowledge Disitllation](https://arxiv.org/abs/2010.07485), Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai, 2020\n- [Regularizing Class-wise Predictions via Self-knowledge Distillation](https://arxiv.org/abs/2003.13964), Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin, 2020\n- [Training data-efficient image transformers \u0026 distillation through attention](https://arxiv.org/abs/2012.12877), Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou, 2020\n- [Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks](https://arxiv.org/abs/2004.05937), Lin Wang, Kuk-Jin Yoon, 2020\n- [Cross-Layer Distillation with Semantic Calibration](https://arxiv.org/abs/2012.03236), Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Yan Feng, Chun Chen, 2020\n- [Subclass Distillation](https://arxiv.org/abs/2002.03936), Rafael Müller, Simon Kornblith, Geoffrey Hinton, 2020\n- [MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis](https://arxiv.org/abs/2104.04767), Sergei Belousov, 2021\n- [Knowledge Distillation: A Survey](https://arxiv.org/abs/2006.05525), Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao, 2021\n- [Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation](https://arxiv.org/abs/2103.08273), Mingi Ji, Seungjae Shin, Seunghyun Hwang, Gibeom Park, Il-Chul Moon, 2021\n- [Complementary Relation Contrastive Distillation](https://arxiv.org/abs/2103.16367), Jinguo Zhu, Shixiang Tang, Dapeng Chen, Shijie Yu, Yakun Liu, Aijun Yang, Mingzhe Rong, Xiaohua 
Wang, 2021\n- [Distilling Knowledge via Knowledge Review](https://arxiv.org/abs/2104.09044), Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia, 2021\n- [Hierarchical Self-supervised Augmented Knowledge Distillation](https://arxiv.org/abs/2107.13715), Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu, 2021\n- [Causal Distillation for Language Models](https://arxiv.org/abs/2112.02505), Zhengxuan Wu, Atticus Geiger, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman, 2021\n- [How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting](https://arxiv.org/abs/2203.04781), Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara, 2022\n- [On Distillation of Guided Diffusion Models](https://arxiv.org/abs/2210.03142), Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans, 2022\n- [Progressive Distillation for Fast Sampling of Diffusion Models](https://arxiv.org/abs/2202.00512), Tim Salimans, Jonathan Ho, 2022\n- [Considerations When Learning Additive Explanations for Black-Box Models](https://arxiv.org/pdf/1801.08640.pdf), Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, Rich Caruana, 2023\n- [UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition](https://arxiv.org/abs/2308.03279), Wenxuan Zhou, Sheng Zhang, Yu Gu, Muhao Chen, Hoifung Poon, 2023\n- [TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation](https://arxiv.org/abs/2303.04248), David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu, 2023\n- [Adversarial Diffusion Distillation](https://arxiv.org/abs/2311.17042), Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach, 2023\n- [One-step Diffusion with Distribution Matching Distillation](https://arxiv.org/abs/2311.18828), Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, 
Fredo Durand, William T. Freeman, Taesung Park, 2023\n- [MobileSAMv2: Faster Segment Anything to Everything](https://arxiv.org/abs/2312.09579), Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong, 2023\n- [Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models](https://arxiv.org/abs/2311.18237), Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel, 2023\n- [Logit Standardization in Knowledge Distillation](https://openaccess.thecvf.com/content/CVPR2024/html/Sun_Logit_Standardization_in_Knowledge_Distillation_CVPR_2024_paper.html), Shangquan Sun, Wenqi Ren, Jingzhi Li, Rui Wang, Xiaochun Cao, 2024\n- [VkD : Improving Knowledge Distillation using Orthogonal Projections](https://openaccess.thecvf.com/content/CVPR2024/papers/Miles_VkD_Improving_Knowledge_Distillation_using_Orthogonal_Projections_CVPR_2024_paper.pdf), Roy Miles, Ismail Elezi, Jiankang Deng, 2024\n- [Understanding the Role of the Projector in Knowledge Distillation](https://ojs.aaai.org/index.php/AAAI/article/view/28219/28433), Roy Miles, Krystian Mikolajczyk, 2024\n- [Information Theoretic Representation Distillation](https://bmvc2022.mpi-inf.mpg.de/0385.pdf), Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk, 2024\n- [Precision Shaking and DORPO: Conceptual Foundations of LLM Knowledge Distillation Methods](https://github.com/OpZest/Papers/blob/main/White_papers/Precision_Shaking_and_DORPO.md), Áron Cserveni, 2024\n- [Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation](https://arxiv.org/abs/2405.05224), Jonas Kohler, Albert Pumarola, Edgar Schönfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet, 2024\n- [Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping](https://arxiv.org/abs/2402.19159), Jianbin Zheng, Minghui Hu, Zhongyi Fan, 
Chaoyue Wang, Changxing Ding, Dacheng Tao, Tat-Jen Cham, 2024\n- [Generative Prompt Internalization](https://arxiv.org/abs/2411.15927), Haebin Shin, Lei Ji, Yeyun Gong, Sungdong Kim, Eunbi Choi, Minjoon Seo, 2024\n- [DεpS: Delayed ε-Shrinking for Faster Once-For-All Training](https://arxiv.org/abs/2407.06167), Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov, 2024\n- [Transferring Knowledge from Large Foundation Models to Small Downstream Models](https://arxiv.org/abs/2406.07337), Shikai Qiu, Boran Han, Danielle C. Maddix, Shuai Zhang, Yuyang Wang, Andrew Gordon Wilson, 2024\n- [ScaleKD: Strong Vision Transformers Could Be Excellent Teachers](https://arxiv.org/abs/2411.06786), Jiawei Fan, Chao Li, Xiaolong Liu and Anbang Yao, 2024\n- [Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation](https://arxiv.org/abs/2412.08139), Jiaming Lv, Haoyuan Yang, Peihua Li, 2024\n- [Simple Unsupervised Knowledge Distillation With Space Similarity](https://arxiv.org/abs/2409.13939), Aditya Singh, Haohan Wang, 2024\n- [EchoDFKD: Data-Free Knowledge Distillation for Cardiac Ultrasound Segmentation using Synthetic Data](https://arxiv.org/abs/2409.07566), Grégoire Petit, Nathan Palluau, Axel Bauer, Clemens Dlaska, 2025\n- [MaKD: Multi-aspect Knowledge Distillation with Large Language Model](https://arxiv.org/pdf/2501.13341), Taegyeong Lee, et al. 
2025\n- [Autoregressive Distillation of Diffusion Transformers](https://arxiv.org/abs/2504.11295), Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, 2025\n- [CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation](https://arxiv.org/pdf/2505.21904), Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu, 2025\n***\n### Videos\n- [Dark knowledge](https://www.youtube.com/watch?v=EK61htlw8hY), Geoffrey Hinton, 2014\n- [Model Compression](https://www.youtube.com/watch?v=0WZmuryQdgg), Rich Caruana, 2016\n\n***\n### Implementations\n\n## MXNet\n- [Bayesian Dark Knowledge](https://github.com/dmlc/mxnet/blob/master/example/bayesian-methods/bdk.ipynb)\n\n## PyTorch\n- [Attention Transfer](https://github.com/szagoruyko/attention-transfer)\n- [Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model](https://github.com/jiasenlu/visDial.pytorch)\n- [Interpreting Deep Classifier by Visual Distillation of Dark Knowledge](https://github.com/xukai92/darksight)\n- [Mean teachers are better role models](https://github.com/CuriousAI/mean-teacher)\n- [Relational Knowledge Distillation](https://github.com/lenscloth/RKD)\n- [Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons](https://github.com/bhheo/AB_distillation)\n- [Fast Human Pose Estimation Pytorch](https://github.com/yuanyuanli85/Fast_Human_Pose_Estimation_Pytorch)\n- [MEAL: Multi-Model Ensemble via Adversarial Learning](https://github.com/AaronHeee/MEAL)\n- [MEAL-V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks](https://github.com/szq0214/MEAL-V2)\n- [Using Teacher Assistants to Improve Knowledge Distillation](https://github.com/imirzadeh/Teacher-Assistant-Knowledge-Distillation)\n- [A Comprehensive Overhaul of Feature 
Distillation](https://github.com/clovaai/overhaul-distillation)\n- [Contrastive Representation Distillation](https://github.com/HobbitLong/RepDistiller)\n- [Transformer model distillation](http://nlp_architect.nervanasys.com/transformers_distillation.html)\n- [TinyBERT](https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT)\n- [Data Efficient Model Compression ](https://github.com/huawei-noah/Data-Efficient-Model-Compression)\n- [Channel Distillation](https://github.com/zhouzaida/channel-distillation)\n- [Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion](https://github.com/NVlabs/DeepInversion)\n- [MGD: Matching Guided Distillation](https://github.com/KaiyuYue/mgd)\n- [torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation](https://github.com/yoshitomo-matsubara/torchdistill)\n- [Knowledge Distillation on SSD](https://github.com/SsisyphusTao/SSD-Knowledge-Distillation)\n- [distiller: A large scale study of Knowledge Distillation](https://github.com/karanchahal/distiller)\n- [Knowledge-Distillation-Zoo: Pytorch implementation of various Knowledge Distillation (KD) methods](https://github.com/AberHu/Knowledge-Distillation-Zoo)\n- [A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility](https://github.com/peterliht/knowledge-distillation-pytorch)\n- [Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research.](https://github.com/IntelLabs/distiller)\n- [KD_Lib :  A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization. 
](https://github.com/SforAiDl/KD_Lib)\n- [Vision Transformer Distillation](https://github.com/lucidrains/vit-pytorch#distillation)\n- [Cross-Layer Distillation with Semantic Calibration](https://github.com/DefangChen/SemCKD)\n- [Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation](https://github.com/MingiJi/FRSKD)\n- [Distilling Knowledge via Knowledge Review](https://github.com/dvlab-research/ReviewKD)\n- [Hierarchical Self-supervised Augmented Knowledge Distillation](https://github.com/winycg/HSAKD)\n- [Causal Distillation for Language Models](https://github.com/frankaging/Causal-Distill)\n- [UniversalNER](https://github.com/universal-ner/universal-ner)\n- [MobileSAM](https://github.com/ChaoningZhang/MobileSAM)\n- [Logit-Standardization-KD](https://github.com/sunshangquan/logit-standardization-KD)\n- [Delayed Eps-Shrinking for Faster Once-For-All Training](https://github.com/gatech-sysml/deps)\n- [Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation](https://github.com/JiamingLv/WKD)\n- [EchoDFKD](https://github.com/GregoirePetit/EchoDFKD)\n- [Autoregressive Distillation of Diffusion Transformers (ARD)](https://github.com/alsdudrla10/ARD)\n- [Simple Unsupervised Knowledge Distillation With Space Similarity](https://github.com/paganpasta/coss-pytorch)\n\n## Lua\n- [Example for teacher/student-based learning ](https://github.com/hoelzl/Academia)\n\n## Torch\n- [Distilling knowledge to specialist ConvNets for clustered classification ](https://github.com/natoromano/specialistnets)\n- [Sequence-Level Knowledge Distillation](https://github.com/harvardnlp/seq2seq-attn), [Neural Machine Translation on Android](https://github.com/harvardnlp/nmt-android)\n- [cifar.torch distillation](https://github.com/samirasamadi/Distillation)\n- [ENet-SAD](https://github.com/cardwing/Codes-for-Lane-Detection)\n\n## Theano\n- [FitNets: Hints for Thin Deep 
Nets](https://github.com/net-titech/distillation/tree/master/FitNets)\n- [Transfer knowledge from a large DNN or an ensemble of DNNs into a small DNN](https://github.com/tejasgodambe/knowledge-distillation)\n\n## Lasagne + Theano\n- [Experiments-with-Distilling-Knowledge](https://github.com/usholanb/Experiments-with-Distilling-Knowledge)\n\n## Tensorflow\n- [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://github.com/chengshengchan/model_compression)\n- [Distillation](https://github.com/suhangpro/distillation)\n- [An example application of neural network distillation to MNIST](https://github.com/akamaus/mnist-distill)\n- [Data-free Knowledge Distillation for Deep Neural Networks](https://github.com/iRapha/replayed_distillation)\n- [Inspired by net2net, network distillation ](https://github.com/luzai/NetworkCompress)\n- [Deep Reinforcement Learning, knowledge transfer](https://github.com/arnomoonens/DeepRL/tree/master/agents/knowledgetransfer)\n- [Knowledge Distillation using Tensorflow](https://github.com/DushyantaDhyani/kdtf)\n- [Knowledge Distillation Methods with Tensorflow](https://github.com/sseung0703/KD_methods_with_TF)\n- [Zero-Shot Knowledge Distillation in Deep Networks in ICML2019 ](https://github.com/sseung0703/Zero-shot_Knowledge_Distillation)\n- [Knowledge_distillation_benchmark via Tensorflow2.0](https://github.com/sseung0703/Knowledge_distillation_via_TF2.0)\n\n## Caffe\n- [Face Model Compression by Distilling Knowledge from Neurons](https://github.com/liuziwei7/mobile-id)\n- [KnowledgeDistillation Layer (Caffe implementation)](https://github.com/wentianli/knowledge_distillation_caffe)\n- [Knowledge distillation, realized in caffe ](https://github.com/smn2010/caffe_kdistill)\n- [Cross Modal Distillation for Supervision Transfer](https://github.com/xiaolonw/fast-rcnn-distillation)\n- [Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection](https://github.com/Yochengliu/MLIC-KD-WSD)\n- 
[Knowledge Distillation via Instance Relationship Graph](https://github.com/yufanLIU/IRG)\n\n## Keras\n- [Knowledge distillation with Keras](https://github.com/TropComplique/knowledge-distillation-keras)\n- [keras google-vision's distillation](https://github.com/GINK03/keras-distillation)\n- [Distilling the knowledge in a Neural Network](https://github.com/tejasgodambe/knowledge-distillation)\n\n## PaddlePaddle\n- https://github.com/PaddlePaddle/PaddleSlim/tree/release/2.0.0/docs/zh_cn/tutorials\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkozlov%2Fawesome-knowledge-distillation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdkozlov%2Fawesome-knowledge-distillation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdkozlov%2Fawesome-knowledge-distillation/lists"}