{"id":13704559,"url":"https://github.com/lhyfst/knowledge-distillation-papers","last_synced_at":"2025-05-05T10:30:36.372Z","repository":{"id":34574251,"uuid":"161295505","full_name":"lhyfst/knowledge-distillation-papers","owner":"lhyfst","description":"knowledge distillation papers","archived":false,"fork":false,"pushed_at":"2023-02-10T02:34:12.000Z","size":329,"stargazers_count":731,"open_issues_count":2,"forks_count":82,"subscribers_count":37,"default_branch":"master","last_synced_at":"2024-08-24T14:36:30.888Z","etag":null,"topics":["dark-knowledge","knowledge-distillation","model-compression","paper","reading-list"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lhyfst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-12-11T07:36:55.000Z","updated_at":"2024-08-22T09:23:28.000Z","dependencies_parsed_at":"2024-01-07T00:07:52.331Z","dependency_job_id":"659e3118-316f-40cd-98fa-7c3c215d65ea","html_url":"https://github.com/lhyfst/knowledge-distillation-papers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhyfst%2Fknowledge-distillation-papers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhyfst%2Fknowledge-distillation-papers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhyfst%2Fknowledge-distillation-papers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lhyfst%2Fknowledge-distillation-papers/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lhyfst","download_url":"https://codeload.github.com/lhyfst/knowledge-distillation-papers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224440016,"owners_count":17311570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dark-knowledge","knowledge-distillation","model-compression","paper","reading-list"],"created_at":"2024-08-02T21:01:12.061Z","updated_at":"2024-11-13T11:31:33.993Z","avatar_url":"https://github.com/lhyfst.png","language":null,"funding_links":[],"categories":["Others","REFERENCE","Related Repos","Related Repo"],"sub_categories":["2023","Driver","2015"],"readme":"# knowledge distillation papers\n\n\n## Early Papers\n\n* [Model Compression](http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf), Rich Caruana, 2006\n* [Distilling the Knowledge in a Neural Network](https://arxiv.org/pdf/1503.02531.pdf), Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015\n* [Knowledge Acquisition from Examples Via Multiple Models](https://homes.cs.washington.edu/~pedrod/papers/mlc97.pdf), Pedro Domingos, 1997\n* [Combining labeled and unlabeled data with co-training](https://www.cs.cmu.edu/~avrim/Papers/cotrain.pdf), A. Blum, T. Mitchell, 1998 \n* [Using A Neural Network to Approximate An Ensemble of Classifiers](http://axon.cs.byu.edu/papers/zeng.npl2000.pdf), Xinchuan Zeng and Tony R. 
Martinez, 2000\n* [Do Deep Nets Really Need to be Deep?](https://arxiv.org/pdf/1312.6184.pdf), Lei Jimmy Ba, Rich Caruana, 2014\n\n\n## Recommended Papers\n\n* [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550), Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015\n* [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928), Sergey Zagoruyko, Nikos Komodakis, 2016\n* [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf), Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, 2017\n* [Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513.pdf), Zheng Xu, Yen-Chang Hsu, Jiawei Huang\n* [Born Again Neural Networks](https://arxiv.org/abs/1805.04770), Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018\n* [Net2Net: Accelerating Learning Via Knowledge Transfer](https://arxiv.org/pdf/1511.05641.pdf), Tianqi Chen, Ian Goodfellow, Jonathon Shlens, 2016\n* [Unifying distillation and privileged information](https://arxiv.org/pdf/1511.03643), David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015\n* [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/pdf/1511.04508.pdf), Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, 2016\n* [Large scale distributed neural network training through online distillation](https://arxiv.org/pdf/1804.03235.pdf), Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton, 2018\n* [Deep Mutual Learning](https://arxiv.org/pdf/1706.00384.pdf), Ying Zhang, Tao Xiang, Timothy M. 
Hospedales, Huchuan Lu, 2017\n* [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513), Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017\n* [Data-Free Knowledge Distillation for Deep Neural Networks](https://arxiv.org/pdf/1710.07535.pdf), Raphael Gontijo Lopes, Stefano Fenu, Thad Starner, 2017\n* [Quantization Mimic: Towards Very Tiny CNN for Object Detection](https://arxiv.org/pdf/1805.02152.pdf), Yi Wei, Xinyu Pan, Hongwei Qin, Wanli Ouyang, Junjie Yan, 2018\n* [Knowledge Projection for Deep Neural Networks](https://arxiv.org/pdf/1710.09505), Zhi Zhang, Guanghan Ning, Zhihai He, 2017\n* [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613), Elliot J. Crowley, Gavin Gray, Amos Storkey, 2017\n* [Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving](https://arxiv.org/pdf/1804.06332.pdf), Jiaolong Xu, Peng Wang, Heng Yang and Antonio M. López, 2018\n* [Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net](https://arxiv.org/pdf/1708.04106.pdf), Zihao Liu, Qi Liu, Tao Liu, Yanzhi Wang, Wujie Wen, 2017\n* [Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher](https://arxiv.org/pdf/1902.03393.pdf), Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh, 2019\n* [ResKD: Residual-Guided Knowledge Distillation](https://arxiv.org/pdf/2006.04719.pdf), Xuewei Li, Songyuan Li, Bourahla Omar, and Xi Li, 2020\n* [Rethinking Data Augmentation: Self-Supervision and Self-Distillation](https://arxiv.org/abs/1910.05872), Hankook Lee, Sung Ju Hwang, Jinwoo Shin, 2019 \n* [MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks](https://arxiv.org/abs/1911.09418), Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai, 2019\n* [Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self 
Distillation](https://arxiv.org/abs/1905.08094), Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma, 2019 \n\n\n## 2016\n* [Cross Modal Distillation for Supervision Transfer](https://people.eecs.berkeley.edu/~jhoffman/papers/Gupta_CVPR16.pdf), Saurabh Gupta, Judy Hoffman, Jitendra Malik, CVPR 2016\n* [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650), Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016\n* [Knowledge Distillation for Small-footprint Highway Networks](https://arxiv.org/pdf/1608.00892), Liang Lu, Michelle Guo, Steve Renals, 2016\n* [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947), [deeplearning-papernotes](https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/seq-knowledge-distillation.md), Yoon Kim, Alexander M. Rush, 2016\n* [Recurrent Neural Network Training with Dark Knowledge Transfer](https://arxiv.org/pdf/1505.04630.pdf), Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016\n* [Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/11977/12130), Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang, and Xiaoou Tang, 2016\n* [Sequence-Level Knowledge Distillation](https://arxiv.org/abs/1606.07947), Yoon Kim, Alexander M. 
Rush, EMNLP 2016\n* [Distilling Word Embeddings: An Encoding Approach](https://arxiv.org/abs/1506.04488), Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin, CIKM 2016\n\n## 2017\n\n* [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/pdf/1712.04440), Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, CVPR 2017\n* [Knowledge Projection for Deep Neural Networks](https://arxiv.org/pdf/1710.09505), Zhi Zhang, Guanghan Ning, Zhihai He, 2017\n* [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219), Zehao Huang, Naiyan Wang, 2017\n* [Data-Free Knowledge Distillation For Deep Neural Networks](http://raphagl.com/research/replayed-distillation/), Raphael Gontijo Lopes, Stefano Fenu, 2017 \n* [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220), Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017\n* [Adapting Models to Signal Degradation using Distillation](https://arxiv.org/pdf/1604.00433.pdf), Jong-Chyi Su, Subhransu Maji, BMVC 2017\n* [Cross-lingual Distillation for Text Classification](https://arxiv.org/abs/1705.02073), Ruochen Xu, Yiming Yang, ACL 2017, [code](https://github.com/xrc10/cross-distill)\n\n## 2018\n\n* [Learning Global Additive Explanations for Neural Nets Using Model Distillation](https://arxiv.org/pdf/1801.08640.pdf), Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo, 2018\n* [YASENN: Explaining Neural Networks via Partitioning Activation Sequences](https://arxiv.org/pdf/1811.02783), Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018\n* [Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results](https://arxiv.org/pdf/1703.01780), Antti Tarvainen, Harri Valpola, 2018\n* [Local Affine Approximators for Improving Knowledge Transfer](https://lld-workshop.github.io/2017/papers/LLD_2017_paper_28.pdf), 
Suraj Srinivas \u0026 François Fleuret, 2018\n* [Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?](https://arxiv.org/pdf/1806.07550.pdf), Shilin Zhu, Xin Dong, Hao Su, 2018\n* [Probabilistic Knowledge Transfer for deep representation learning](https://arxiv.org/pdf/1803.10837.pdf), Nikolaos Passalis, Anastasios Tefas, 2018\n* [Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons](https://arxiv.org/pdf/1811.03233.pdf), Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018\n* [Paraphrasing Complex Network: Network Compression via Factor Transfer](https://arxiv.org/pdf/1802.04977.pdf), Jangho Kim, SeongUk Park, Nojun Kwak, NIPS, 2018\n* [KDGAN: Knowledge Distillation with Generative Adversarial Networks](https://proceedings.neurips.cc/paper/2018/file/019d385eb67632a7e958e23f24bd07d7-Paper.pdf), Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi, NeurIPS 2018\n* [Distilling Knowledge for Search-based Structured Prediction](https://aclanthology.org/P18-1129/), Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu, ACL 2018\n\n\n## 2019\n* [Learning Efficient Detector with Semi-supervised Adaptive Distillation](https://arxiv.org/pdf/1901.00366.pdf), Shitao Tang, Litong Feng, Zhanghui Kuang, Wenqi Shao, Quanquan Li, Wei Zhang, Yimin Chen, 2019\n* [Dataset Distillation](https://arxiv.org/pdf/1811.10959.pdf), Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. 
Efros, 2019\n* [Relational Knowledge Distillation](https://arxiv.org/abs/1904.05068), Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, 2019\n* [Knowledge Adaptation for Efficient Semantic Segmentation](https://arxiv.org/abs/1903.04688), Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019\n* [A Comprehensive Overhaul of Feature Distillation](https://arxiv.org/abs/1904.01866), Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi, 2019, [code](https://github.com/clovaai/overhaul-distillation)\n* [Towards Understanding Knowledge Distillation](http://arxiv.org/abs/2002.03532), Mary Phuong, Christoph Lampert, ICML, 2019\n* [Knowledge Distillation from Internal Representations](https://arxiv.org/abs/1910.03723), Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo, 2019\n* [Knowledge Flow: Improve Upon Your Teachers](https://arxiv.org/abs/1904.05878), Iou-Jen Liu, Jian Peng, Alexander G. Schwing, 2019\n* [Similarity-Preserving Knowledge Distillation](https://arxiv.org/pdf/1907.09682.pdf), Frederick Tung, Greg Mori, 2019\n* [Correlation Congruence for Knowledge Distillation](Correlation Congruence for Knowledge Distillation), Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang, 2019\n* [Variational Information Distillation for Knowledge Transfer](https://arxiv.org/pdf/1904.05835.pdf), Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. 
Lawrence, Zhenwen Dai, 2019\n* [Knowledge Distillation via Instance Relationship Graph](https://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Knowledge_Distillation_via_Instance_Relationship_Graph_CVPR_2019_paper.pdf), Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan, CVPR 2019\n* [Structured Knowledge Distillation for Semantic Segmentation](https://arxiv.org/pdf/1903.04197.pdf), Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen, CVPR 2019\n* [Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention](https://aclanthology.org/P19-1305/), Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo, ACL 2019, [code](https://github.com/KelleyYin/Cross-lingual-Summarization)\n* [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](https://arxiv.org/abs/1903.12136), Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, arXiv 2019\n* [Multilingual Neural Machine Translation with Knowledge Distillation](https://arxiv.org/abs/1902.10461), Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu, ICLR 2019\n* [BAM! Born-Again Multi-Task Networks for Natural Language Understanding](https://arxiv.org/abs/1907.04829), Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le, ACL 2019\n* [Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding](https://arxiv.org/abs/1904.09482), Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, arXiv 2019\n* [Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection](https://ojs.aaai.org//index.php/AAAI/article/view/4649), AAAI 2019\n\n\n## 2020\n* [Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion](https://arxiv.org/abs/1912.08795), Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. 
Jha, Jan Kautz, 2020\n* [Reducing the Teacher-Student Gap via Spherical Knowledge Disitllation](https://arxiv.org/abs/2010.07485), Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai, 2020\n* [Data-Free Adversarial Distillation](https://arxiv.org/pdf/1912.11006.pdf), Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song, 2020\n* [Contrastive Representation Distillation](https://arxiv.org/abs/1910.10699v2), Yonglong Tian, Dilip Krishnan, Phillip Isola, ICLR 2020, [code](https://github.com/HobbitLong/RepDistiller)\n* [StyleGAN2 Distillation for Feed-forward Image Manipulation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670171.pdf), Yuri Viazovetskyi, Vladimir Ivashkin, and Evgeny Kashin, ECCV 2020\n* [Distilling Knowledge from Graph Convolutional Networks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Distilling_Knowledge_From_Graph_Convolutional_Networks_CVPR_2020_paper.pdf), Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, Xinchao Wang, CVPR 2020\n* [Self-supervised Knowledge Distillation for Few-shot Learning](https://arxiv.org/abs/2006.09785), Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah, 2020, [code](https://github.com/brjathu/SKD)\n* [Online Knowledge Distillation with Diverse Peers](https://arxiv.org/abs/1912.00350), Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng and Chun Chen, AAAI, 2020\n* [Intra-class Feature Variation Distillation for Semantic Segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520341.pdf), Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, and Yongchao Xu, ECCV 2020\n* [Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123690324.pdf), Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, and Tao Mei, ECCV 2020\n* [Improving Face Recognition from Hard Samples via Distribution Distillation Loss](https://arxiv.org/abs/2002.03662), 
Yuge Huang, Pengcheng Shen, Ying Tai, Shaoxin Li, Xiaoming Liu, Jilin Li, Feiyue Huang, Rongrong Ji, ECCV 2020\n* [Distilling Knowledge Learned in BERT for Text Generation](https://arxiv.org/abs/1911.03829), Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu, ACL 2020, [code](https://github.com/ChenRocks/Distill-BERT-Textgen)\n\n## 2021\n* [Dataset Distillation with Infinitely Wide Convolutional Networks](https://openreview.net/forum?id=hXWPpJedrVP), Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee, 2021\n* [Dataset Meta-Learning from Kernel Ridge-Regression](https://openreview.net/forum?id=l-PrrQrK0QR), Timothy Nguyen, Zhourong Chen, Jaehoon Lee, 2021\n* [Up to 100× Faster Data-free Knowledge Distillation](https://arxiv.org/pdf/2112.06253.pdf), Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song, 2021\n* [Robustness and Diversity Seeking Data-Free Knowledge Distillation](https://arxiv.org/pdf/2011.03749.pdf), Pengchao Han, Jihong Park, Shiqiang Wang, Yejun Liu, 2021\n* [Data-Free Knowledge Transfer: A Survey](https://arxiv.org/pdf/2112.15278.pdf), Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang, 2021\n* [Undistillable: Making A Nasty Teacher That CANNOT teach students](https://openreview.net/forum?id=0zvfm-nZqQs), Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang, ICLR 2021\n* [QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning](https://arxiv.org/abs/2107.13892), Kaan Ozkara, Navjot Singh, Deepesh Data, Suhas Diggavi, NeurIPS 2021\n* [KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation](https://arxiv.org/abs/2109.10504), Yongfei Liu, Chenfei Wu, Shao-yen Tseng, Vasudev Lal, Xuming He, Nan Duan\n* [Online Knowledge Distillation for Efficient Pose Estimation](https://arxiv.org/abs/2108.02092), Zheng Li, Jingwen Ye, Mingli Song, Ying Huang, Zhigeng Pan, ICCV 2021\n* [Does Knowledge Distillation Really 
Work?](https://arxiv.org/abs/2106.05945), Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson, NeurIPS 2021\n* [Hierarchical Self-supervised Augmented Knowledge Distillation](https://arxiv.org/abs/2107.13715), Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu, IJCAI 2021\n* [DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis With GANs](https://archives.ismir.net/ismir2021/paper/000060.pdf), Javier Nistal, Stefan Lattner, Gaël Richard, ISMIR 2021\n* [On Self-Distilling Graph Neural Network](https://www.ijcai.org/proceedings/2021/0314.pdf), Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang, IJCAI 2021\n* [Graph-Free Knowledge Distillation for Graph Neural Networks](https://www.ijcai.org/proceedings/2021/0320.pdf), Xiang Deng, Zhongfei Zhang, IJCAI 2021\n* [Self Supervision to Distillation for Long-Tailed Visual Recognition](https://openaccess.thecvf.com/content/ICCV2021/papers/Li_Self_Supervision_to_Distillation_for_Long-Tailed_Visual_Recognition_ICCV_2021_paper.pdf), Tianhao Li, Limin Wang, Gangshan Wu, ICCV 2021\n* [Cross-Layer Distillation with Semantic Calibration](https://arxiv.org/abs/2012.03236), Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen, AAAI 2021\n* [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256), Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen, ICCV 2021\n* [Training data-efficient image transformers \u0026 distillation through attention](https://icml.cc/virtual/2021/poster/8671), Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Herve Jegou, ICML 2021\n* [Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation](https://arxiv.org/abs/2202.03680), Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang, ICCV 2021, [code](https://github.com/ADLab-AutoDrive/ICKD)\n* [torchdistill: A 
Modular, Configuration-Driven Framework for Knowledge Distillation](https://arxiv.org/abs/2011.12913), Yoshitomo Matsubara, International Workshop on Reproducible Research in Pattern Recognition 2021, [code](https://github.com/yoshitomo-matsubara/torchdistill)\n\n## 2022\n* [LGD: Label-guided Self-distillation for Object Detection](https://arxiv.org/abs/2109.11496), Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun, AAAI 2022\n* [MonoDistill: Learning Spatial Features for Monocular 3D Object Detection](https://openreview.net/forum?id=C54V-xTWfi), Anonymous, ICLR 2022\n* [Bag of Instances Aggregation Boosts Self-supervised Distillation](https://openreview.net/forum?id=N0uJGWDw21d), Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian, ICLR 2022\n* [Meta Learning for Knowledge Distillation](https://arxiv.org/abs/2106.04570), Wangchunshu Zhou, Canwen Xu, Julian McAuley, 2022\n* [Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837), Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan, CVPR 2022\n* [Self-Distilled StyleGAN: Towards Generation from Internet Photos](https://arxiv.org/abs/2202.12211), Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri, 2022\n* [Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation](https://arxiv.org/abs/2112.04840), Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang, AAAI 2022\n* [Decoupled Knowledge Distillation](https://arxiv.org/abs/2203.08679), Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang, CVPR 2022, [code](https://github.com/megvii-research/mdistiller)\n* [Graph Flow: Cross-layer Graph Flow Distillation for Dual-Efficient Medical Image Segmentation](https://arxiv.org/abs/2203.08667), Wenxuan Zou, Muyi Sun, 2022\n* [Dataset Distillation by Matching Training 
Trajectories](https://arxiv.org/abs/2203.11932), George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu, CVPR 2022\n* [Knowledge Distillation with the Reused Teacher Classifier](https://arxiv.org/abs/2203.14001), Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen, CVPR 2022\n* [Self-Distillation from the Last Mini-Batch for Consistency Regularization](https://arxiv.org/abs/2203.16172), Shen Yiqing, Xu Liwu, Yang Yuzhe, Li Yaqian and Guo Yandong, CVPR 2022 [code](https://github.com/dongkyuk/DLB-Pytorch)\n* [DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers](), Xianing Chen, Qiong Cao, Yujie Zhong, Shenghua Gao, CVPR 2022\n* [Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning](https://arxiv.org/abs/2203.09249), Lin Zhang, Li Shen, Liang Ding, Dacheng Tao, Ling-Yu Duan, CVPR 2022\n* [LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection](https://arxiv.org/abs/2203.14956), Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jiwen Lu, Jie Zhou, 2022\n* [Localization Distillation for Dense Object Detection](https://arxiv.org/abs/2102.12252), Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, Ming-Ming Cheng, CVPR 2022, [code](https://github.com/HikariTJU/LD)\n* [Localization Distillation for Object Detection](https://arxiv.org/abs/2204.05957), Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, Wangmeng Zuo, Ming-Ming Cheng, 2022, [code](https://github.com/Zzh-tju/Rotated-LD)\n* [Cross-Image Relational Knowledge Distillation for Semantic Segmentation](https://arxiv.org/abs/2204.06986), Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang, CVPR 2022, [code](https://github.com/winycg/CIRKD)\n* [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237), Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander 
Kolesnikov, CVPR 2022\n* [Spot-adaptive Knowledge Distillation](https://arxiv.org/abs/2205.02399), Jie Song, Ying Chen, Jingwen Ye, Mingli Song, TIP 2022, [code](https://github.com/zju-vipa/spot-adaptive-pytorch)\n* [MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning](https://arxiv.org/abs/2203.03137), Shiming Chen, Ziming Hong, Guo-Sen Xie, Wenhan Yang, Qinmu Peng, Kai Wang, Jian Zhao, Xinge You, CVPR 2022\n* [Knowledge Distillation via the Target-aware Transformer](https://arxiv.org/abs/2205.10793), Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang, CVPR 2022\n* [PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection](https://arxiv.org/abs/2205.11098), Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma, arXiv 2022, [code](https://github.com/RunpeiDong/PointDistiller)\n* [Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation](https://arxiv.org/abs/2203.06321), Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma, CVPR 2022\n* [Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation](https://arxiv.org/abs/2205.14141), Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo, Tech Report 2022, [code](https://github.com/SwinTransformer/Feature-Distillation)\n* [BERT Learns to Teach: Knowledge Distillation with Meta Learning](https://arxiv.org/abs/2106.04570), Wangchunshu Zhou, Canwen Xu, Julian McAuley, ACL 2022, [code](https://github.com/JetRunner/MetaDistil)\n* [Nearest Neighbor Knowledge Distillation for Neural Machine Translation](https://arxiv.org/abs/2205.00479), Zhixian Yang, Renliang Sun, Xiaojun Wan, NAACL 2022\n* [Knowledge Condensation 
Distillation](https://arxiv.org/abs/2207.05409), Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Liujuan Cao, ECCV 2022, [code](https://github.com/dzy3/KCD)\n* [Masked Generative Distillation](https://arxiv.org/abs/2205.01529), Zhendong Yang, Zhe Li, Mingqi Shao, Dachuan Shi, Zehuan Yuan, Chun Yuan, ECCV 2022, [code](https://github.com/yzd-v/MGD)\n* [DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection](https://arxiv.org/abs/2207.05536), Gang Li, Xiang Li, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang\n* [Distilled Dual-Encoder Model for Vision-Language Understanding](https://arxiv.org/abs/2112.08723), Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei, [code](https://github.com/kugwzk/Distilled-DualEncoder)\n* [Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection](https://arxiv.org/abs/2207.02541), Hongyu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun, ECCV 2022, [code](https://github.com/Megvii-BaseDetection/DenseTeacher)\n* [Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation](https://arxiv.org/abs/2107.01378), Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang\n* [TinyViT: Fast Pretraining Distillation for Small Vision Transformers](), Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan, ECCV 2022\n* [Self-slimmed Vision Transformer](https://arxiv.org/abs/2111.12624), Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu, ICLR 2022\n* [KD-MVS: Knowledge Distillation Based Self-supervised Learning for MVS](https://arxiv.org/abs/2207.10425), Yikang Ding, Qingtian Zhu, Xiangyue Liu, Wentao Yuan, Haotian Zhang, Chi Zhang, ECCV 2022, [code](https://github.com/megvii-research/kd-mvs)\n* [Rethinking Data Augmentation for Robust Visual Question Answering](https://arxiv.org/abs/2207.08739), Long Chen, Yuhang Zheng, Jun Xiao, ECCV 
2022, [code](https://github.com/ItemZheng/KDDAug)\n* [ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval](https://arxiv.org/abs/2205.09153), Yuxiang Lu, Yiding Liu, Jiaxiang Liu, Yunsheng Shi, Zhengjie Huang, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, Shuaiqiang Wang, Dawei Yin, Haifeng Wang\n* [Prune Your Model Before Distill It](https://arxiv.org/abs/2109.14960), Jinhyuk Park, Albert No, ECCV 2022, [code](https://github.com/ososos888/prune-then-distill)\n* [Efficient One Pass Self-distillation with Zipf's Label Smoothing](https://arxiv.org/abs/2207.12980), Jiajun Liang, Linze Li, Zhaodong Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan, ECCV 2022, [code](https://github.com/megvii-research/zipfls)\n* [R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis](https://arxiv.org/abs/2203.17261), ECCV 2022, [code](https://github.com/snap-research/R2L)\n* [D3Former: Debiased Dual Distilled Transformer for Incremental Learning](https://arxiv.org/abs/2208.00777), Abdelrahman Mohamed, Rushali Grandhe, KJ Joseph, Salman Khan, Fahad Khan, [code](https://github.com/abdohelmy/D-3Former)\n* [SdAE: Self-distillated Masked Autoencoder](https://arxiv.org/abs/2208.00449), Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian, ECCV 2022, [code](https://github.com/AbrahamYabo/SdAE)\n* [MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition](https://arxiv.org/abs/2208.05768), Chuanguang Yang, Zhulin An, Helong Zhou, Linhang Cai, Xiang Zhi, Jiwen Wu, Yongjun Xu, Qian Zhang, ECCV 2022, [code](https://github.com/winycg/Self-KD-Lib)\n* [Mind the Gap in Distilling StyleGANs](https://arxiv.org/abs/2208.08840), Guodong Xu, Yuenan Hou, Ziwei Liu, Chen 
Change Loy, ECCV 2022, [code](https://github.com/xuguodong03/StyleKD)\n* [HIRE: Distilling high-order relational knowledge from heterogeneous graph neural networks](https://arxiv.org/abs/2207.11887), Jing Liu, Tongya Zheng, Qinfen Hao, Neurocomputing\n* [A Fast Knowledge Distillation Framework for Visual Recognition](https://arxiv.org/abs/2112.01528), Zhiqiang Shen, Eric Xing, ECCV 2022, [code](https://github.com/szq0214/FKD)\n* [Knowledge Distillation from A Stronger Teacher](https://arxiv.org/abs/2205.10536), Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu, NeurIPS 2022, [code](https://github.com/hunto/DIST_KD)\n* [ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval](https://arxiv.org/abs/2207.14757), Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara, CBMI 2022, [code](https://github.com/mesnico/ALADIN)\n* [Towards Efficient 3D Object Detection with Knowledge Distillation](https://arxiv.org/abs/2205.15156), Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi, NeurIPS 2022, [code](https://github.com/CVMI-Lab/SparseKD)\n* [Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher](https://arxiv.org/abs/2110.08532), Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi, COLING 2022\n* [Noisy Self-Knowledge Distillation for Text Summarization](https://arxiv.org/abs/2009.07032), Yang Liu, Sheng Shen, Mirella Lapata, arXiv 2021\n* [On Distillation of Guided Diffusion Models](https://arxiv.org/abs/2210.03142), Chenlin Meng, Ruiqi Gao, Diederik P. 
Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans, arXiv 2022\n* [ViTKD: Practical Guidelines for ViT feature knowledge distillation](https://arxiv.org/abs/2209.02432), Zhendong Yang, Zhe Li, Ailing Zeng, Zexian Li, Chun Yuan, Yu Li, arXiv 2022, [code](https://github.com/yzd-v/cls_KD)\n* [Self-Regulated Feature Learning via Teacher-free Feature Distillation](https://lilujunai.github.io/Teacher-free-Distillation/), Lujun Li, ECCV 2022, [code](https://github.com/lilujunai/Teacher-free-Distillation)\n* [DETRDistill: A Universal Knowledge Distillation Framework for DETR-families](https://arxiv.org/abs/2211.10156v2), Jiahao Chang, Shuo Wang, Guangkai Xu, Zehui Chen, Chenhongyi Yang, Feng Zhao, arXiv 2022\n* [Learning to Explore Distillability and Sparsability: A Joint Framework for Model Compression](https://ieeexplore.ieee.org/abstract/document/9804342), Yufan Liu, Jiajiong Cao, Bing Li, Weiming Hu, Stephen Maybank, TPAMI 2022\n* [Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?](https://proceedings.mlr.press/v162/chandrasegaran22a/chandrasegaran22a.pdf), Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung, ICML 2022\n\n## 2023\n* [Curriculum Temperature for Knowledge Distillation](https://arxiv.org/abs/2211.16231), Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang, AAAI 2023, [code](https://github.com/zhengli97/CTKD)\n* [Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling](https://arxiv.org/abs/2301.00230), Xin Ma, Chang Liu, Chunyu Xie, Long Ye, Yafeng Deng, Xiangyang Ji, arXiv 2023, 
[code](https://github.com/mx-mark/DMJD)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhyfst%2Fknowledge-distillation-papers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flhyfst%2Fknowledge-distillation-papers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhyfst%2Fknowledge-distillation-papers/lists"}