# knowledge distillation papers

## Early Papers

* [Model Compression](http://www.cs.cornell.edu/~caruana/compression.kdd06.pdf), Cristian Buciluă, Rich Caruana, Alexandru Niculescu-Mizil, 2006
* [Distilling the Knowledge in a Neural Network](https://arxiv.org/pdf/1503.02531.pdf), Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015 (see the loss sketch after this list)
* [Knowledge Acquisition from Examples Via Multiple Models](https://homes.cs.washington.edu/~pedrod/papers/mlc97.pdf), Pedro Domingos, 1997
* [Combining labeled and unlabeled data with co-training](https://www.cs.cmu.edu/~avrim/Papers/cotrain.pdf), A. Blum, T. Mitchell, 1998
* [Using A Neural Network to Approximate An Ensemble of Classifiers](http://axon.cs.byu.edu/papers/zeng.npl2000.pdf), Xinchuan Zeng and Tony R. Martinez, 2000
* [Do Deep Nets Really Need to be Deep?](https://arxiv.org/pdf/1312.6184.pdf), Lei Jimmy Ba, Rich Caruana, 2014
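
For readers new to the area, the core recipe from the Hinton et al. paper listed above fits in a few lines. The snippet below is a minimal PyTorch sketch rather than code from any of the listed papers; the tensor names and the temperature/weighting hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the soft-target distillation loss (Hinton et al., 2015).
# `student_logits` and `teacher_logits` are assumed to be [batch, num_classes]
# tensors and `labels` integer class indices; T and alpha are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T * T factor keeps soft-target gradients on the same scale as the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```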

## Recommended Papers

* [FitNets: Hints for Thin Deep Nets](https://arxiv.org/pdf/1412.6550), Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2015
* [Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer](https://arxiv.org/pdf/1612.03928), Sergey Zagoruyko, Nikos Komodakis, 2016 (see the sketch after this list)
* [A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yim_A_Gift_From_CVPR_2017_paper.pdf), Junho Yim, Donggyu Joo, Jihoon Bae, Junmo Kim, 2017
* [Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513.pdf), Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
* [Born Again Neural Networks](https://arxiv.org/abs/1805.04770), Tommaso Furlanello, Zachary C. Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018
* [Net2Net: Accelerating Learning Via Knowledge Transfer](https://arxiv.org/pdf/1511.05641.pdf), Tianqi Chen, Ian Goodfellow, Jonathon Shlens, 2016
* [Unifying distillation and privileged information](https://arxiv.org/pdf/1511.03643), David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, Vladimir Vapnik, 2015
* [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/pdf/1511.04508.pdf), Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami, 2016
* [Large scale distributed neural network training through online distillation](https://arxiv.org/pdf/1804.03235.pdf), Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton, 2018
* [Deep Mutual Learning](https://arxiv.org/pdf/1706.00384.pdf), Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu, 2017
* [Learning Loss for Knowledge Distillation with Conditional Adversarial Networks](https://arxiv.org/pdf/1709.00513), Zheng Xu, Yen-Chang Hsu, Jiawei Huang, 2017
* [Data-Free Knowledge Distillation for Deep Neural Networks](https://arxiv.org/pdf/1710.07535.pdf), Raphael Gontijo Lopes, Stefano Fenu, Thad Starner, 2017
* [Quantization Mimic: Towards Very Tiny CNN for Object Detection](https://arxiv.org/pdf/1805.02152.pdf), Yi Wei, Xinyu Pan, Hongwei Qin, Wanli Ouyang, Junjie Yan, 2018
* [Knowledge Projection for Deep Neural Networks](https://arxiv.org/pdf/1710.09505), Zhi Zhang, Guanghan Ning, Zhihai He, 2017
* [Moonshine: Distilling with Cheap Convolutions](https://arxiv.org/pdf/1711.02613), Elliot J. Crowley, Gavin Gray, Amos Storkey, 2017
* [Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving](https://arxiv.org/pdf/1804.06332.pdf), Jiaolong Xu, Peng Wang, Heng Yang, Antonio M. López, 2018
* [Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net](https://arxiv.org/pdf/1708.04106.pdf), Guorui Zhou, Ying Fan, Runpeng Cui, Weijie Bian, Xiaoqiang Zhu, Kun Gai, 2017
* [Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher](https://arxiv.org/pdf/1902.03393.pdf), Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh, 2019
* [ResKD: Residual-Guided Knowledge Distillation](https://arxiv.org/pdf/2006.04719.pdf), Xuewei Li, Songyuan Li, Bourahla Omar, and Xi Li, 2020
* [Rethinking Data Augmentation: Self-Supervision and Self-Distillation](https://arxiv.org/abs/1910.05872), Hankook Lee, Sung Ju Hwang, Jinwoo Shin, 2019
* [MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks](https://arxiv.org/abs/1911.09418), Yunteng Luan, Hanyu Zhao, Zhi Yang, Yafei Dai, 2019
* [Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation](https://arxiv.org/abs/1905.08094), Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma, 2019
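
The attention-transfer entry above (Zagoruyko & Komodakis) gives a feel for how feature-level distillation looks in practice, so it is sketched here as promised. This is a hedged PyTorch sketch under the assumption that `f_s` and `f_t` are feature maps from corresponding student and teacher layers; it is not code from the paper's official release.

```python
# Minimal sketch of activation-based attention transfer (Zagoruyko & Komodakis, 2016).
# `f_s` and `f_t` are assumed to be [batch, C, H, W] feature maps taken from
# matching student/teacher layers; the exponent p=2 follows the common setting.
import torch
import torch.nn.functional as F

def attention_map(feat, p=2):
    # Collapse channels into a spatial attention map (sum over channels of |A|^p),
    # then flatten and L2-normalize per sample.
    am = feat.abs().pow(p).sum(dim=1).flatten(start_dim=1)
    return F.normalize(am, dim=1)

def attention_transfer_loss(f_s, f_t, p=2):
    # Penalize the distance between normalized student and teacher attention maps.
    return (attention_map(f_s, p) - attention_map(f_t, p)).pow(2).mean()
```

When student and teacher maps differ in spatial size, the student map is usually interpolated to the teacher's resolution first, and the resulting loss is added to the ordinary task loss with a small weight.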

## 2016
* [Cross Modal Distillation for Supervision Transfer](https://people.eecs.berkeley.edu/~jhoffman/papers/Gupta_CVPR16.pdf), Saurabh Gupta, Judy Hoffman, Jitendra Malik, CVPR 2016
* [Deep Model Compression: Distilling Knowledge from Noisy Teachers](https://arxiv.org/pdf/1610.09650), Bharat Bhusan Sau, Vineeth N. Balasubramanian, 2016
* [Knowledge Distillation for Small-footprint Highway Networks](https://arxiv.org/pdf/1608.00892), Liang Lu, Michelle Guo, Steve Renals, 2016
* [Sequence-Level Knowledge Distillation](https://arxiv.org/pdf/1606.07947), Yoon Kim, Alexander M. Rush, EMNLP 2016, [notes](https://github.com/dennybritz/deeplearning-papernotes/blob/master/notes/seq-knowledge-distillation.md)
* [Recurrent Neural Network Training with Dark Knowledge Transfer](https://arxiv.org/pdf/1505.04630.pdf), Zhiyuan Tang, Dong Wang, Zhiyong Zhang, 2016
* [Face Model Compression by Distilling Knowledge from Neurons](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/download/11977/12130), Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang, and Xiaoou Tang, 2016
* [Distilling Word Embeddings: An Encoding Approach](https://arxiv.org/abs/1506.04488), Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin, CIKM 2016

## 2017

* [Data Distillation: Towards Omni-Supervised Learning](https://arxiv.org/pdf/1712.04440), Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He, CVPR 2018
* [Knowledge Projection for Deep Neural Networks](https://arxiv.org/pdf/1710.09505), Zhi Zhang, Guanghan Ning, Zhihai He, 2017
* [Like What You Like: Knowledge Distill via Neuron Selectivity Transfer](https://arxiv.org/pdf/1707.01219), Zehao Huang, Naiyan Wang, 2017
* [Data-Free Knowledge Distillation For Deep Neural Networks](http://raphagl.com/research/replayed-distillation/), Raphael Gontijo Lopes, Stefano Fenu, 2017
* [DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer](https://arxiv.org/pdf/1707.01220), Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, 2017
* [Adapting Models to Signal Degradation using Distillation](https://arxiv.org/pdf/1604.00433.pdf), Jong-Chyi Su, Subhransu Maji, BMVC 2017
* [Cross-lingual Distillation for Text Classification](https://arxiv.org/abs/1705.02073), Ruochen Xu, Yiming Yang, ACL 2017, [code](https://github.com/xrc10/cross-distill)

## 2018

* [Learning Global Additive Explanations for Neural Nets Using Model Distillation](https://arxiv.org/pdf/1801.08640.pdf), Sarah Tan, Rich Caruana, Giles Hooker, Paul Koch, Albert Gordo, 2018
* [YASENN: Explaining Neural Networks via Partitioning Activation Sequences](https://arxiv.org/pdf/1811.02783), Yaroslav Zharov, Denis Korzhenkov, Pavel Shvechikov, Alexander Tuzhilin, 2018
* [Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results](https://arxiv.org/pdf/1703.01780), Antti Tarvainen, Harri Valpola, 2018
* [Local Affine Approximators for Improving Knowledge Transfer](https://lld-workshop.github.io/2017/papers/LLD_2017_paper_28.pdf), Suraj Srinivas & François Fleuret, 2018
* [Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?](https://arxiv.org/pdf/1806.07550.pdf), Shilin Zhu, Xin Dong, Hao Su, 2018
* [Probabilistic Knowledge Transfer for deep representation learning](https://arxiv.org/pdf/1803.10837.pdf), Nikolaos Passalis, Anastasios Tefas, 2018
* [Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons](https://arxiv.org/pdf/1811.03233.pdf), Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi, 2018
* [Paraphrasing Complex Network: Network Compression via Factor Transfer](https://arxiv.org/pdf/1802.04977.pdf), Jangho Kim, SeongUk Park, Nojun Kwak, NIPS, 2018
* [KDGAN: Knowledge Distillation with Generative Adversarial Networks](https://proceedings.neurips.cc/paper/2018/file/019d385eb67632a7e958e23f24bd07d7-Paper.pdf), Xiaojie Wang, Rui Zhang, Yu Sun, Jianzhong Qi, NeurIPS 2018
* [Distilling Knowledge for Search-based Structured Prediction](https://aclanthology.org/P18-1129/), Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu, ACL 2018

## 2019
* [Learning Efficient Detector with Semi-supervised Adaptive Distillation](https://arxiv.org/pdf/1901.00366.pdf), Shitao Tang, Litong Feng, Zhanghui Kuang, Wenqi Shao, Quanquan Li, Wei Zhang, Yimin Chen, 2019
* [Dataset Distillation](https://arxiv.org/pdf/1811.10959.pdf), Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, Alexei A. Efros, 2019
* [Relational Knowledge Distillation](https://arxiv.org/abs/1904.05068), Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho, 2019
* [Knowledge Adaptation for Efficient Semantic Segmentation](https://arxiv.org/abs/1903.04688), Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan, 2019
* [A Comprehensive Overhaul of Feature Distillation](https://arxiv.org/abs/1904.01866), Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, Jin Young Choi, 2019, [code](https://github.com/clovaai/overhaul-distillation)
* [Towards Understanding Knowledge Distillation](http://arxiv.org/abs/2002.03532), Mary Phuong, Christoph Lampert, ICML, 2019
* [Knowledge Distillation from Internal Representations](https://arxiv.org/abs/1910.03723), Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo, 2019
* [Knowledge Flow: Improve Upon Your Teachers](https://arxiv.org/abs/1904.05878), Iou-Jen Liu, Jian Peng, Alexander G. Schwing, 2019
* [Similarity-Preserving Knowledge Distillation](https://arxiv.org/pdf/1907.09682.pdf), Frederick Tung, Greg Mori, 2019
* Correlation Congruence for Knowledge Distillation, Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang, 2019
* [Variational Information Distillation for Knowledge Transfer](https://arxiv.org/pdf/1904.05835.pdf), Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai, 2019
* [Knowledge Distillation via Instance Relationship Graph](https://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Knowledge_Distillation_via_Instance_Relationship_Graph_CVPR_2019_paper.pdf), Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan, CVPR 2019
* [Structured Knowledge Distillation for Semantic Segmentation](https://arxiv.org/pdf/1903.04197.pdf), Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen, CVPR 2019
* [Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention](https://aclanthology.org/P19-1305/), Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo, ACL 2019, [code](https://github.com/KelleyYin/Cross-lingual-Summarization)
* [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](https://arxiv.org/abs/1903.12136), Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin, arXiv, 2019
* [Multilingual Neural Machine Translation with Knowledge Distillation](https://arxiv.org/abs/1902.10461), Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu, ICLR 2019
* [BAM! Born-Again Multi-Task Networks for Natural Language Understanding](https://arxiv.org/abs/1907.04829), Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le, ACL 2019
* [Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding](https://arxiv.org/abs/1904.09482), Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, arXiv 2019
* [Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection](https://ojs.aaai.org//index.php/AAAI/article/view/4649), AAAI 2019

## 2020
* [Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion](https://arxiv.org/abs/1912.08795), Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz, 2020
* [Reducing the Teacher-Student Gap via Spherical Knowledge Distillation](https://arxiv.org/abs/2010.07485), Jia Guo, Minghao Chen, Yao Hu, Chen Zhu, Xiaofei He, Deng Cai, 2020
* [Data-Free Adversarial Distillation](https://arxiv.org/pdf/1912.11006.pdf), Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song, 2020
* [Contrastive Representation Distillation](https://arxiv.org/abs/1910.10699v2), Yonglong Tian, Dilip Krishnan, Phillip Isola, ICLR 2020, [code](https://github.com/HobbitLong/RepDistiller)
* [StyleGAN2 Distillation for Feed-forward Image Manipulation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123670171.pdf), Yuri Viazovetskyi, Vladimir Ivashkin, and Evgeny Kashin, ECCV 2020
* [Distilling Knowledge from Graph Convolutional Networks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_Distilling_Knowledge_From_Graph_Convolutional_Networks_CVPR_2020_paper.pdf), Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, Xinchao Wang, CVPR 2020
* [Self-supervised Knowledge Distillation for Few-shot Learning](https://arxiv.org/abs/2006.09785), Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah, 2020, [code](https://github.com/brjathu/SKD)
* [Online Knowledge Distillation with Diverse Peers](https://arxiv.org/abs/1912.00350), Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng and Chun Chen, AAAI, 2020
* [Intra-class Feature Variation Distillation for Semantic Segmentation](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123520341.pdf), Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, and Yongchao Xu, ECCV 2020
* [Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123690324.pdf), Xiaobo Wang, Tianyu Fu, Shengcai Liao, Shuo Wang, Zhen Lei, and Tao Mei, ECCV 2020
* [Improving Face Recognition from Hard Samples via Distribution Distillation Loss](https://arxiv.org/abs/2002.03662), Yuge Huang, Pengcheng Shen, Ying Tai, Shaoxin Li, Xiaoming Liu, Jilin Li, Feiyue Huang, Rongrong Ji, ECCV 2020
* [Distilling Knowledge Learned in BERT for Text Generation](https://arxiv.org/abs/1911.03829), Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu, ACL 2020, [code](https://github.com/ChenRocks/Distill-BERT-Textgen)

## 2021
* [Dataset Distillation with Infinitely Wide Convolutional Networks](https://openreview.net/forum?id=hXWPpJedrVP), Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee, 2021
* [Dataset Meta-Learning from Kernel Ridge-Regression](https://openreview.net/forum?id=l-PrrQrK0QR), Timothy Nguyen, Zhourong Chen, Jaehoon Lee, 2021
* [Up to 100× Faster Data-free Knowledge Distillation](https://arxiv.org/pdf/2112.06253.pdf), Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song, 2021
* [Robustness and Diversity Seeking Data-Free Knowledge Distillation](https://arxiv.org/pdf/2011.03749.pdf), Pengchao Han, Jihong Park, Shiqiang Wang, Yejun Liu, 2021
* [Data-Free Knowledge Transfer: A Survey](https://arxiv.org/pdf/2112.15278.pdf), Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang, 2021
* [Undistillable: Making A Nasty Teacher That CANNOT teach students](https://openreview.net/forum?id=0zvfm-nZqQs), Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang, ICLR 2021
* [QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning](https://arxiv.org/abs/2107.13892), Kaan Ozkara, Navjot Singh, Deepesh Data, Suhas Diggavi, NeurIPS 2021
* [KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation](https://arxiv.org/abs/2109.10504), Yongfei Liu, Chenfei Wu, Shao-yen Tseng, Vasudev Lal, Xuming He, Nan Duan
* [Online Knowledge Distillation for Efficient Pose Estimation](https://arxiv.org/abs/2108.02092), Zheng Li, Jingwen Ye, Mingli Song, Ying Huang, Zhigeng Pan, ICCV 2021
* [Does Knowledge Distillation Really Work?](https://arxiv.org/abs/2106.05945), Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson, NeurIPS 2021
* [Hierarchical Self-supervised Augmented Knowledge Distillation](https://arxiv.org/abs/2107.13715), Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu, IJCAI 2021
* [DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis With GANs](https://archives.ismir.net/ismir2021/paper/000060.pdf), Javier Nistal, Stefan Lattner, Gaël Richard, ISMIR 2021
* [On Self-Distilling Graph Neural Network](https://www.ijcai.org/proceedings/2021/0314.pdf), Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang, IJCAI 2021
* [Graph-Free Knowledge Distillation for Graph Neural Networks](https://www.ijcai.org/proceedings/2021/0320.pdf), Xiang Deng, Zhongfei Zhang, IJCAI 2021
* [Self Supervision to Distillation for Long-Tailed Visual Recognition](https://openaccess.thecvf.com/content/ICCV2021/papers/Li_Self_Supervision_to_Distillation_for_Long-Tailed_Visual_Recognition_ICCV_2021_paper.pdf), Tianhao Li, Limin Wang, Gangshan Wu, ICCV 2021
* [Cross-Layer Distillation with Semantic Calibration](https://arxiv.org/abs/2012.03236), Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen, AAAI 2021
* [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256), Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen, ICCV 2021
* [Training data-efficient image transformers & distillation through attention](https://icml.cc/virtual/2021/poster/8671), Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou, ICML 2021
* [Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation](https://arxiv.org/abs/2202.03680), Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiaodan Liang, ICCV 2021, [code](https://github.com/ADLab-AutoDrive/ICKD)
* [torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation](https://arxiv.org/abs/2011.12913), Yoshitomo Matsubara, International Workshop on Reproducible Research in Pattern Recognition 2021, [code](https://github.com/yoshitomo-matsubara/torchdistill)

## 2022
* [LGD: Label-guided Self-distillation for Object Detection](https://arxiv.org/abs/2109.11496), Peizhen Zhang, Zijian Kang, Tong Yang, Xiangyu Zhang, Nanning Zheng, Jian Sun, AAAI 2022
* [MonoDistill: Learning Spatial Features for Monocular 3D Object Detection](https://openreview.net/forum?id=C54V-xTWfi), Anonymous, ICLR 2022
* [Bag of Instances Aggregation Boosts Self-supervised Distillation](https://openreview.net/forum?id=N0uJGWDw21d), Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian, ICLR 2022
* [Meta Learning for Knowledge Distillation](https://arxiv.org/abs/2106.04570), Wangchunshu Zhou, Canwen Xu, Julian McAuley, 2022
* [Focal and Global Knowledge Distillation for Detectors](https://arxiv.org/abs/2111.11837), Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan, CVPR 2022
* [Self-Distilled StyleGAN: Towards Generation from Internet Photos](https://arxiv.org/abs/2202.12211), Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri, 2022
* [Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation](https://arxiv.org/abs/2112.04840), Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang, AAAI 2022
* [Decoupled Knowledge Distillation](https://arxiv.org/abs/2203.08679), Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang, CVPR 2022, [code](https://github.com/megvii-research/mdistiller)
* [Graph Flow: Cross-layer Graph Flow Distillation for Dual-Efficient Medical Image Segmentation](https://arxiv.org/abs/2203.08667), Wenxuan Zou, Muyi Sun, 2022
* [Dataset Distillation by Matching Training Trajectories](https://arxiv.org/abs/2203.11932), George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, Jun-Yan Zhu, CVPR 2022
* [Knowledge Distillation with the Reused Teacher Classifier](https://arxiv.org/abs/2203.14001), Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen, CVPR 2022
* [Self-Distillation from the Last Mini-Batch for Consistency Regularization](https://arxiv.org/abs/2203.16172), Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo, CVPR 2022, [code](https://github.com/dongkyuk/DLB-Pytorch)
* DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers, Xianing Chen, Qiong Cao, Yujie Zhong, Shenghua Gao, CVPR 2022
* [Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning](https://arxiv.org/abs/2203.09249), Lin Zhang, Li Shen, Liang Ding, Dacheng Tao, Ling-Yu Duan, CVPR 2022
* [LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection](https://arxiv.org/abs/2203.14956), Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jiwen Lu, Jie Zhou, 2022
* [Localization Distillation for Dense Object Detection](https://arxiv.org/abs/2102.12252), Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, Ming-Ming Cheng, CVPR 2022, [code](https://github.com/HikariTJU/LD)
* [Localization Distillation for Object Detection](https://arxiv.org/abs/2204.05957), Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, Wangmeng Zuo, Ming-Ming Cheng, 2022, [code](https://github.com/Zzh-tju/Rotated-LD)
* [Cross-Image Relational Knowledge Distillation for Semantic Segmentation](https://arxiv.org/abs/2204.06986), Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang, CVPR 2022, [code](https://github.com/winycg/CIRKD)
* [Knowledge distillation: A good teacher is patient and consistent](https://arxiv.org/abs/2106.05237), Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov, CVPR 2022
* [Spot-adaptive Knowledge Distillation](https://arxiv.org/abs/2205.02399), Jie Song, Ying Chen, Jingwen Ye, Mingli Song, TIP 2022, [code](https://github.com/zju-vipa/spot-adaptive-pytorch)
* [MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning](https://arxiv.org/abs/2203.03137), Shiming Chen, Ziming Hong, Guo-Sen Xie, Wenhan Yang, Qinmu Peng, Kai Wang, Jian Zhao, Xinge You, CVPR 2022
* [Knowledge Distillation via the Target-aware Transformer](https://arxiv.org/abs/2205.10793), Sihao Lin, Hongwei Xie, Bing Wang, Kaicheng Yu, Xiaojun Chang, Xiaodan Liang, Gang Wang, CVPR 2022
* [PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection](https://arxiv.org/abs/2205.11098), Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma, arXiv 2022, [code](https://github.com/RunpeiDong/PointDistiller)
* [Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation](https://arxiv.org/abs/2203.06321), Linfeng Zhang, Xin Chen, Xiaobing Tu, Pengfei Wan, Ning Xu, Kaisheng Ma, CVPR 2022
* [Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation](https://arxiv.org/abs/2205.14141), Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo, Tech Report 2022, [code](https://github.com/SwinTransformer/Feature-Distillation)
* [BERT Learns to Teach: Knowledge Distillation with Meta Learning](https://arxiv.org/abs/2106.04570), Wangchunshu Zhou, Canwen Xu, Julian McAuley, ACL 2022, [code](https://github.com/JetRunner/MetaDistil)
* [Nearest Neighbor Knowledge Distillation for Neural Machine Translation](https://arxiv.org/abs/2205.00479), Zhixian Yang, Renliang Sun, Xiaojun Wan, NAACL 2022
* [Knowledge Condensation Distillation](https://arxiv.org/abs/2207.05409), Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Liujuan Cao, ECCV 2022, [code](https://github.com/dzy3/KCD)
* [Masked Generative Distillation](https://arxiv.org/abs/2205.01529), Zhendong Yang, Zhe Li, Mingqi Shao, Dachuan Shi, Zehuan Yuan, Chun Yuan, ECCV 2022, [code](https://github.com/yzd-v/MGD)
* [DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection](https://arxiv.org/abs/2207.05536), Gang Li, Xiang Li, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang
* [Distilled Dual-Encoder Model for Vision-Language Understanding](https://arxiv.org/abs/2112.08723), Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei, [code](https://github.com/kugwzk/Distilled-DualEncoder)
* [Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection](https://arxiv.org/abs/2207.02541), Hongyu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li, Haiyan Yu, Jian Sun, ECCV 2022, [code](https://github.com/Megvii-BaseDetection/DenseTeacher)
* [Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation](https://arxiv.org/abs/2107.01378), Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, Yunhe Wang
* TinyViT: Fast Pretraining Distillation for Small Vision Transformers, Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan, ECCV 2022
* [Self-slimmed Vision Transformer](https://arxiv.org/abs/2111.12624), Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu, ICLR 2022
* [KD-MVS: Knowledge Distillation Based Self-supervised Learning for MVS](https://arxiv.org/abs/2207.10425), Yikang Ding, Qingtian Zhu, Xiangyue Liu, Wentao Yuan, Haotian Zhang, Chi Zhang, ECCV 2022, [code](https://github.com/megvii-research/kd-mvs)
* [Rethinking Data Augmentation for Robust Visual Question Answering](https://arxiv.org/abs/2207.08739), Long Chen, Yuhang Zheng, Jun Xiao, ECCV 2022, [code](https://github.com/ItemZheng/KDDAug)
* [ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval](https://arxiv.org/abs/2205.09153), Yuxiang Lu, Yiding Liu, Jiaxiang Liu, Yunsheng Shi, Zhengjie Huang, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, Shuaiqiang Wang, Dawei Yin, Haifeng Wang
* [Prune Your Model Before Distill It](https://arxiv.org/abs/2109.14960), Jinhyuk Park, Albert No, ECCV 2022, [code](https://github.com/ososos888/prune-then-distill)
* [Efficient One Pass Self-distillation with Zipf's Label Smoothing](https://arxiv.org/abs/2207.12980), Jiajun Liang, Linze Li, Zhaodong Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan, ECCV 2022, [code](https://github.com/megvii-research/zipfls)
* [R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis](https://arxiv.org/abs/2203.17261), ECCV 2022, [code](https://github.com/snap-research/R2L)
* [D3Former: Debiased Dual Distilled Transformer for Incremental Learning](https://arxiv.org/abs/2208.00777), Abdelrahman Mohamed, Rushali Grandhe, KJ Joseph, Salman Khan, Fahad Khan, [code](https://github.com/abdohelmy/D-3Former)
* [SdAE: Self-distillated Masked Autoencoder](https://arxiv.org/abs/2208.00449), Yabo Chen, Yuchen Liu, Dongsheng Jiang, Xiaopeng Zhang, Wenrui Dai, Hongkai Xiong, Qi Tian, ECCV 2022, [code](https://github.com/AbrahamYabo/SdAE)
* [MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition](https://arxiv.org/abs/2208.05768), Chuanguang Yang, Zhulin An, Helong Zhou, Linhang Cai, Xiang Zhi, Jiwen Wu, Yongjun Xu, Qian Zhang, ECCV 2022, [code](https://github.com/winycg/Self-KD-Lib)
* [Mind the Gap in Distilling StyleGANs](https://arxiv.org/abs/2208.08840), Guodong Xu, Yuenan Hou, Ziwei Liu, Chen Change Loy, ECCV 2022, [code](https://github.com/xuguodong03/StyleKD)
* [HIRE: Distilling high-order relational knowledge from heterogeneous graph neural networks](https://arxiv.org/abs/2207.11887), Jing Liu, Tongya Zheng, Qinfen Hao, Neurocomputing
* [A Fast Knowledge Distillation Framework for Visual Recognition](https://arxiv.org/abs/2112.01528), Zhiqiang Shen, Eric Xing, ECCV 2022, [code](https://github.com/szq0214/FKD)
* [Knowledge Distillation from A Stronger Teacher](https://arxiv.org/abs/2205.10536), Tao Huang, Shan You, Fei Wang, Chen Qian, Chang Xu, NeurIPS 2022, [code](https://github.com/hunto/DIST_KD)
* [ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval](https://arxiv.org/abs/2207.14757), Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara, CBMI 2022, [code](https://github.com/mesnico/ALADIN)
* [Towards Efficient 3D Object Detection with Knowledge Distillation](https://arxiv.org/abs/2205.15156), Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi, NeurIPS 2022, [code](https://github.com/CVMI-Lab/SparseKD)
* [Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher](https://arxiv.org/abs/2110.08532), Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi, COLING 2022
* [Noisy Self-Knowledge Distillation for Text Summarization](https://arxiv.org/abs/2009.07032), Yang Liu, Sheng Shen, Mirella Lapata, arXiv 2021
* [On Distillation of Guided Diffusion Models](https://arxiv.org/abs/2210.03142), Chenlin Meng, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans, arXiv 2022
* [ViTKD: Practical Guidelines for ViT feature knowledge distillation](https://arxiv.org/abs/2209.02432), Zhendong Yang, Zhe Li, Ailing Zeng, Zexian Li, Chun Yuan, Yu Li, arXiv 2022, [code](https://github.com/yzd-v/cls_KD)
* [Self-Regulated Feature Learning via Teacher-free Feature Distillation](https://lilujunai.github.io/Teacher-free-Distillation/), Lujun Li, ECCV 2022, [code](https://github.com/lilujunai/Teacher-free-Distillation)
* [DETRDistill: A Universal Knowledge Distillation Framework for DETR-families](https://arxiv.org/abs/2211.10156v2), Jiahao Chang, Shuo Wang, Guangkai Xu, Zehui Chen, Chenhongyi Yang, Feng Zhao, arXiv 2022
* [Learning to Explore Distillability and Sparsability: A Joint Framework for Model Compression](https://ieeexplore.ieee.org/abstract/document/9804342), Yufan Liu, Jiajiong Cao, Bing Li, Weiming Hu, Stephen Maybank, TPAMI 2022
* [Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?](https://proceedings.mlr.press/v162/chandrasegaran22a/chandrasegaran22a.pdf), Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung, ICML 2022

## 2023
* [Curriculum Temperature for Knowledge Distillation](https://arxiv.org/abs/2211.16231), Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang, AAAI 2023, [code](https://github.com/zhengli97/CTKD)
* [Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling](https://arxiv.org/abs/2301.00230), Xin Ma, Chang Liu, Chunyu Xie, Long Ye, Yafeng Deng, Xiangyang Ji, arXiv 2023, [code](https://github.com/mx-mark/DMJD)