https://github.com/xv44586/knowledge-distillation-nlp
some demos of Knowledge Distillation in NLP
- Host: GitHub
- URL: https://github.com/xv44586/knowledge-distillation-nlp
- Owner: xv44586
- Created: 2020-09-10T06:52:05.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-31T10:59:53.000Z (over 4 years ago)
- Last Synced: 2025-03-31T11:51:11.563Z (3 months ago)
- Topics: bert, keras, knowledge-distillation, nlp
- Language: Jupyter Notebook
- Homepage:
- Size: 89.8 KB
- Stars: 20
- Watchers: 1
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Knowledge Distillation
Knowledge distillation (a.k.a. the Teacher-Student model) uses a small model (the Student) to learn the knowledge captured by a large model (the Teacher), with the goal of retaining as much of the Teacher's performance as possible while shrinking the parameter count at deployment time, speeding up inference, and reducing compute usage.

# Contents
- 1. A reproduction on CIFAR-10 of [Distilling the Knowledge in a Neural Network](http://arxiv.org/abs/1503.02531) (Hinton et al., 2015), giving a basic introduction to Knowledge Distillation; see [Knowledge_Distillation_From_Scratch.ipynb](Knowledge_Distillation_From_Scratch.ipynb) for details.
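For reference, here is a minimal tf.keras sketch of the distillation loss that notebook reproduces (Hinton et al., 2015); the function name and the `alpha`/`temperature` defaults are illustrative choices, not the notebook's exact code:

```python
import tensorflow as tf

def kd_loss(hard_labels, student_logits, teacher_logits, temperature=5.0, alpha=0.9):
    """Hinton-style distillation loss for one batch:
    alpha * CE(teacher soft targets, student) * T^2 + (1 - alpha) * CE(hard labels, student)."""
    # Soft term: cross-entropy against the teacher's temperature-softened distribution
    soft_targets = tf.nn.softmax(teacher_logits / temperature)
    log_p_student = tf.nn.log_softmax(student_logits / temperature)
    soft_loss = -tf.reduce_sum(soft_targets * log_p_student, axis=-1) * temperature ** 2
    # Hard term: ordinary cross-entropy against the ground-truth class ids
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(
        hard_labels, student_logits, from_logits=True)
    return tf.reduce_mean(alpha * soft_loss + (1.0 - alpha) * hard_loss)
```

Here `teacher_logits` would come from a frozen teacher forward pass on the same batch, and gradients are taken only with respect to the student's weights.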
- 2. Using BERT-12 as the Teacher and BERT-3 as the Student, the Student learns from the ground truth and the softened labels simultaneously, reaching performance comparable to or even better than the Teacher's; see [knowledge_distillation_bert](https://github.com/xv44586/Knowledge-Distillation-NLP/knowledge_distillation_bert.py) for details, and the loss sketch after the paper list below.
Main reference papers:
- [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](http://arxiv.org/abs/1903.12136)
- [TinyBERT: Distilling BERT for Natural Language Understanding](http://arxiv.org/abs/1909.10351)
- [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](http://arxiv.org/abs/1910.01108)
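A rough sketch of how the ground-truth and softened-label objectives can be combined in a single Keras loss is given below; packing both targets into `y_true`, and the `num_classes`/`temperature`/`alpha` arguments, are assumptions made for illustration rather than the repo's actual implementation, and the teacher/student BERT classifiers are left as placeholders:

```python
import numpy as np
import tensorflow as tf

def soft_labels(teacher_model, x, temperature=2.0):
    """Precompute the teacher's softened probabilities once, offline."""
    teacher_logits = teacher_model.predict(x, batch_size=64)
    return tf.nn.softmax(teacher_logits / temperature).numpy()

def combined_loss(num_classes, temperature=2.0, alpha=0.5):
    """alpha * CE(ground truth) + (1 - alpha) * CE(teacher soft labels), where y_true
    packs [one-hot ground truth | teacher soft labels] along the last axis."""
    def loss_fn(y_true, student_logits):
        hard, soft = y_true[:, :num_classes], y_true[:, num_classes:]
        hard_loss = tf.keras.losses.categorical_crossentropy(
            hard, student_logits, from_logits=True)
        log_p_student = tf.nn.log_softmax(student_logits / temperature)
        soft_loss = -tf.reduce_sum(soft * log_p_student, axis=-1) * temperature ** 2
        return alpha * hard_loss + (1.0 - alpha) * soft_loss
    return loss_fn

# Usage (teacher_model / student_model are placeholder Keras BERT classifiers):
# y_packed = np.concatenate([y_onehot, soft_labels(teacher_model, x_train)], axis=-1)
# student_model.compile(optimizer="adam", loss=combined_loss(num_classes=y_onehot.shape[-1]))
# student_model.fit(x_train, y_packed, epochs=3, batch_size=32)
```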
- 3. Knowledge Distillation via module replacement; see [knowledge_distillation_bert_of_theseus](https://github.com/xv44586/Knowledge-Distillation-NLP/knowledge_distillation_bert_of_theseus.py) for details, and the sketch after the links below.
Paper:
- [BERT-of-Theseus: Compressing BERT by Progressive Module Replacing](http://arxiv.org/abs/2002.02925)
Blog:
- [BERT-of-Theseus: model compression via module replacement](https://spaces.ac.cn/archives/7575)
- [Model compression in practice: bert-of-theseus, a very approachable BERT compression method](https://zhuanlan.zhihu.com/p/112787764)
Repos:
- [https://github.com/JetRunner/BERT-of-Theseus](https://github.com/JetRunner/BERT-of-Theseus)
- [https://github.com/bojone/bert-of-theseus](https://github.com/bojone/bert-of-theseus)
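The following is a minimal tf.keras sketch (not the repo's TF1.x code) of the progressive module replacing idea: during training, a frozen predecessor block is randomly swapped with its trainable successor block; `TheseusBlock` and `replace_rate` are illustrative names:

```python
import tensorflow as tf

class TheseusBlock(tf.keras.layers.Layer):
    """BERT-of-Theseus-style block: with probability `replace_rate`, the output of
    the frozen predecessor (e.g. 2 teacher Transformer layers) is replaced by the
    output of the trainable successor (e.g. 1 student layer) during training."""
    def __init__(self, predecessor, successor, replace_rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.predecessor = predecessor          # frozen teacher sub-module
        self.predecessor.trainable = False
        self.successor = successor              # trainable student sub-module
        self.replace_rate = replace_rate

    def call(self, inputs, training=None):
        pre_out = self.predecessor(inputs)
        suc_out = self.successor(inputs)
        if training:
            # Bernoulli(replace_rate) choice per batch: successor or predecessor output
            use_successor = tf.cast(
                tf.random.uniform(()) < self.replace_rate, pre_out.dtype)
            return use_successor * suc_out + (1.0 - use_successor) * pre_out
        # At inference time only the compressed successor path is kept
        return suc_out
```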
- 4. Uses the fact that samples differ in how hard they are to predict to dynamically select one of the model's branch classifiers. However, since TensorFlow 1.x builds a static graph, the current demo does not actually stop the computation early; see [knowledge_distillation_fastbert](https://github.com/xv44586/Knowledge-Distillation-NLP/knowledge_distillation_fastbert.py) for details, and the sketch after the paper link below.
Paper:
- [FastBERT: a Self-distilling BERT with Adaptive Inference Time](http://arxiv.org/abs/2004.02178)
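To illustrate the adaptive-inference idea, here is a rough eager-mode sketch in which the early exit does happen (unlike the TF1.x demo noted above); `encoder_layers`, `branch_classifiers`, and the `speed` threshold are hypothetical names, not the repo's API:

```python
import tensorflow as tf

def uncertainty(probs, eps=1e-8):
    """Normalized entropy of a probability vector; low values mean a confident prediction."""
    num_classes = tf.cast(tf.shape(probs)[-1], probs.dtype)
    entropy = -tf.reduce_sum(probs * tf.math.log(probs + eps), axis=-1)
    return entropy / tf.math.log(num_classes)

def fastbert_predict(hidden, encoder_layers, branch_classifiers, speed=0.3):
    """Adaptive inference for a single example (batch of one, eager execution):
    exit at the first branch classifier whose uncertainty falls below `speed`.
    `encoder_layers` and `branch_classifiers` are assumed to be lists of callables
    (Transformer layer, per-layer classifier head)."""
    for layer, classifier in zip(encoder_layers, branch_classifiers):
        hidden = layer(hidden)
        probs = classifier(hidden)              # student branch prediction
        if uncertainty(probs)[0] < speed:       # confident enough: stop early
            return probs
    return probs                                # fall through to the last branch
```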