# Knowledge Distillation
![](./img/knowledge-distillation.png)

Knowledge distillation (a.k.a. the Teacher-Student paradigm) trains a small model (the Student) to learn the knowledge captured by a large model (the Teacher). The goal is for the small model to retain as much of the large model's performance as possible, while cutting the parameter count at deployment time, speeding up inference, and reducing compute cost.
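The core idea is to train the student not only on the hard labels but also on the teacher's temperature-softened output distribution. The snippet below is a minimal sketch of that loss in TensorFlow/Keras, not the exact code used in this repo; the temperature `T` and weight `alpha` are illustrative hyperparameters.

```python
import tensorflow as tf

def distillation_loss(y_true, teacher_logits, student_logits, T=4.0, alpha=0.9):
    """Hinton-style KD loss: weighted sum of soft-target KL and hard-label CE.
    T and alpha are illustrative values, not the ones used in this repo."""
    # Soften both distributions with temperature T.
    teacher_soft = tf.nn.softmax(teacher_logits / T)
    student_soft = tf.nn.softmax(student_logits / T)
    # KL divergence between softened teacher and student, scaled by T^2 so its
    # gradient magnitude matches the hard-label term (Hinton et al., 2015).
    soft_loss = tf.keras.losses.kld(teacher_soft, student_soft) * (T ** 2)
    # Ordinary cross-entropy against the one-hot ground-truth labels.
    hard_loss = tf.keras.losses.categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```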

# Contents
- 1. A reproduction of [Distilling the Knowledge in a Neural Network](http://arxiv.org/abs/1503.02531) (Hinton et al., 2015) on CIFAR-10, giving a basic introduction to knowledge distillation. See [Knowledge_Distillation_From_Scratch.ipynb](Knowledge_Distillation_From_Scratch.ipynb).
- 2. BERT-12 as the Teacher and BERT-3 as the Student, trained jointly on the ground-truth labels and the teacher's softened labels; the student matches or even exceeds the teacher's performance. See [knowledge_distillation_bert.py](knowledge_distillation_bert.py) and the training sketch after the reference list below.

Main reference papers:
- [Distilling Task-Specific Knowledge from BERT into Simple Neural Networks](http://arxiv.org/abs/1903.12136)
- [TinyBERT: Distilling BERT for Natural Language Understanding](http://arxiv.org/abs/1909.10351)
- [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](http://arxiv.org/abs/1910.01108)
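The sketch below shows one way to wire a frozen teacher and a trainable student into a single Keras training step, reusing the `distillation_loss` function from the earlier sketch. The class name, hyperparameters, and training loop are assumptions for illustration, not the exact setup in `knowledge_distillation_bert.py`.

```python
import tensorflow as tf

class Distiller(tf.keras.Model):
    """Pairs a frozen teacher with a trainable student (illustrative sketch).
    Relies on `distillation_loss` defined in the earlier sketch."""

    def __init__(self, teacher, student, T=4.0, alpha=0.9):
        super().__init__()
        self.teacher = teacher   # e.g. a 12-layer BERT classifier, weights frozen
        self.student = student   # e.g. a 3-layer BERT classifier
        self.T, self.alpha = T, alpha

    def train_step(self, data):
        x, y_true = data
        # Teacher runs in inference mode; only the student receives gradients.
        teacher_logits = self.teacher(x, training=False)
        with tf.GradientTape() as tape:
            student_logits = self.student(x, training=True)
            loss = tf.reduce_mean(
                distillation_loss(y_true, teacher_logits, student_logits,
                                  T=self.T, alpha=self.alpha))
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        return {"loss": loss}
```

Usage would be along the lines of `Distiller(teacher, student).compile(optimizer="adam")` followed by `fit` on the labelled training data.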

- 3. Knowledge distillation via module replacement (BERT-of-Theseus). See [knowledge_distillation_bert_of_theseus.py](knowledge_distillation_bert_of_theseus.py) and the replacement sketch after the links below.

Paper:
- [BERT-of-Theseus: Compressing BERT by Progressive Module Replacing](http://arxiv.org/abs/2002.02925)

Blogs:
- [BERT-of-Theseus: a model-compression method based on module replacement](https://spaces.ac.cn/archives/7575)
- [Model compression in practice: BERT-of-Theseus, a very approachable BERT compression method](https://zhuanlan.zhihu.com/p/112787764)

Repos:
- [https://github.com/JetRunner/BERT-of-Theseus](https://github.com/JetRunner/BERT-of-Theseus)
- [https://github.com/bojone/bert-of-theseus](https://github.com/bojone/bert-of-theseus)
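A minimal sketch of the progressive module-replacement idea: during training, each frozen predecessor (teacher) block is stochastically swapped for its trainable successor (student) block with some probability, which is gradually raised toward 1. The class name and the fixed `replace_rate` below are illustrative, not the exact implementation in the script.

```python
import tensorflow as tf

class TheseusBlock(tf.keras.layers.Layer):
    """Randomly routes a batch through either the frozen predecessor block or
    the trainable successor block during training (illustrative sketch)."""

    def __init__(self, predecessor, successor, replace_rate=0.5, **kwargs):
        super().__init__(**kwargs)
        self.predecessor = predecessor   # e.g. two stacked teacher layers, frozen
        self.successor = successor       # e.g. one student layer
        # Kept as a variable so a schedule can raise it toward 1.0 during training.
        self.replace_rate = tf.Variable(replace_rate, trainable=False)

    def call(self, inputs, training=None):
        if not training:
            # At inference only the compressed successor modules are used.
            return self.successor(inputs)
        # Bernoulli switch: with probability replace_rate use the successor,
        # otherwise keep the predecessor's output for this batch.
        use_successor = tf.random.uniform(()) < self.replace_rate
        return tf.cond(use_successor,
                       lambda: self.successor(inputs),
                       lambda: self.predecessor(inputs))
```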

- 4. Exploits the fact that samples differ in prediction difficulty to dynamically pick a branch classifier for each sample (FastBERT). Note that because TensorFlow 1.x builds a static graph, the current demo does not actually terminate computation early. See [knowledge_distillation_fastbert.py](knowledge_distillation_fastbert.py) and the early-exit sketch after the paper link below.

Paper:
- [FastBERT: a Self-distilling BERT with Adaptive Inference Time](http://arxiv.org/abs/2004.02178)
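A minimal sketch of FastBERT's adaptive inference, assuming eager execution (TF 2.x), which is exactly what the TF 1.x static-graph demo above cannot do: after each transformer layer a branch classifier produces a prediction, and the input exits early once the prediction's normalized entropy (the paper's "uncertainty") falls below a threshold. The `layers`, `branch_classifiers`, and `speed` names below are placeholders for illustration.

```python
import tensorflow as tf

def normalized_entropy(probs, eps=1e-9):
    """Uncertainty measure used for early exit: entropy scaled to [0, 1]."""
    n_classes = tf.cast(tf.shape(probs)[-1], probs.dtype)
    entropy = -tf.reduce_sum(probs * tf.math.log(probs + eps), axis=-1)
    return entropy / tf.math.log(n_classes)

def adaptive_predict(x, layers, branch_classifiers, speed=0.5):
    """Run transformer layers one by one; exit at the first branch classifier
    whose uncertainty is below `speed`. `layers` and `branch_classifiers` are
    placeholder lists of callables (e.g. Keras layers)."""
    hidden = x
    for layer, classifier in zip(layers, branch_classifiers):
        hidden = layer(hidden)
        probs = classifier(hidden)
        # For simplicity the decision here is made per batch (think batch size 1);
        # FastBERT itself routes each sample independently.
        if tf.reduce_mean(normalized_entropy(probs)) < speed:
            return probs  # confident enough: stop here
    return probs  # fell through to the final classifier's output
```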