https://github.com/shreypandit/knowledge_distillation
Tried out the code for Knowledge Distillation
- Host: GitHub
- URL: https://github.com/shreypandit/knowledge_distillation
- Owner: ShreyPandit
- Created: 2021-05-18T10:48:15.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-05-19T11:30:53.000Z (about 5 years ago)
- Last Synced: 2025-02-08T18:13:58.085Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 16.8 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Knowledge_Distillation
Platform used: TensorFlow
Knowledge distillation is the process of training a smaller model (the student) using a larger, already trained model, usually called the teacher model.
It shares several benefits of transfer learning, and the resulting student has fewer parameters than the original parent (teacher) model.
Two hyperparameters are used for training here:
1) Temperature - the softmax function is modified to produce soft predictions that do not draw a hard boundary between classes. The paper argues that knowing how close a given image is to each of the other classes is helpful for training the student. These soft predictions are created by dividing the logits by a hyperparameter called the temperature before they are exponentiated in the softmax, so a higher temperature spreads the probability more evenly across classes. Its value shouldn't be so high that the predicted probabilities become nearly equal and no conclusion can be drawn from them.
2) Alpha - this parameter sets the weighting inside the custom loss function, i.e. how much weight is given to the distillation loss computed against the parent (teacher) model's predictions versus the student loss computed against the true labels.
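The two hyperparameters can be illustrated with a minimal pure-Python sketch (no TensorFlow needed). The helper names (`softmax_with_temperature`, `kl_divergence`, `cross_entropy`, `distillation_loss`) are hypothetical and not taken from the notebook. The sketch assumes that `alpha` weights the student's hard-label loss and `1 - alpha` weights the distillation loss, and it multiplies the distillation term by T², which the paper recommends so that soft-target gradients stay on a comparable scale.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(label, probs, eps=1e-12):
    """Cross-entropy of an integer class label under predicted probabilities."""
    return -math.log(probs[label] + eps)


def distillation_loss(teacher_logits, student_logits, label, alpha=0.1, temperature=3.0):
    """Combined loss for a single example (hypothetical helper, not the repo's code).

    Assumption: alpha weights the student's hard-label loss and (1 - alpha)
    weights the distillation (teacher) loss.
    """
    # Student loss: ordinary cross-entropy at temperature 1 against the true label.
    student_loss = cross_entropy(label, softmax_with_temperature(student_logits, 1.0))
    # Distillation loss: match the teacher's softened distribution at temperature T.
    soft_teacher = softmax_with_temperature(teacher_logits, temperature)
    soft_student = softmax_with_temperature(student_logits, temperature)
    # Scale by T**2 so the soft-target term keeps a comparable gradient magnitude.
    distill = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * student_loss + (1 - alpha) * distill


if __name__ == "__main__":
    teacher_logits = [8.0, 2.0, 1.0]
    # Watch the teacher's prediction soften as the temperature grows.
    for t in (1.0, 3.0, 20.0):
        print(f"T={t}:", [round(p, 3) for p in softmax_with_temperature(teacher_logits, t)])
    print("loss:", distillation_loss(teacher_logits, [5.0, 3.0, 0.5], label=0))
```

At T = 1 the teacher logits above put almost all of the probability on the first class, while at T = 20 the three probabilities end up close to one another, which is the "too high" regime the temperature item warns about.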
Link to the paper (Hinton, Vinyals and Dean, "Distilling the Knowledge in a Neural Network"): https://arxiv.org/abs/1503.02531
The code is adapted from the official Keras documentation.