https://github.com/alperiox/compact-language-models-via-pruning-and-knowledge-distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
- Host: GitHub
- URL: https://github.com/alperiox/compact-language-models-via-pruning-and-knowledge-distillation
- Owner: alperiox
- Created: 2024-08-12T10:54:16.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-07T10:16:22.000Z (almost 2 years ago)
- Last Synced: 2025-04-10T00:06:21.667Z (about 1 year ago)
- Language: Python
- Size: 715 KB
- Stars: 44
- Watchers: 1
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Compact Language Models via Pruning and Knowledge Distillation
This project is an unofficial implementation of the paper [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/pdf/2407.14679). It explores techniques for compressing large language models (LLMs) through a combination of pruning and knowledge distillation.
## Overview
The goal of this project is to investigate whether pruning an existing LLM and then retraining it on a small fraction of the original training data is a viable alternative to training each model variant from scratch. The implementation focuses on:
1. Width pruning strategies for attention heads, MLP neurons, and embedding channels
2. Combining different pruning axes
3. Knowledge distillation techniques for retraining
4. Searching for optimal compressed architectures
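The paper's search step roughly works by enumerating candidate architectures that fit a target parameter budget, briefly retraining each, and keeping the best one. Below is a minimal pure-Python sketch of the enumeration part only. `GPTShape`, `param_count`, and `candidates` are hypothetical names, the pruning fractions are an assumed search grid, and the parameter formula assumes a nanoGPT-style model (biases enabled, tied LM head, learned position embeddings), which may not match `models.py` exactly.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GPTShape:
    """Hypothetical description of a (possibly width-pruned) GPT-style model."""
    n_layer: int
    n_embd: int            # embedding (residual stream) width
    n_head: int            # attention heads per layer
    head_dim: int          # per-head size; head pruning keeps this fixed
    mlp_hidden: int        # MLP intermediate size
    vocab_size: int = 65   # assumption: character-level Shakespeare vocabulary
    block_size: int = 256  # assumption: context length


def param_count(s: GPTShape) -> int:
    """Count parameters of a nanoGPT-style model (biases on, tied LM head)."""
    attn_dim = s.n_head * s.head_dim
    attn = s.n_embd * 3 * attn_dim + 3 * attn_dim  # fused Q, K, V projection
    attn += attn_dim * s.n_embd + s.n_embd         # attention output projection
    mlp = s.n_embd * s.mlp_hidden + s.mlp_hidden   # MLP up projection
    mlp += s.mlp_hidden * s.n_embd + s.n_embd      # MLP down projection
    norms = 2 * 2 * s.n_embd                       # two LayerNorms (weight + bias)
    embeddings = s.vocab_size * s.n_embd + s.block_size * s.n_embd
    final_norm = 2 * s.n_embd
    return s.n_layer * (attn + mlp + norms) + embeddings + final_norm


def candidates(base: GPTShape, budget: int, tolerance: float = 0.05):
    """Yield width-pruned shapes whose size is within `tolerance` of `budget`."""
    grid = (1.0, 0.75, 0.5)  # assumed keep-fractions for each width axis
    for f_embd, f_head, f_mlp in product(grid, repeat=3):
        shape = GPTShape(
            n_layer=base.n_layer,  # depth stays fixed (no depth pruning)
            n_embd=int(base.n_embd * f_embd),
            n_head=max(1, int(base.n_head * f_head)),
            head_dim=base.head_dim,
            mlp_hidden=int(base.mlp_hidden * f_mlp),
            vocab_size=base.vocab_size,
            block_size=base.block_size,
        )
        n = param_count(shape)
        if abs(n - budget) <= tolerance * budget:
            yield shape, n
```

For example, with `base = GPTShape(6, 384, 6, 64, 1536)` and half of its parameter count as the budget, the candidates include a 192-wide embedding that keeps all heads and MLP neurons. Each candidate would then get a short distillation run before one is selected.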
## Project Structure
- `models.py`: Contains the implementation of the GPT model and its components
- `hooks.py`: Implements forward hooks for calculating importance scores
- `pruners.py`: Contains functions for pruning neurons, attention heads, and embeddings
- `utils.py`: Utility functions for data loading, model saving/loading, and evaluation
- `script.py`: Main script for running experiments
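To illustrate how `hooks.py` and `pruners.py` fit together, here is a minimal PyTorch sketch of activation-based importance for MLP neurons followed by pruning. The `MLP` class, `neuron_importance`, and `prune_neurons` are hypothetical and not the repo's API. Scores come from the output of the first MLP projection, captured with a forward hook; the aggregation (mean over the sequence, then L2 norm over the batch) is an assumption.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Hypothetical GPT-style MLP block (not the repo's exact class)."""

    def __init__(self, n_embd: int, hidden: int):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, hidden)
        self.act = nn.GELU()
        self.c_proj = nn.Linear(hidden, n_embd)

    def forward(self, x):
        return self.c_proj(self.act(self.c_fc(x)))


def neuron_importance(mlp: MLP, x: torch.Tensor) -> torch.Tensor:
    """Score each hidden neuron from the activations of the up projection."""
    captured = {}

    def hook(module, inputs, output):
        captured["act"] = output.detach()  # shape: (batch, seq, hidden)

    handle = mlp.c_fc.register_forward_hook(hook)
    with torch.no_grad():
        mlp(x)
    handle.remove()  # always detach the hook after calibration
    # Assumed aggregation: mean over sequence, then L2 norm over batch.
    return captured["act"].mean(dim=1).norm(p=2, dim=0)  # shape: (hidden,)


def prune_neurons(mlp: MLP, scores: torch.Tensor, keep: int) -> MLP:
    """Return a narrower MLP that keeps the `keep` highest-scoring neurons."""
    idx = torch.topk(scores, keep).indices.sort().values
    pruned = MLP(mlp.c_fc.in_features, keep)
    with torch.no_grad():
        # Hidden units are rows of c_fc.weight and columns of c_proj.weight.
        pruned.c_fc.weight.copy_(mlp.c_fc.weight[idx])
        pruned.c_fc.bias.copy_(mlp.c_fc.bias[idx])
        pruned.c_proj.weight.copy_(mlp.c_proj.weight[:, idx])
        pruned.c_proj.bias.copy_(mlp.c_proj.bias)
    return pruned
```

Because a removed neuron simply stops contributing to `c_proj`, the pruned block computes exactly what the original would if those hidden activations were zeroed. The same slicing pattern extends to attention heads (groups of rows in the Q, K, V projection) and embedding channels.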
## Getting Started
1. Clone the repository
2. Install the required dependencies listed in `pyproject.toml`
3. Download the training data (Shakespeare dataset) by running the script
4. Adjust hyperparameters in `script.py` as needed
5. Run `script.py` to train the base model and perform pruning experiments
## Key Features
- Implementation of a GPT-style language model
- Flexible pruning strategies for different model components
- Knowledge distillation for model retraining
- Experimental framework for testing various compression configurations
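The paper retrains pruned models with logit-based distillation, minimizing a forward KL divergence between the teacher's and the student's next-token distributions. A minimal PyTorch sketch of such a loss follows; `distillation_loss` and its `temperature` parameter are assumptions for illustration, not the repo's API.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Forward KL(teacher || student), averaged over all token positions."""
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) -> (tokens, vocab) so "batchmean" averages per token.
    log_q = F.log_softmax(student_logits.reshape(-1, vocab) / temperature, dim=-1)
    log_p = F.log_softmax(teacher_logits.reshape(-1, vocab) / temperature, dim=-1)
    # With log_target=True, kl_div(log_q, log_p) sums p * (log p - log q).
    kl = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    # Common T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2
```

In a retraining loop the teacher (the unpruned model) would run in `eval()` mode under `torch.no_grad()`, and only the pruned student's parameters would receive gradients from this loss.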
## Usage
The implementation doesn't offer a command-line interface; I focused on the math-heavy parts instead.
## Results
(work in progress)
## Limitations and Future Work
- This implementation works with a much smaller model than the paper (roughly a few thousand times smaller, since I don't have any GPUs)
- Further optimization of the pruning and distillation techniques may be possible (depth pruning is not implemented, since my focus is applying the technique to smaller models, <15B parameters)
## Acknowledgements
```
@article{minitron2024,
  title={Compact Language Models via Pruning and Knowledge Distillation},
  author={Saurav Muralidharan and Sharath Turuvekere Sreenivas and Raviraj Joshi and Marcin Chochowski and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jan Kautz and Pavlo Molchanov},
  journal={arXiv preprint arXiv:2407.14679},
  year={2024},
  url={https://arxiv.org/abs/2407.14679},
}
```
- [Andrej Karpathy](https://github.com/karpathy), for literally firing me up to work through my FOMO.
## References
[Original paper](https://arxiv.org/pdf/2407.14679)