{"id":27984404,"url":"https://github.com/jongwooko/distillm","last_synced_at":"2025-05-08T05:01:55.335Z","repository":{"id":221295962,"uuid":"753564120","full_name":"jongwooko/distillm","owner":"jongwooko","description":"Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)","archived":false,"fork":false,"pushed_at":"2025-03-13T04:00:09.000Z","size":8926,"stargazers_count":194,"open_issues_count":4,"forks_count":26,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-13T05:17:59.693Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2402.03898","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jongwooko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-06T11:23:27.000Z","updated_at":"2025-03-13T04:00:12.000Z","dependencies_parsed_at":"2024-05-02T21:39:58.662Z","dependency_job_id":"4c9d17c4-f00c-46bd-9477-9b16eb809ab9","html_url":"https://github.com/jongwooko/distillm","commit_stats":null,"previous_names":["jongwooko/distillm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jongwooko%2Fdistillm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jongwooko%2Fdistillm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jongwooko%2Fdistillm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jongwooko%2Fdistillm/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jongwooko","download_url":"https://codeload.github.com/jongwooko/distillm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253002856,"owners_count":21838640,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-08T05:01:51.170Z","updated_at":"2025-05-08T05:01:55.275Z","avatar_url":"https://github.com/jongwooko.png","language":"Python","funding_links":[],"categories":["Python","A01_文本生成_文本对话","🔬 OPD with Larger External Teachers — White-Box"],"sub_categories":["大语言对话模型及数据"],"readme":"# DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)\n\n\u003ca href=\"https://arxiv.org/abs/2402.03898\"\u003e\u003cimg src=\"https://img.shields.io/badge/Paper-arXiv:2402.03898-Green\"\u003e\u003c/a\u003e\n\u003ca href=#bibtex\u003e\u003cimg src=\"https://img.shields.io/badge/Paper-BibTex-yellow\"\u003e\u003c/a\u003e\n\nOfficial PyTorch implementation of **DistiLLM**, as presented in our paper: \\\n\\\n**DistiLLM: Towards Streamlined Distillation for Large Language Models** \\\n*[Jongwoo Ko](https://sites.google.com/view/jongwooko), [Sungnyun Kim](https://sungnyunkim.notion.site/Sungnyun-Kim-4770a0182c47469ebdcd357cde97bd32), Tianyi Chen, Se-Young Yun* \\\nKAIST AI and Microsoft\n\n## 🚀 Updates\n- [x] (25.03.11) DistiLLM-2 paper is out! 
The preliminary code will be available in this repo, and the final code will soon be available [here](https://github.com/jongwooko/distillm-2).\n- [x] (24.08.12) Removed the dependency on the outdated local copy of Transformers. You can now work with various recent LLMs!\n- [x] (24.05.01) Our paper has been accepted to **ICML 2024**. We welcome any discussion and will reflect it in the camera-ready version. Looking forward to seeing you in Vienna!\n- [x] (24.03.13) Released [**LoRA checkpoints for OpenLLaMA2-3B**](https://drive.google.com/drive/folders/1Yun1aNpn-mz2h-IVH_VdJ1Jhzm0K55Bo?usp=sharing)\n\n## Environment\n```bash\nbash install.sh\n```\n\nFollowing **MiniLLM**, our code is based on [this commit](https://github.com/huggingface/transformers/commit/85fde09c97213bf7e8625f83096bb2a9e183f987) of HuggingFace Transformers.\n\n## Data\n### Resources\n+ The unprocessed training/evaluation instruction-response data can be downloaded from this [link](https://conversationhub.blob.core.windows.net/beit-share-public/MiniLLM/data.tar?sv=2021-10-04\u0026st=2023-06-08T11%3A16%3A02Z\u0026se=2033-06-09T11%3A16%3A00Z\u0026sr=c\u0026sp=r\u0026sig=N4pfCVmSeq4L4tS8QbrFVsX6f6q844eft8xSuXdxU48%3D).\n+ The plain-text corpus $\\mathcal{D}_\\text{PT}$ can be downloaded from the HuggingFace datasets [repository](https://huggingface.co/datasets/openwebtext).\n\n### Data Processing\nGet the plain-text corpus $\\mathcal{D}_\\text{PT}$:\n```bash\npython3 tools/get_openwebtext.py\n```\nThis script replaces consecutive `\\n` characters in each document with the special token \"\u003c@x(x!\u003e\" and writes each OpenWebText document on a single line, which is convenient for parallel processing. An example of the resulting format is given in `data/openwebtext/data.txt`. 
You can follow this format to prepare corpora other than OpenWebText.\n\nTokenize the data and store it in binary files:\n```bash\nbash scripts/gpt2/tools/process_data_dolly.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process Dolly Train / Validation Data\nbash scripts/gpt2/tools/process_data_pretrain.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process OpenWebText Train / Validation Data\n\nbash scripts/opt/tools/process_data_dolly.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process Dolly Train / Validation Data\nbash scripts/opt/tools/process_data_pretrain.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process OpenWebText Train / Validation Data\n\nbash scripts/llama/tools/process_data_dolly.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process Dolly Train / Validation Data\nbash scripts/llama/tools/process_data_pretrain.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM} # Process OpenWebText Train / Validation Data\n```\n\n## Base Pre-trained Models\nTo run fine-tuning or standard KD baselines, you need to download the model checkpoints from the [HuggingFace Model Hub](https://huggingface.co/models) and put them in `checkpoints/`. For example, for gpt2-large, you can download the model from this [link](https://huggingface.co/gpt2-large/tree/main) and put it in `checkpoints/gpt2-large`.\n\nAlternatively, you can set the `CKPT` variable in each script to the corresponding model name so that Transformers downloads the base model automatically. For example, setting `CKPT=\"gpt2-large\"` in `scripts/gpt2/sft/sft_large.sh` downloads the gpt2-large base model from the HuggingFace Model Hub.\n\n## Train\nWe provide example commands for GPT-2 models. Similar scripts for other model families can be found in `scripts/opt` and `scripts/openllama2`. 
All our experiments are conducted on 4 A100 (40GB) GPUs; fewer GPUs can be used for smaller models.\n\n### Baselines\nThe final checkpoints are selected by the **ROUGE-L** scores.\n\n#### Fine-tune the teacher models\n```bash\nbash scripts/gpt2/sft/sft_xlarge.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n#### SFT Baselines\n```bash\nbash scripts/gpt2/sft/sft_base.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/sft/sft_medium.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/sft/sft_large.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### KD Baselines\n```bash\nbash scripts/gpt2/kd/kd_base.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/kd/kd_medium.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/kd/kd_large.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### SeqKD Baselines\nGenerate and process responses with the teacher:\n```bash\nbash scripts/gpt2/tools/generate_data_seqkd.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/tools/process_pseudo_data_seqkd.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\nFine-tune the model with SeqKD:\n```bash\nbash scripts/gpt2/seqkd/seqkd_base.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/seqkd/seqkd_medium.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/seqkd/seqkd_large.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### Student Initialization\nThe final checkpoints are selected by the **validation loss**.\n```bash\nbash scripts/gpt2/init/init_base.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/init/init_medium.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/init/init_large.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### ImitKD Baselines\n```bash\nbash scripts/gpt2/imitkd/imitkd_base_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/imitkd/imitkd_medium_xl.sh 
${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/imitkd/imitkd_large_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### MiniLLM Baselines\n```bash\nbash scripts/gpt2/minillm/train_base_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/minillm/train_medium_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/minillm/train_large_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n#### GKD Baselines\n```bash\nbash scripts/gpt2/gkd/gkd_base_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/gkd/gkd_medium_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/gkd/gkd_large_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n### DistiLLM\nThe final checkpoints are selected by the **validation loss**.\n```bash\nbash scripts/gpt2/init/init_base.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/init/init_medium.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/init/init_large.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\nThe final checkpoints are selected by the **ROUGE-L** scores.\n```bash\nbash scripts/gpt2/distillm/train_base_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/distillm/train_medium_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\nbash scripts/gpt2/distillm/train_large_xl.sh ${/PATH/TO/DistiLLM} ${MASTER_PORT} ${GPU_NUM}\n```\n\n## Run Evaluation\n```bash\nbash scripts/gpt2/eval/run_eval.sh ${GPU_IDX} ${/PATH/TO/DistiLLM}\nbash scripts/opt/eval/run_eval.sh ${GPU_IDX} ${/PATH/TO/DistiLLM} \nbash scripts/openllama2/eval/run_eval.sh ${GPU_IDX} ${/PATH/TO/DistiLLM} \n```\n\n## Results\nDistiLLM outperforms other KD baselines in terms of both generation performance and training speed for various model families such as GPT-2, OPT, and OpenLLaMA.\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"1394\" 
src=\"https://github.com/jongwooko/distillm/assets/59277369/19ddac5c-4cd6-4d81-99d8-32723a8e60d8\"\u003e\n\u003c/p\u003e\n\n## Checkpoints (OpenLLaMA-3B)\nWe share the LoRA weights for OpenLLaMA-3B on [Google Drive](https://drive.google.com/drive/folders/1Yun1aNpn-mz2h-IVH_VdJ1Jhzm0K55Bo?usp=sharing).\n\n## Acknowledgement\nOur code is based on the code of [MiniLLM: Knowledge Distillation of Large Language Models](https://arxiv.org/pdf/2306.08543.pdf) (ICLR 2024).\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=jongwooko/distillm\u0026type=Date)](https://star-history.com/#jongwooko/distillm\u0026Date)\n\n## BibTeX\nIf you find this repo useful for your research, please consider citing our papers:\n\n```\n@inproceedings{kodistillm,\n  title={DistiLLM: Towards Streamlined Distillation for Large Language Models},\n  author={Ko, Jongwoo and Kim, Sungnyun and Chen, Tianyi and Yun, Se-Young},\n  booktitle={Forty-first International Conference on Machine Learning},\n  year={2024}\n}\n\n@article{ko2025distillm2,\n  title={DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs},\n  author={Jongwoo Ko and Tianyi Chen and Sungnyun Kim and Tianyu Ding and Luming Liang and Ilya Zharkov and Se-Young Yun},\n  year={2025},\n  journal={arXiv preprint arXiv:2503.07067}\n}\n```\n\n## Contact\n- Jongwoo Ko: jongwoo.ko@kaist.ac.kr\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjongwooko%2Fdistillm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjongwooko%2Fdistillm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjongwooko%2Fdistillm/lists"}