{"id":20046602,"url":"https://github.com/ncsoft/align-to-distill","last_synced_at":"2025-05-05T09:31:46.130Z","repository":{"id":229120255,"uuid":"774167571","full_name":"ncsoft/Align-to-Distill","owner":"ncsoft","description":"Official implementation of \"Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation\" (LREC-COLING 2024)","archived":false,"fork":false,"pushed_at":"2024-04-24T11:11:06.000Z","size":932,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-24T15:32:00.328Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ncsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-19T04:03:21.000Z","updated_at":"2024-04-24T11:23:13.000Z","dependencies_parsed_at":"2024-04-24T12:27:37.615Z","dependency_job_id":"f7b3ff31-d618-41af-88a8-593d350d1680","html_url":"https://github.com/ncsoft/Align-to-Distill","commit_stats":null,"previous_names":["ncsoft/align-to-distill"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2FAlign-to-Distill","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2FAlign-to-Distill/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2FAlign-to-Distill/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2FAl
ign-to-Distill/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ncsoft","download_url":"https://codeload.github.com/ncsoft/Align-to-Distill/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224437955,"owners_count":17311110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T11:24:58.736Z","updated_at":"2024-11-13T11:24:59.364Z","avatar_url":"https://github.com/ncsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation\n\nThis is the PyTorch implementation of the paper **[Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation (LREC-COLING 2024)](\u003chttps://arxiv.org/abs/2403.01479\u003e)**. \n\nWe carry out our experiments on a standard Transformer using the [fairseq](https://github.com/pytorch/fairseq) toolkit. 
If you use any source code included in this repo in your work, please cite the following paper.\n\n```bibtex\n@misc{jin2024aligntodistill,\n      title={Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation}, \n      author={Heegon Jin and Seonil Son and Jemin Park and Youngseok Kim and Hyungjong Noh and Yeonsoo Lee},\n      year={2024},\n      eprint={2403.01479},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n\n# Requirements and Installation\n\n* [PyTorch](http://pytorch.org/) version \u003e= 1.10.0\n* Python version \u003e= 3.8\n* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)\n* **To install fairseq** and develop locally:\n\n``` bash\ngit clone this_repository\ncd fairseq\npip install --editable ./\n```\n\nWe require a few additional Python dependencies:\n\n``` bash\npip install sacremoses einops\n```\n\n# Prepare dataset\n\n### IWSLT'14 German to English\n\nThe following instructions can be used to train a Transformer model on the [IWSLT'14 German to English dataset](http://workshop2014.iwslt.org/downloads/proceeding.pdf).\n\nFirst, download and preprocess the data:\n```bash\n# Download and prepare the data\ncd examples/translation/\nbash prepare-iwslt14.sh\ncd ../..\n\n# Preprocess/binarize the data\nTEXT=examples/translation/iwslt14.tokenized.de-en\nfairseq-preprocess --source-lang de --target-lang en \\\n    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \\\n    --destdir data-bin/iwslt14.tokenized.de-en \\\n    --workers 20\n```\n# Training\nFirst, train a teacher model; the training script is the same as in fairseq. \nSecond, use the trained teacher model to train an A2D student model. 
\nThe '--teacher-ckpt-path' argument is used to specify the path to the trained teacher model checkpoint from the first step.\n\nAdjustable arguments for experiments:\n- add '--alpha' (default=0.5) : This argument controls the weight between the cross-entropy loss and the response-based distillation loss.\n- add '--beta' (default=1) : This argument controls the weight between the response-based loss and the attention distillation loss.\n- add '--decay' (default=0.9) : This argument sets the decay rate for the attention distillation loss.\n\nTwo scripts are provided for running the training processes:\n- train_teacher.sh: This script is used to train the teacher model.\n- train_student.sh: This script is used to train the A2D student model using the trained teacher model.\n\n## Train a teacher model\n\n```bash\nbash train_teacher.sh\n```\n\n## Train a student model (with A2D method)\n\n```bash\nbash train_student.sh\n```\n\n## Test a student model (with A2D method)\n\n```bash\nbash test.sh\n```\n\n# Citation\n\nPlease cite as:\n\n``` bibtex\n@misc{jin2024aligntodistill,\n      title={Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation}, \n      author={Heegon Jin and Seonil Son and Jemin Park and Youngseok Kim and Hyungjong Noh and Yeonsoo Lee},\n      year={2024},\n      eprint={2403.01479},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Falign-to-distill","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncsoft%2Falign-to-distill","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Falign-to-distill/lists"}