{"id":13752771,"url":"https://github.com/mit-han-lab/lite-transformer","last_synced_at":"2025-05-09T20:34:15.250Z","repository":{"id":49633980,"uuid":"258533881","full_name":"mit-han-lab/lite-transformer","owner":"mit-han-lab","description":"[ICLR 2020] Lite Transformer with Long-Short Range Attention","archived":true,"fork":false,"pushed_at":"2024-07-11T20:50:46.000Z","size":1389,"stargazers_count":598,"open_issues_count":0,"forks_count":81,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-11-16T05:32:15.757Z","etag":null,"topics":["nlp","pytorch","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2004.11886","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-han-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-24T14:23:35.000Z","updated_at":"2024-11-09T16:10:26.000Z","dependencies_parsed_at":"2024-08-03T09:04:30.863Z","dependency_job_id":"2edaaa8f-083e-496f-a3bd-681ebb75355f","html_url":"https://github.com/mit-han-lab/lite-transformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Flite-transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Flite-transformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Flite-transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Flite-
transformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-han-lab","download_url":"https://codeload.github.com/mit-han-lab/lite-transformer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253321738,"owners_count":21890455,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","pytorch","transformer"],"created_at":"2024-08-03T09:01:10.803Z","updated_at":"2025-05-09T20:34:12.145Z","avatar_url":"https://github.com/mit-han-lab.png","language":"Python","funding_links":[],"categories":["Transformer库与优化","Python"],"sub_categories":[],"readme":"# Lite Transformer\n\n### [paper](https://arxiv.org/abs/2004.11886) | [website](https://hanlab.mit.edu/projects/litetransformer/) | [slides](https://hanlab.mit.edu/projects/litetransformer/Presentation_LiteTransformer.pdf)\n\n```\n@inproceedings{Wu2020LiteTransformer,\n  title={Lite Transformer with Long-Short Range Attention},\n  author={Zhanghao Wu* and Zhijian Liu* and Ji Lin and Yujun Lin and Song Han},\n  booktitle={International Conference on Learning Representations (ICLR)},\n  year={2020}\n}\n```\n\n## Overview\n\n![overview](figures/overview.png?raw=true \"overview\")\n\n## How to Use\n\n### Prerequisite\n\n* Python version \u003e= 3.6\n* [PyTorch](http://pytorch.org/) version \u003e= 1.0.0\n* configargparse \u003e= 0.14\n* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)\n\n### Installation\n\n1. 
Codebase\n    \n    To install fairseq from source and develop locally:\n    ```bash\n    pip install --editable .\n    ```\n\n2. Customized Modules\n\n    We also need to build the `lightconv` and `dynamicconv` modules for GPU support.\n\n    Lightconv_layer\n    ```bash\n    cd fairseq/modules/lightconv_layer\n    python cuda_function_gen.py\n    python setup.py install\n    ```\n    Dynamicconv_layer\n    ```bash\n    cd fairseq/modules/dynamicconv_layer\n    python cuda_function_gen.py\n    python setup.py install\n    ```\n\n### Data Preparation\n#### IWSLT'14 De-En\nWe follow the data pre-processing in [fairseq](https://github.com/pytorch/fairseq). To download and preprocess the data, one can run\n```bash\nbash configs/iwslt14.de-en/prepare.sh\n```\n\n#### WMT'14 En-Fr\nWe follow the data pre-processing in [fairseq](https://github.com/pytorch/fairseq). To download and preprocess the data, one can run\n```bash\nbash configs/wmt14.en-fr/prepare.sh\n```\n\n#### WMT'16 En-De\nWe follow the data pre-processing in [fairseq](https://github.com/pytorch/fairseq). One should first download the preprocessed data from the [Google Drive](https://drive.google.com/uc?export=download\u0026id=0B_bZck-ksdkpM25jRUN2X2UxMm8) provided by Google. To binarize the data, one can run\n```bash\nbash configs/wmt16.en-de/prepare.sh [path to the downloaded zip file]\n```\n\n#### WIKITEXT-103\nAs the language model task requires a lot of additional code, we keep it in a separate branch: `language-model`.\nWe follow the data pre-processing in [fairseq](https://github.com/pytorch/fairseq). 
To download and preprocess the data, one can run\n```bash\ngit checkout language-model\nbash configs/wikitext-103/prepare.sh\n```\n\n### Testing\n\nFor example, to test the models on WMT'14 En-Fr, one can run\n```bash\nconfigs/wmt14.en-fr/test.sh [path to the model checkpoints] [gpu-id] [test|valid]\n```\nFor instance, to evaluate Lite Transformer on GPU 0 (reporting the BLEU score on the WMT'14 En-Fr test set), one can run\n```bash\nconfigs/wmt14.en-fr/test.sh embed496/ 0 test\n```\nSeveral pretrained models are provided in the Models section below. You can download a model and extract the archive with\n```bash\ntar -xzvf [filename]\n```\n\n### Training\nWe provide several examples for training Lite Transformer with this repo:\n\nTo train Lite Transformer on WMT'14 En-Fr (with 8 GPUs), one can run\n```bash\npython train.py data/binary/wmt14_en_fr --configs configs/wmt14.en-fr/attention/multibranch_v2/embed496.yml\n```\nTo train Lite Transformer with fewer GPUs, e.g. 4 GPUs, one can run\n```bash\nCUDA_VISIBLE_DEVICES=0,1,2,3 python train.py data/binary/wmt14_en_fr --configs configs/wmt14.en-fr/attention/multibranch_v2/embed496.yml --update-freq 32\n```\nIn general, to train a model, one can run\n```bash\npython train.py [path to the data binary] --configs [path to config file] [override options]\n```\nNote that `--update-freq` should be adjusted according to the number of GPUs (16 for 8 GPUs, 32 for 4 GPUs), so that the number of GPUs times `--update-freq` stays constant.\n\n### Distributed Training (optional)\n\nLite Transformer can also be trained in a distributed manner. 
For example, on two GPU nodes with 16 GPUs in total, one can run\n```bash\n# On host1\npython -m torch.distributed.launch \\\n        --nproc_per_node=8 \\\n        --nnodes=2 --node_rank=0 \\\n        --master_addr=host1 --master_port=8080 \\\n        train.py data/binary/wmt14_en_fr \\\n        --configs configs/wmt14.en-fr/attention/multibranch_v2/embed496.yml \\\n        --distributed-no-spawn \\\n        --update-freq 8\n# On host2\npython -m torch.distributed.launch \\\n        --nproc_per_node=8 \\\n        --nnodes=2 --node_rank=1 \\\n        --master_addr=host1 --master_port=8080 \\\n        train.py data/binary/wmt14_en_fr \\\n        --configs configs/wmt14.en-fr/attention/multibranch_v2/embed496.yml \\\n        --distributed-no-spawn \\\n        --update-freq 8\n```\n\n## Models\nWe provide the checkpoints for our Lite Transformer reported in the paper:\n| Dataset | \\#Mult-Adds | Test Score | Model and Test Set |\n|:--:|:--:|:--:|:--:|\n| [WMT'14 En-Fr](http://statmt.org/wmt14/translation-task.html#Download) | 90M | 35.3 | [download](https://drive.google.com/open?id=10Iotg0dnt9sJTqEghtNhIIwJL1R3LYBe) |\n| | 360M | 39.1 | [download](https://drive.google.com/open?id=10WMpIrdnDRWa_7afYJsqiiONdWlTLrJs) |\n| | 527M | 39.6 | [download](https://drive.google.com/open?id=10Wfv80wOTkL-hkXNyxM8IVlcroHuuUvA) |\n| [WMT'16 En-De](https://statmt.org/wmt16/translation-task.html#Download) | 90M | 22.5 | [download](https://drive.google.com/open?id=10ArxzUsMZ8gDe6zw5d3xTHYmeUasys1q) |\n| | 360M | 25.6 | [download](https://drive.google.com/open?id=10Fd1iXFiOtuwjxm1K8S2RqiEeCuDhxYn) |\n| | 527M | 26.5 | [download](https://drive.google.com/open?id=10HYj-rcJ4CIPp-BtpckkmYIgzH5Urrz0)|\n| [CNN / DailyMail](https://github.com/abisee/cnn-dailymail) | 800M | 38.3 (R-L) | [download](https://drive.google.com/open?id=14sQZ_H7HMQGhL7Ko1WkktWUvbEslOeu9)|\n| [WIKITEXT-103](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset) | 1147M | 22.2 (PPL) | 
[download](https://drive.google.com/file/d/14gT1j5VERgtDFfo2Ef1yOiliT9Y2eKe_/view?usp=sharing)|\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Flite-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-han-lab%2Flite-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Flite-transformer/lists"}