{"id":13752766,"url":"https://github.com/mit-han-lab/hardware-aware-transformers","last_synced_at":"2025-05-13T18:38:09.822Z","repository":{"id":51768330,"uuid":"260538658","full_name":"mit-han-lab/hardware-aware-transformers","owner":"mit-han-lab","description":"[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing","archived":false,"fork":false,"pushed_at":"2024-07-14T04:10:59.000Z","size":26814,"stargazers_count":334,"open_issues_count":3,"forks_count":52,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-09T20:51:26.656Z","etag":null,"topics":["efficient-model","hardware-aware","machine-translation","natural-language-processing","specialization","transformer"],"latest_commit_sha":null,"homepage":"https://hat.mit.edu","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mit-han-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-01T19:11:29.000Z","updated_at":"2025-05-06T04:49:29.000Z","dependencies_parsed_at":"2024-07-14T05:23:11.532Z","dependency_job_id":"6a3335e2-1477-4f89-9be8-9fe4b9472fde","html_url":"https://github.com/mit-han-lab/hardware-aware-transformers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fhardware-aware-transformers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fhardware-aware-transformers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/mit-han-lab%2Fhardware-aware-transformers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mit-han-lab%2Fhardware-aware-transformers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mit-han-lab","download_url":"https://codeload.github.com/mit-han-lab/hardware-aware-transformers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254004718,"owners_count":21998111,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["efficient-model","hardware-aware","machine-translation","natural-language-processing","specialization","transformer"],"created_at":"2024-08-03T09:01:10.728Z","updated_at":"2025-05-13T18:38:09.799Z","avatar_url":"https://github.com/mit-han-lab.png","language":"Python","funding_links":[],"categories":["Transformer库与优化","Python"],"sub_categories":[],"readme":"# HAT: Hardware-Aware Transformers for Efficient Natural Language Processing [[paper]](https://arxiv.org/abs/2005.14187) [[website]](https://hat.mit.edu) [[video]](https://youtu.be/N_tH1jIbqCw)\n\n\n```\n@inproceedings{hanruiwang2020hat,\n    title     = {HAT: Hardware-Aware Transformers for Efficient Natural Language Processing},\n    author    = {Wang, Hanrui and Wu, Zhanghao and Liu, Zhijian and Cai, Han and Zhu, Ligeng and Gan, Chuang and Han, Song},\n    booktitle = {Annual Conference of the Association for Computational Linguistics},\n    year      = {2020}\n} \n```\n\n\n## News\n- HAT is covered by 
[VentureBeat](https://venturebeat.com/ai/new-ai-technique-speeds-up-language-models-on-edge-devices/).\n- HAT is covered by [MIT News](https://news.mit.edu/2020/shrinking-deep-learning-carbon-footprint-0807).\n\n\n## Overview\nWe release the PyTorch code and 50 pre-trained models for HAT: Hardware-Aware Transformers. Within a Transformer supernet (SuperTransformer), we efficiently search for a specialized fast model (SubTransformer) for each hardware platform using latency feedback. The search cost is reduced by over 10000×.\n![teaser](assets/teaser.jpg)\n\nHAT framework overview:\n![overview](assets/overview.jpg)\n\nHAT models achieve up to 3× speedup and 3.7× smaller model size with no performance loss.\n![results](assets/results.jpg)\n\n\n## Usage\n\n### Installation\nTo install from source and develop locally:\n\n```bash\ngit clone https://github.com/mit-han-lab/hardware-aware-transformers.git\ncd hardware-aware-transformers\npip install --editable .\n```\n\n### Data Preparation\n\n| Task | task_name | Train | Valid | Test | \n|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|\n| WMT'14 En-De | wmt14.en-de | [WMT'16](https://drive.google.com/uc?export=download\u0026id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | newstest2013 | newstest2014 | \n| WMT'14 En-Fr | wmt14.en-fr | [WMT'14](http://statmt.org/wmt14/translation-task.html#Download) | newstest2012\u00262013 | newstest2014 | \n| WMT'19 En-De | wmt19.en-de | [WMT'19](http://www.statmt.org/wmt19/translation-task.html#download) | newstest2017 | newstest2018 | \n| IWSLT'14 De-En | iwslt14.de-en | [IWSLT'14 train set](https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz) | IWSLT'14 valid set | IWSLT14.TED.dev2010 \u003cbr\u003e IWSLT14.TEDX.dev2012 \u003cbr\u003e IWSLT14.TED.tst2010 \u003cbr\u003e IWSLT14.TED.tst2011 \u003cbr\u003e IWSLT14.TED.tst2012 |  \n\nTo download and preprocess data, run:\n```bash\nbash configs/[task_name]/preprocess.sh\n```\n\nIf you find preprocessing time-consuming, you can directly 
download the preprocessed data we provide:\n```bash\nbash configs/[task_name]/get_preprocessed.sh\n```\n\n\n### Testing\nWe provide pre-trained models (SubTransformers) on the machine translation tasks for evaluation. #Params and FLOPs exclude the embedding lookup table and the final output layer, because both are task-dependent.\n\n| Task | Hardware | Latency | #Params\u003cbr\u003e(M) | FLOPs\u003cbr\u003e(G) | BLEU | Sacre\u003cbr\u003eBLEU | model_name | Link |\n|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------|:-----------:|:-----------|:-----------:|\n| WMT'14 En-De | Raspberry Pi ARM Cortex-A72 CPU | 3.5s \u003cbr\u003e 4.0s \u003cbr\u003e 4.5s \u003cbr\u003e 5.0s \u003cbr\u003e 6.0s \u003cbr\u003e 6.9s | 25.22 \u003cbr\u003e 29.42 \u003cbr\u003e 35.72 \u003cbr\u003e 36.77 \u003cbr\u003e 44.13 \u003cbr\u003e 48.33 | 1.53 \u003cbr\u003e 1.78 \u003cbr\u003e 2.19 \u003cbr\u003e 2.26 \u003cbr\u003e 2.70 \u003cbr\u003e 3.02 | 25.8 \u003cbr\u003e 26.9 \u003cbr\u003e 27.6 \u003cbr\u003e 27.8 \u003cbr\u003e 28.2 \u003cbr\u003e 28.4 | 25.6 \u003cbr\u003e 26.6 \u003cbr\u003e 27.1 \u003cbr\u003e 27.2 \u003cbr\u003e 27.6 \u003cbr\u003e 27.8 | HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e3.5s_bleu@\u003c!-- --\u003e25.8 \u003cbr\u003e HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e4.0s_bleu@\u003c!-- --\u003e26.9 \u003cbr\u003e HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e4.5s_bleu@\u003c!-- --\u003e27.6 \u003cbr\u003e HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e5.0s_bleu@\u003c!-- --\u003e27.8 \u003cbr\u003e HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e6.0s_bleu@\u003c!-- --\u003e28.2 \u003cbr\u003e HAT_wmt14ende_raspberrypi@\u003c!-- --\u003e6.9s_bleu@\u003c!-- --\u003e28.4 | [link](https://www.dropbox.com/s/pmfwwg1d1kmfdh5/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/ko0i65k1664p74u/HAT_wmt14ende_raspberrypi@4.0s_bleu@26.9.pt?dl=0) \u003cbr\u003e 
[link](https://www.dropbox.com/s/f4y6u9cbcdykeha/HAT_wmt14ende_raspberrypi@4.5s_bleu@27.6.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/av5vycafxo57x6w/HAT_wmt14ende_raspberrypi@5.0s_bleu@27.8.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/ywedqumq91a4ekn/HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/x7iucaotbeald3q/HAT_wmt14ende_raspberrypi@6.9s_bleu@28.4.pt?dl=0) |\n| WMT'14 En-De | Intel Xeon E5-2640 CPU | 137.9ms \u003cbr\u003e 204.2ms \u003cbr\u003e 278.7ms \u003cbr\u003e 340.2ms \u003cbr\u003e 369.6ms \u003cbr\u003e 450.9ms | 30.47 \u003cbr\u003e 35.72 \u003cbr\u003e 40.97 \u003cbr\u003e 46.23 \u003cbr\u003e 51.48 \u003cbr\u003e 56.73 | 1.87 \u003cbr\u003e 2.19 \u003cbr\u003e 2.54 \u003cbr\u003e 2.86 \u003cbr\u003e 3.21 \u003cbr\u003e 3.53 | 25.8 \u003cbr\u003e 27.6 \u003cbr\u003e 27.9 \u003cbr\u003e 28.1 \u003cbr\u003e 28.2 \u003cbr\u003e 28.5 | 25.6 \u003cbr\u003e 27.1 \u003cbr\u003e 27.3 \u003cbr\u003e 27.5 \u003cbr\u003e 27.6 \u003cbr\u003e 27.9 | HAT_wmt14ende_xeon@\u003c!-- --\u003e137.9ms_bleu@\u003c!-- --\u003e25.8 \u003cbr\u003e HAT_wmt14ende_xeon@\u003c!-- --\u003e204.2ms_bleu@\u003c!-- --\u003e27.6 \u003cbr\u003e HAT_wmt14ende_xeon@\u003c!-- --\u003e278.7ms_bleu@\u003c!-- --\u003e27.9 \u003cbr\u003e HAT_wmt14ende_xeon@\u003c!-- --\u003e340.2ms_bleu@\u003c!-- --\u003e28.1 \u003cbr\u003e HAT_wmt14ende_xeon@\u003c!-- --\u003e369.6ms_bleu@\u003c!-- --\u003e28.2 \u003cbr\u003e HAT_wmt14ende_xeon@\u003c!-- --\u003e450.9ms_bleu@\u003c!-- --\u003e28.5 | [link](https://www.dropbox.com/s/bvq3y6igoyxe1t5/HAT_wmt14ende_xeon@137.9ms_bleu@25.8.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/yg12xz504uw2g1s/HAT_wmt14ende_xeon@204.2ms_bleu@27.6.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/l5ljas8zyg9ik65/HAT_wmt14ende_xeon@278.7ms_bleu@27.9.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/fkp61h8jbyt524i/HAT_wmt14ende_xeon@340.2ms_bleu@28.1.pt?dl=0) \u003cbr\u003e 
[link](https://www.dropbox.com/s/3mv3oaddeb132np/HAT_wmt14ende_xeon@369.6ms_bleu@28.2.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/bjldda9nzj7cpni/HAT_wmt14ende_xeon@450.9ms_bleu@28.5.pt?dl=0) |\n| WMT'14 En-De | Nvidia TITAN Xp GPU | 57.1ms \u003cbr\u003e 91.2ms \u003cbr\u003e 126.0ms \u003cbr\u003e 146.7ms \u003cbr\u003e 208.1ms | 30.47 \u003cbr\u003e 35.72 \u003cbr\u003e 40.97 \u003cbr\u003e 51.20 \u003cbr\u003e 49.38 | 1.87 \u003cbr\u003e 2.19 \u003cbr\u003e 2.54 \u003cbr\u003e 3.17 \u003cbr\u003e 3.09 | 25.8 \u003cbr\u003e 27.6 \u003cbr\u003e 27.9 \u003cbr\u003e 28.1 \u003cbr\u003e 28.5 | 25.6 \u003cbr\u003e 27.1 \u003cbr\u003e 27.3 \u003cbr\u003e 27.5 \u003cbr\u003e 27.8 | HAT_wmt14ende_titanxp@\u003c!-- --\u003e57.1ms_bleu@\u003c!-- --\u003e25.8 \u003cbr\u003e HAT_wmt14ende_titanxp@\u003c!-- --\u003e91.2ms_bleu@\u003c!-- --\u003e27.6 \u003cbr\u003e HAT_wmt14ende_titanxp@\u003c!-- --\u003e126.0ms_bleu@\u003c!-- --\u003e27.9 \u003cbr\u003e HAT_wmt14ende_titanxp@\u003c!-- --\u003e146.7ms_bleu@\u003c!-- --\u003e28.1 \u003cbr\u003e HAT_wmt14ende_titanxp@\u003c!-- --\u003e208.1ms_bleu@\u003c!-- --\u003e28.5 | [link](https://www.dropbox.com/s/71w5t0qidsxqe1e/HAT_wmt14ende_titanxp@57.1ms_bleu@25.8.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/j0hnmxw6xz6tskh/HAT_wmt14ende_titanxp@91.2ms_bleu@27.6.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/pyetdnbz1zvcfg5/HAT_wmt14ende_titanxp@126.0ms_bleu@27.9.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/ixn832oai2k44j9/HAT_wmt14ende_titanxp@146.7ms_bleu@28.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/owpdwmqwpn9jw14/HAT_wmt14ende_titanxp@208.1ms_bleu@28.5.pt?dl=0) |\n| WMT'14 En-Fr | Raspberry Pi ARM Cortex-A72 CPU | 4.3s \u003cbr\u003e 5.3s \u003cbr\u003e 5.8s \u003cbr\u003e 6.9s \u003cbr\u003e 7.8s \u003cbr\u003e 9.1s | 25.22 \u003cbr\u003e 35.72 \u003cbr\u003e 36.77 \u003cbr\u003e 44.13 \u003cbr\u003e 49.38 \u003cbr\u003e 56.73 | 1.53 \u003cbr\u003e 
2.23 \u003cbr\u003e 2.26 \u003cbr\u003e 2.70 \u003cbr\u003e 3.09 \u003cbr\u003e 3.57 | 38.8 \u003cbr\u003e 40.1 \u003cbr\u003e 40.6 \u003cbr\u003e 41.1 \u003cbr\u003e 41.4 \u003cbr\u003e 41.8 | 36.0 \u003cbr\u003e 37.3 \u003cbr\u003e 37.8 \u003cbr\u003e 38.3 \u003cbr\u003e 38.5 \u003cbr\u003e 38.9 | HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e4.3s_bleu@\u003c!-- --\u003e38.8 \u003cbr\u003e HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e5.3s_bleu@\u003c!-- --\u003e40.1 \u003cbr\u003e HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e5.8s_bleu@\u003c!-- --\u003e40.6 \u003cbr\u003e HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e6.9s_bleu@\u003c!-- --\u003e41.1 \u003cbr\u003e HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e7.8s_bleu@\u003c!-- --\u003e41.4 \u003cbr\u003e HAT_wmt14enfr_raspberrypi@\u003c!-- --\u003e9.1s_bleu@\u003c!-- --\u003e41.8 | [link](https://www.dropbox.com/s/ku97fwz1oj1a112/HAT_wmt14enfr_raspberrypi@4.3s_bleu@38.8.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/9noopb605fqmjpl/HAT_wmt14enfr_raspberrypi@5.3s_bleu@40.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/vmdkh0ctpdac7gr/HAT_wmt14enfr_raspberrypi@5.8s_bleu@40.6.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/dbo9abn5pnb6qgz/HAT_wmt14enfr_raspberrypi@6.9s_bleu@41.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/x8tsbxbwkk64ejg/HAT_wmt14enfr_raspberrypi@7.8s_bleu@41.4.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/zbsbl5e96y3t5zl/HAT_wmt14enfr_raspberrypi@9.1s_bleu@41.8.pt?dl=0) |\n| WMT'14 En-Fr | Intel Xeon E5-2640 CPU | 154.7ms \u003cbr\u003e 208.8ms \u003cbr\u003e 329.4ms \u003cbr\u003e 394.5ms \u003cbr\u003e 442.0ms | 30.47 \u003cbr\u003e 35.72 \u003cbr\u003e 44.13 \u003cbr\u003e 51.48 \u003cbr\u003e 56.73 | 1.84 \u003cbr\u003e 2.23 \u003cbr\u003e 2.70 \u003cbr\u003e 3.28 \u003cbr\u003e 3.57 | 39.1 \u003cbr\u003e 40.0 \u003cbr\u003e 41.1 \u003cbr\u003e 41.4 \u003cbr\u003e 41.7 | 36.3 \u003cbr\u003e 37.2 \u003cbr\u003e 38.2 \u003cbr\u003e 38.5 
\u003cbr\u003e 38.8 | HAT_wmt14enfr_xeon@\u003c!-- --\u003e154.7ms_bleu@\u003c!-- --\u003e39.1 \u003cbr\u003e HAT_wmt14enfr_xeon@\u003c!-- --\u003e208.8ms_bleu@\u003c!-- --\u003e40.0 \u003cbr\u003e HAT_wmt14enfr_xeon@\u003c!-- --\u003e329.4ms_bleu@\u003c!-- --\u003e41.1 \u003cbr\u003e HAT_wmt14enfr_xeon@\u003c!-- --\u003e394.5ms_bleu@\u003c!-- --\u003e41.4 \u003cbr\u003e HAT_wmt14enfr_xeon@\u003c!-- --\u003e442.0ms_bleu@\u003c!-- --\u003e41.7 | [link](https://www.dropbox.com/s/6xswl0oesuvmqk5/HAT_wmt14enfr_xeon@154.7ms_bleu@39.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/ot3zt8nenda54j7/HAT_wmt14enfr_xeon@208.8ms_bleu@40.0.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/epe2lvus4l40v9o/HAT_wmt14enfr_xeon@329.4ms_bleu@41.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/qnt2qzkb3i054c6/HAT_wmt14enfr_xeon@394.5ms_bleu@41.4.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/79zcolb53jbhchk/HAT_wmt14enfr_xeon@442.0ms_bleu@41.7.pt?dl=0) |\n| WMT'14 En-Fr | Nvidia TITAN Xp GPU | 69.3ms \u003cbr\u003e 94.9ms \u003cbr\u003e 132.9ms \u003cbr\u003e 168.3ms \u003cbr\u003e 208.3ms | 30.47 \u003cbr\u003e 35.72 \u003cbr\u003e 40.97 \u003cbr\u003e 46.23 \u003cbr\u003e 51.48 | 1.84 \u003cbr\u003e 2.23 \u003cbr\u003e 2.51 \u003cbr\u003e 2.90 \u003cbr\u003e 3.25 | 39.1 \u003cbr\u003e 40.0 \u003cbr\u003e 40.7 \u003cbr\u003e 41.1 \u003cbr\u003e 41.7 | 36.3 \u003cbr\u003e 37.2 \u003cbr\u003e 37.8 \u003cbr\u003e 38.3 \u003cbr\u003e 38.8 | HAT_wmt14enfr_titanxp@\u003c!-- --\u003e69.3ms_bleu@\u003c!-- --\u003e39.1 \u003cbr\u003e HAT_wmt14enfr_titanxp@\u003c!-- --\u003e94.9ms_bleu@\u003c!-- --\u003e40.0 \u003cbr\u003e HAT_wmt14enfr_titanxp@\u003c!-- --\u003e132.9ms_bleu@\u003c!-- --\u003e40.7 \u003cbr\u003e HAT_wmt14enfr_titanxp@\u003c!-- --\u003e168.3ms_bleu@\u003c!-- --\u003e41.1 \u003cbr\u003e HAT_wmt14enfr_titanxp@\u003c!-- --\u003e208.3ms_bleu@\u003c!-- --\u003e41.7 | 
[link](https://www.dropbox.com/s/hvy255ls277onjw/HAT_wmt14enfr_titanxp@69.3ms_bleu@39.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/rvfv99jbh4n7qys/HAT_wmt14enfr_titanxp@94.9ms_bleu@40.0.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/u6u3u40pr4f5mzh/HAT_wmt14enfr_titanxp@132.9ms_bleu@40.7.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/wlbbmnrl61dx4z7/HAT_wmt14enfr_titanxp@168.3ms_bleu@41.1.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/e41lnsktn5bb2fz/HAT_wmt14enfr_titanxp@208.3ms_bleu@41.7.pt?dl=0) |\n| WMT'19 En-De | Nvidia TITAN Xp GPU | 55.7ms \u003cbr\u003e 93.2ms \u003cbr\u003e 134.5ms \u003cbr\u003e 176.1ms \u003cbr\u003e 204.5ms \u003cbr\u003e 237.8ms | 36.89 \u003cbr\u003e 42.28 \u003cbr\u003e 40.97 \u003cbr\u003e 46.23 \u003cbr\u003e 51.48 \u003cbr\u003e 56.73 | 2.27 \u003cbr\u003e 2.63 \u003cbr\u003e 2.54 \u003cbr\u003e 2.86 \u003cbr\u003e 3.18 \u003cbr\u003e 3.53 | 42.4 \u003cbr\u003e 44.4 \u003cbr\u003e 45.4 \u003cbr\u003e 46.2 \u003cbr\u003e 46.5 \u003cbr\u003e 46.7 | 41.9 \u003cbr\u003e 43.9 \u003cbr\u003e 44.7 \u003cbr\u003e 45.6 \u003cbr\u003e 45.7 \u003cbr\u003e 46.0 | HAT_wmt19ende_titanxp@\u003c!-- --\u003e55.7ms_bleu@\u003c!-- --\u003e42.4 \u003cbr\u003e HAT_wmt19ende_titanxp@\u003c!-- --\u003e93.2ms_bleu@\u003c!-- --\u003e44.4 \u003cbr\u003e HAT_wmt19ende_titanxp@\u003c!-- --\u003e134.5ms_bleu@\u003c!-- --\u003e45.4 \u003cbr\u003e HAT_wmt19ende_titanxp@\u003c!-- --\u003e176.1ms_bleu@\u003c!-- --\u003e46.2 \u003cbr\u003e HAT_wmt19ende_titanxp@\u003c!-- --\u003e204.5ms_bleu@\u003c!-- --\u003e46.5 \u003cbr\u003e HAT_wmt19ende_titanxp@\u003c!-- --\u003e237.8ms_bleu@\u003c!-- --\u003e46.7 | [link](https://www.dropbox.com/s/6pokem8orb75ldh/HAT_wmt19ende_titanxp@55.7ms_bleu@42.4.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/zgcd70pzf1sle4z/HAT_wmt19ende_titanxp@93.2ms_bleu@44.4.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/mm827rst6a144zy/HAT_wmt19ende_titanxp@134.5ms_bleu@45.4.pt?dl=0) 
\u003cbr\u003e [link](https://www.dropbox.com/s/y0vov0n9zt50n9c/HAT_wmt19ende_titanxp@176.1ms_bleu@46.2.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/w1si4mgf1e3l8oj/HAT_wmt19ende_titanxp@204.5ms_bleu@46.5.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/rljih3t0hglp39a/HAT_wmt19ende_titanxp@237.8ms_bleu@46.7.pt?dl=0) |\n| IWSLT'14 De-En | Nvidia TITAN Xp GPU | 45.6ms \u003cbr\u003e 74.5ms \u003cbr\u003e 109.0ms \u003cbr\u003e 137.8ms \u003cbr\u003e 168.8ms | 16.82 \u003cbr\u003e 19.98 \u003cbr\u003e 23.13 \u003cbr\u003e 27.33 \u003cbr\u003e 31.54 | 0.78 \u003cbr\u003e 0.93 \u003cbr\u003e 1.13 \u003cbr\u003e 1.32 \u003cbr\u003e 1.52 | 33.4 \u003cbr\u003e 34.2 \u003cbr\u003e 34.5 \u003cbr\u003e 34.7 \u003cbr\u003e 34.8 | 32.5 \u003cbr\u003e 33.3 \u003cbr\u003e 33.6 \u003cbr\u003e 33.8 \u003cbr\u003e 33.9 | HAT_iwslt14deen_titanxp@\u003c!-- --\u003e45.6ms_bleu@\u003c!-- --\u003e33.4 \u003cbr\u003e HAT_iwslt14deen_titanxp@\u003c!-- --\u003e74.5ms_bleu@\u003c!-- --\u003e34.2 \u003cbr\u003e HAT_iwslt14deen_titanxp@\u003c!-- --\u003e109.0ms_bleu@\u003c!-- --\u003e34.5 \u003cbr\u003e HAT_iwslt14deen_titanxp@\u003c!-- --\u003e137.8ms_bleu@\u003c!-- --\u003e34.7 \u003cbr\u003e HAT_iwslt14deen_titanxp@\u003c!-- --\u003e168.8ms_bleu@\u003c!-- --\u003e34.8 | [link](https://www.dropbox.com/s/ntj1gfskish8vz3/HAT_iwslt14deen_titanxp@45.6ms_bleu@33.4.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/gjq46181s3xbz0k/HAT_iwslt14deen_titanxp@74.5ms_bleu@34.2.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/fg3r3tk2vjg0diq/HAT_iwslt14deen_titanxp@109.0ms_bleu@34.5.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/3j5vu5jh71xwec1/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.pt?dl=0) \u003cbr\u003e [link](https://www.dropbox.com/s/5xy9hdjuc5c6sw5/HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8.pt?dl=0) |\n\n\n\n#### Download models:\n```bash\npython download_model.py --model-name=[model_name]\n# for example\npython download_model.py 
--model-name=HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8\n# to download all models\npython download_model.py --download-all\n```\n\n#### Test BLEU (SacreBLEU) score:\n```bash\nbash configs/[task_name]/test.sh \\\n    [model_file] \\\n    configs/[task_name]/subtransformer/[model_name].yml \\\n    [normal|sacre]\n# for example\nbash configs/wmt14.en-de/test.sh \\\n    ./downloaded_models/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.pt \\\n    configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.yml \\\n    normal\n# another example\nbash configs/iwslt14.de-en/test.sh \\\n    ./downloaded_models/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.pt \\\n    configs/iwslt14.de-en/subtransformer/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.yml \\\n    sacre\n```\n\n#### Test latency, model size and FLOPs\nTo profile the latency, model size and FLOPs (FLOPs profiling requires [torchprofile](https://github.com/mit-han-lab/torchprofile.git)), run the commands below. By default, only the model size is profiled:\n```bash\npython train.py \\\n    --configs=configs/[task_name]/subtransformer/[model_name].yml \\\n    --sub-configs=configs/[task_name]/subtransformer/common.yml \\\n    [--latgpu|--latcpu|--profile-flops]\n# for example\npython train.py \\\n    --configs=configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.yml \\\n    --sub-configs=configs/wmt14.en-de/subtransformer/common.yml --latcpu\n# another example\npython train.py \\\n    --configs=configs/iwslt14.de-en/subtransformer/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.yml \\\n    --sub-configs=configs/iwslt14.de-en/subtransformer/common.yml --profile-flops\n```\n\n\n### Training\n\n#### 1. Train a SuperTransformer\nThe SuperTransformer is a supernet that contains many SubTransformers with shared weights.\nBy default, we train WMT tasks on 8 GPUs. Please adjust `--update-freq` according to the number of GPUs (`128/x` for x GPUs). 
Note that for IWSLT, we only train on one GPU with `--update-freq=1`. \n```bash\npython train.py --configs=configs/[task_name]/supertransformer/[search_space].yml\n# for example\npython train.py --configs=configs/wmt14.en-de/supertransformer/space0.yml\n# another example\nCUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --configs=configs/wmt14.en-fr/supertransformer/space0.yml --update-freq=32\n```\nThe `--configs` file specifies the SuperTransformer model architecture, the SubTransformer search space, and the training settings.\n\nWe also provide pre-trained SuperTransformers for the four tasks, listed below. To download, run `python download_model.py --model-name=[model_name]`.\n\n| Task | search_space | model_name | Link |\n|:-----------:|:-----------:|:-----------|:-----------:|\n| WMT'14 En-De | space0 | HAT_wmt14ende_super_space0 | [link](https://www.dropbox.com/s/pkdddxvvpw9a4vq/HAT_wmt14ende_super_space0.pt?dl=0) |\n| WMT'14 En-Fr | space0 | HAT_wmt14enfr_super_space0 | [link](https://www.dropbox.com/s/asegvw9qzpxui6a/HAT_wmt14enfr_super_space0.pt?dl=0) |\n| WMT'19 En-De | space0 | HAT_wmt19ende_super_space0 | [link](https://www.dropbox.com/s/uc0lw6jdep1vazc/HAT_wmt19ende_super_space0.pt?dl=0) |\n| IWSLT'14 De-En | space1 | HAT_iwslt14deen_super_space1 | [link](https://www.dropbox.com/s/yv0mn8ns36gxkhs/HAT_iwslt14deen_super_space1.pt?dl=0) |\n\n\n#### 2. Evolutionary Search\nThe second step of HAT is to perform an evolutionary search in the trained SuperTransformer with a hardware latency constraint in the loop. We train a latency predictor to get fast and accurate latency feedback.\n\n##### 2.1 Generate a latency dataset\n```bash\npython latency_dataset.py --configs=configs/[task_name]/latency_dataset/[hardware_name].yml\n# for example\npython latency_dataset.py --configs=configs/wmt14.en-de/latency_dataset/cpu_raspberrypi.yml\n```\n`hardware_name` can be `cpu_raspberrypi`, `cpu_xeon`, or `gpu_titanxp`. 
The `--configs` file contains the design space from which we sample models to collect (model_architecture, real_latency) data pairs.\n\nWe provide the datasets we collected in the [latency_dataset](./latency_dataset) folder.\n\n##### 2.2 Train a latency predictor\nThen train a predictor on the collected dataset:\n```bash\npython latency_predictor.py --configs=configs/[task_name]/latency_predictor/[hardware_name].yml\n# for example\npython latency_predictor.py --configs=configs/wmt14.en-de/latency_predictor/cpu_raspberrypi.yml\n```\nThe `--configs` file contains the predictor's model architecture and training settings.\nWe provide pre-trained predictors in the [latency_dataset/predictors](./latency_dataset/predictors) folder.\n\n##### 2.3 Run evolutionary search with a latency constraint\n```bash\npython evo_search.py --configs=[supertransformer_config_file].yml --evo-configs=[evo_settings].yml\n# for example\npython evo_search.py --configs=configs/wmt14.en-de/supertransformer/space0.yml --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_titanxp.yml\n```\nThe `--configs` file points to the SuperTransformer training config file. The `--evo-configs` file contains the evolutionary search settings and specifies the desired latency constraint, `latency-constraint`. Note that `feature-norm` and `lat-norm` here should be the same as those used when training the latency predictor. `--write-config-path` specifies where to write the searched SubTransformer architecture. \n\n\n#### 3. Train a Searched SubTransformer\nFinally, we train the searched SubTransformer from scratch:\n```bash\npython train.py --configs=[subtransformer_architecture].yml --sub-configs=configs/[task_name]/subtransformer/common.yml\n# for example\npython train.py --configs=configs/wmt14.en-de/subtransformer/wmt14ende_titanxp@200ms.yml --sub-configs=configs/wmt14.en-de/subtransformer/common.yml\n```\n\n`--configs` points to the file written to `--write-config-path` in step 2.3. 
`--sub-configs` contains the training settings for the SubTransformer.\n\nAfter training a SubTransformer, you can test its performance with the methods in the [Testing](#testing) section.\n\n### Dependencies\n* Python \u003e= 3.6\n* [PyTorch](http://pytorch.org/) \u003e= 1.0.0\n* configargparse \u003e= 0.14\n* New model training requires NVIDIA GPUs and [NCCL](https://github.com/NVIDIA/nccl)\n\n## Related work on efficient deep learning\n\n[MicroNet for Efficient Language Modeling](https://arxiv.org/abs/2005.07877)\n\n[Lite Transformer with Long-Short Range Attention](https://arxiv.org/abs/2004.11886)\n\n[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494)\n\n[Once-for-All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791)\n\n[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332)\n\n## Contact\nIf you have any questions, feel free to contact [Hanrui Wang](https://hanruiwang.me) by email ([hanrui@mit.edu](mailto:hanrui@mit.edu)) or via GitHub issues. Pull requests are welcome! \n\n## License\n\nThis repository is released under the MIT license. See [LICENSE](./LICENSE) for more information.\n\n## Acknowledgements\n\nWe thank [fairseq](https://github.com/pytorch/fairseq), which serves as the backbone of this repo.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fhardware-aware-transformers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmit-han-lab%2Fhardware-aware-transformers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmit-han-lab%2Fhardware-aware-transformers/lists"}