https://github.com/mit-han-lab/hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
efficient-model hardware-aware machine-translation natural-language-processing specialization transformer
Last synced: about 1 year ago
Host: GitHub
URL: https://github.com/mit-han-lab/hardware-aware-transformers
Owner: mit-han-lab
License: other
Created: 2020-05-01T19:11:29.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2024-07-14T04:10:59.000Z (almost 2 years ago)
Last Synced: 2025-05-09T20:51:26.656Z (about 1 year ago)
Topics: efficient-model, hardware-aware, machine-translation, natural-language-processing, specialization, transformer
Language: Python
Homepage: https://hat.mit.edu
Size: 25.6 MB
Stars: 334
Watchers: 13
Forks: 52
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# HAT: Hardware-Aware Transformers for Efficient Natural Language Processing [[paper]](https://arxiv.org/abs/2005.14187) [[website]](https://hat.mit.edu) [[video]](https://youtu.be/N_tH1jIbqCw)

```bibtex
@inproceedings{hanruiwang2020hat,
    title     = {HAT: Hardware-Aware Transformers for Efficient Natural Language Processing},
    author    = {Wang, Hanrui and Wu, Zhanghao and Liu, Zhijian and Cai, Han and Zhu, Ligeng and Gan, Chuang and Han, Song},
    booktitle = {Annual Conference of the Association for Computational Linguistics},
    year      = {2020}
}
```

## News

- HAT is covered by [VentureBeat](https://venturebeat.com/ai/new-ai-technique-speeds-up-language-models-on-edge-devices/).

- HAT is covered by [MIT News](https://news.mit.edu/2020/shrinking-deep-learning-carbon-footprint-0807).

## Overview

We release the PyTorch code and 50 pre-trained models for HAT: Hardware-Aware Transformers. Within a Transformer supernet (SuperTransformer), we efficiently search for a specialized fast model (SubTransformer) for each hardware platform, guided by latency feedback from that hardware. Compared with the Evolved Transformer, the search cost is reduced by over 10000×.

![teaser](assets/teaser.jpg)

HAT Framework overview:

![overview](assets/overview.jpg)

HAT models achieve up to 3× speedup and 3.7× smaller model size with no performance loss.

![results](assets/results.jpg)

## Usage

### Installation

To install from source and develop locally:

```bash
git clone https://github.com/mit-han-lab/hardware-aware-transformers.git
cd hardware-aware-transformers
pip install --editable .
```

### Data Preparation

| Task | task_name | Train | Valid | Test |
|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| WMT'14 En-De | wmt14.en-de | [WMT'16](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | newstest2013 | newstest2014 |
| WMT'14 En-Fr | wmt14.en-fr | [WMT'14](http://statmt.org/wmt14/translation-task.html#Download) | newstest2012&2013 | newstest2014 |
| WMT'19 En-De | wmt19.en-de | [WMT'19](http://www.statmt.org/wmt19/translation-task.html#download) | newstest2017 | newstest2018 |
| IWSLT'14 De-En | iwslt14.de-en | [IWSLT'14 train set](https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz) | IWSLT'14 valid set | IWSLT14.TED.dev2010<br>IWSLT14.TEDX.dev2012<br>IWSLT14.TED.tst2010<br>IWSLT14.TED.tst2011<br>IWSLT14.TED.tst2012 |

To download and preprocess data, run:

```bash
bash configs/[task_name]/preprocess.sh
```

If you find preprocessing time-consuming, you can directly download the preprocessed data we provide:

```bash
bash configs/[task_name]/get_preprocessed.sh
```

### Testing

We provide pre-trained models (SubTransformers) for the machine translation tasks for evaluation. #Params and FLOPs exclude the embedding lookup table and the final output layer, because both depend on the task.
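To make the counting rule concrete, here is a minimal sketch that sums parameter counts from a name-to-shape mapping while skipping the task-dependent layers. The parameter names (`encoder.embed_tokens`, `decoder.output_projection`, and so on) are hypothetical placeholders, not necessarily the names used in the released checkpoints, and `count_params_millions` is an illustrative helper rather than part of this repo.

```python
from math import prod
from typing import Dict, Tuple

# Hypothetical name prefixes for the task-dependent layers that are excluded.
EXCLUDED_PREFIXES = (
    "encoder.embed_tokens",
    "decoder.embed_tokens",
    "decoder.output_projection",
)

def count_params_millions(shapes: Dict[str, Tuple[int, ...]]) -> float:
    """Count parameters in millions, skipping embedding and output layers."""
    total = 0
    for name, shape in shapes.items():
        if name.startswith(EXCLUDED_PREFIXES):
            continue  # task-dependent: not counted in #Params
        total += prod(shape)
    return total / 1e6

# Usage: only the fc1 weight and bias are counted here.
shapes = {
    "encoder.embed_tokens.weight": (32000, 512),
    "encoder.layers.0.fc1.weight": (2048, 512),
    "encoder.layers.0.fc1.bias": (2048,),
}
print(count_params_millions(shapes))
```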

| Task | Hardware | Latency | #Params<br>(M) | FLOPs<br>(G) | BLEU | Sacre<br>BLEU | model_name | Link |
|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------|:-----------:|:-----------|:-----------:|
| WMT'14 En-De | Raspberry Pi ARM Cortex-A72 CPU | 3.5s<br>4.0s<br>4.5s<br>5.0s<br>6.0s<br>6.9s | 25.22<br>29.42<br>35.72<br>36.77<br>44.13<br>48.33 | 1.53<br>1.78<br>2.19<br>2.26<br>2.70<br>3.02 | 25.8<br>26.9<br>27.6<br>27.8<br>28.2<br>28.4 | 25.6<br>26.6<br>27.1<br>27.2<br>27.6<br>27.8 | HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8<br>HAT_wmt14ende_raspberrypi@4.0s_bleu@26.9<br>HAT_wmt14ende_raspberrypi@4.5s_bleu@27.6<br>HAT_wmt14ende_raspberrypi@5.0s_bleu@27.8<br>HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2<br>HAT_wmt14ende_raspberrypi@6.9s_bleu@28.4 | [link](https://www.dropbox.com/s/pmfwwg1d1kmfdh5/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.pt?dl=0)<br>[link](https://www.dropbox.com/s/ko0i65k1664p74u/HAT_wmt14ende_raspberrypi@4.0s_bleu@26.9.pt?dl=0)<br>[link](https://www.dropbox.com/s/f4y6u9cbcdykeha/HAT_wmt14ende_raspberrypi@4.5s_bleu@27.6.pt?dl=0)<br>[link](https://www.dropbox.com/s/av5vycafxo57x6w/HAT_wmt14ende_raspberrypi@5.0s_bleu@27.8.pt?dl=0)<br>[link](https://www.dropbox.com/s/ywedqumq91a4ekn/HAT_wmt14ende_raspberrypi@6.0s_bleu@28.2.pt?dl=0)<br>[link](https://www.dropbox.com/s/x7iucaotbeald3q/HAT_wmt14ende_raspberrypi@6.9s_bleu@28.4.pt?dl=0) |
| WMT'14 En-De | Intel Xeon E5-2640 CPU | 137.9ms<br>204.2ms<br>278.7ms<br>340.2ms<br>369.6ms<br>450.9ms | 30.47<br>35.72<br>40.97<br>46.23<br>51.48<br>56.73 | 1.87<br>2.19<br>2.54<br>2.86<br>3.21<br>3.53 | 25.8<br>27.6<br>27.9<br>28.1<br>28.2<br>28.5 | 25.6<br>27.1<br>27.3<br>27.5<br>27.6<br>27.9 | HAT_wmt14ende_xeon@137.9ms_bleu@25.8<br>HAT_wmt14ende_xeon@204.2ms_bleu@27.6<br>HAT_wmt14ende_xeon@278.7ms_bleu@27.9<br>HAT_wmt14ende_xeon@340.2ms_bleu@28.1<br>HAT_wmt14ende_xeon@369.6ms_bleu@28.2<br>HAT_wmt14ende_xeon@450.9ms_bleu@28.5 | [link](https://www.dropbox.com/s/bvq3y6igoyxe1t5/HAT_wmt14ende_xeon@137.9ms_bleu@25.8.pt?dl=0)<br>[link](https://www.dropbox.com/s/yg12xz504uw2g1s/HAT_wmt14ende_xeon@204.2ms_bleu@27.6.pt?dl=0)<br>[link](https://www.dropbox.com/s/l5ljas8zyg9ik65/HAT_wmt14ende_xeon@278.7ms_bleu@27.9.pt?dl=0)<br>[link](https://www.dropbox.com/s/fkp61h8jbyt524i/HAT_wmt14ende_xeon@340.2ms_bleu@28.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/3mv3oaddeb132np/HAT_wmt14ende_xeon@369.6ms_bleu@28.2.pt?dl=0)<br>[link](https://www.dropbox.com/s/bjldda9nzj7cpni/HAT_wmt14ende_xeon@450.9ms_bleu@28.5.pt?dl=0) |
| WMT'14 En-De | Nvidia TITAN Xp GPU | 57.1ms<br>91.2ms<br>126.0ms<br>146.7ms<br>208.1ms | 30.47<br>35.72<br>40.97<br>51.20<br>49.38 | 1.87<br>2.19<br>2.54<br>3.17<br>3.09 | 25.8<br>27.6<br>27.9<br>28.1<br>28.5 | 25.6<br>27.1<br>27.3<br>27.5<br>27.8 | HAT_wmt14ende_titanxp@57.1ms_bleu@25.8<br>HAT_wmt14ende_titanxp@91.2ms_bleu@27.6<br>HAT_wmt14ende_titanxp@126.0ms_bleu@27.9<br>HAT_wmt14ende_titanxp@146.7ms_bleu@28.1<br>HAT_wmt14ende_titanxp@208.1ms_bleu@28.5 | [link](https://www.dropbox.com/s/71w5t0qidsxqe1e/HAT_wmt14ende_titanxp@57.1ms_bleu@25.8.pt?dl=0)<br>[link](https://www.dropbox.com/s/j0hnmxw6xz6tskh/HAT_wmt14ende_titanxp@91.2ms_bleu@27.6.pt?dl=0)<br>[link](https://www.dropbox.com/s/pyetdnbz1zvcfg5/HAT_wmt14ende_titanxp@126.0ms_bleu@27.9.pt?dl=0)<br>[link](https://www.dropbox.com/s/ixn832oai2k44j9/HAT_wmt14ende_titanxp@146.7ms_bleu@28.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/owpdwmqwpn9jw14/HAT_wmt14ende_titanxp@208.1ms_bleu@28.5.pt?dl=0) |
| WMT'14 En-Fr | Raspberry Pi ARM Cortex-A72 CPU | 4.3s<br>5.3s<br>5.8s<br>6.9s<br>7.8s<br>9.1s | 25.22<br>35.72<br>36.77<br>44.13<br>49.38<br>56.73 | 1.53<br>2.23<br>2.26<br>2.70<br>3.09<br>3.57 | 38.8<br>40.1<br>40.6<br>41.1<br>41.4<br>41.8 | 36.0<br>37.3<br>37.8<br>38.3<br>38.5<br>38.9 | HAT_wmt14enfr_raspberrypi@4.3s_bleu@38.8<br>HAT_wmt14enfr_raspberrypi@5.3s_bleu@40.1<br>HAT_wmt14enfr_raspberrypi@5.8s_bleu@40.6<br>HAT_wmt14enfr_raspberrypi@6.9s_bleu@41.1<br>HAT_wmt14enfr_raspberrypi@7.8s_bleu@41.4<br>HAT_wmt14enfr_raspberrypi@9.1s_bleu@41.8 | [link](https://www.dropbox.com/s/ku97fwz1oj1a112/HAT_wmt14enfr_raspberrypi@4.3s_bleu@38.8.pt?dl=0)<br>[link](https://www.dropbox.com/s/9noopb605fqmjpl/HAT_wmt14enfr_raspberrypi@5.3s_bleu@40.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/vmdkh0ctpdac7gr/HAT_wmt14enfr_raspberrypi@5.8s_bleu@40.6.pt?dl=0)<br>[link](https://www.dropbox.com/s/dbo9abn5pnb6qgz/HAT_wmt14enfr_raspberrypi@6.9s_bleu@41.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/x8tsbxbwkk64ejg/HAT_wmt14enfr_raspberrypi@7.8s_bleu@41.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/zbsbl5e96y3t5zl/HAT_wmt14enfr_raspberrypi@9.1s_bleu@41.8.pt?dl=0) |
| WMT'14 En-Fr | Intel Xeon E5-2640 CPU | 154.7ms<br>208.8ms<br>329.4ms<br>394.5ms<br>442.0ms | 30.47<br>35.72<br>44.13<br>51.48<br>56.73 | 1.84<br>2.23<br>2.70<br>3.28<br>3.57 | 39.1<br>40.0<br>41.1<br>41.4<br>41.7 | 36.3<br>37.2<br>38.2<br>38.5<br>38.8 | HAT_wmt14enfr_xeon@154.7ms_bleu@39.1<br>HAT_wmt14enfr_xeon@208.8ms_bleu@40.0<br>HAT_wmt14enfr_xeon@329.4ms_bleu@41.1<br>HAT_wmt14enfr_xeon@394.5ms_bleu@41.4<br>HAT_wmt14enfr_xeon@442.0ms_bleu@41.7 | [link](https://www.dropbox.com/s/6xswl0oesuvmqk5/HAT_wmt14enfr_xeon@154.7ms_bleu@39.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/ot3zt8nenda54j7/HAT_wmt14enfr_xeon@208.8ms_bleu@40.0.pt?dl=0)<br>[link](https://www.dropbox.com/s/epe2lvus4l40v9o/HAT_wmt14enfr_xeon@329.4ms_bleu@41.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/qnt2qzkb3i054c6/HAT_wmt14enfr_xeon@394.5ms_bleu@41.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/79zcolb53jbhchk/HAT_wmt14enfr_xeon@442.0ms_bleu@41.7.pt?dl=0) |
| WMT'14 En-Fr | Nvidia TITAN Xp GPU | 69.3ms<br>94.9ms<br>132.9ms<br>168.3ms<br>208.3ms | 30.47<br>35.72<br>40.97<br>46.23<br>51.48 | 1.84<br>2.23<br>2.51<br>2.90<br>3.25 | 39.1<br>40.0<br>40.7<br>41.1<br>41.7 | 36.3<br>37.2<br>37.8<br>38.3<br>38.8 | HAT_wmt14enfr_titanxp@69.3ms_bleu@39.1<br>HAT_wmt14enfr_titanxp@94.9ms_bleu@40.0<br>HAT_wmt14enfr_titanxp@132.9ms_bleu@40.7<br>HAT_wmt14enfr_titanxp@168.3ms_bleu@41.1<br>HAT_wmt14enfr_titanxp@208.3ms_bleu@41.7 | [link](https://www.dropbox.com/s/hvy255ls277onjw/HAT_wmt14enfr_titanxp@69.3ms_bleu@39.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/rvfv99jbh4n7qys/HAT_wmt14enfr_titanxp@94.9ms_bleu@40.0.pt?dl=0)<br>[link](https://www.dropbox.com/s/u6u3u40pr4f5mzh/HAT_wmt14enfr_titanxp@132.9ms_bleu@40.7.pt?dl=0)<br>[link](https://www.dropbox.com/s/wlbbmnrl61dx4z7/HAT_wmt14enfr_titanxp@168.3ms_bleu@41.1.pt?dl=0)<br>[link](https://www.dropbox.com/s/e41lnsktn5bb2fz/HAT_wmt14enfr_titanxp@208.3ms_bleu@41.7.pt?dl=0) |
| WMT'19 En-De | Nvidia TITAN Xp GPU | 55.7ms<br>93.2ms<br>134.5ms<br>176.1ms<br>204.5ms<br>237.8ms | 36.89<br>42.28<br>40.97<br>46.23<br>51.48<br>56.73 | 2.27<br>2.63<br>2.54<br>2.86<br>3.18<br>3.53 | 42.4<br>44.4<br>45.4<br>46.2<br>46.5<br>46.7 | 41.9<br>43.9<br>44.7<br>45.6<br>45.7<br>46.0 | HAT_wmt19ende_titanxp@55.7ms_bleu@42.4<br>HAT_wmt19ende_titanxp@93.2ms_bleu@44.4<br>HAT_wmt19ende_titanxp@134.5ms_bleu@45.4<br>HAT_wmt19ende_titanxp@176.1ms_bleu@46.2<br>HAT_wmt19ende_titanxp@204.5ms_bleu@46.5<br>HAT_wmt19ende_titanxp@237.8ms_bleu@46.7 | [link](https://www.dropbox.com/s/6pokem8orb75ldh/HAT_wmt19ende_titanxp@55.7ms_bleu@42.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/zgcd70pzf1sle4z/HAT_wmt19ende_titanxp@93.2ms_bleu@44.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/mm827rst6a144zy/HAT_wmt19ende_titanxp@134.5ms_bleu@45.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/y0vov0n9zt50n9c/HAT_wmt19ende_titanxp@176.1ms_bleu@46.2.pt?dl=0)<br>[link](https://www.dropbox.com/s/w1si4mgf1e3l8oj/HAT_wmt19ende_titanxp@204.5ms_bleu@46.5.pt?dl=0)<br>[link](https://www.dropbox.com/s/rljih3t0hglp39a/HAT_wmt19ende_titanxp@237.8ms_bleu@46.7.pt?dl=0) |
| IWSLT'14 De-En | Nvidia TITAN Xp GPU | 45.6ms<br>74.5ms<br>109.0ms<br>137.8ms<br>168.8ms | 16.82<br>19.98<br>23.13<br>27.33<br>31.54 | 0.78<br>0.93<br>1.13<br>1.32<br>1.52 | 33.4<br>34.2<br>34.5<br>34.7<br>34.8 | 32.5<br>33.3<br>33.6<br>33.8<br>33.9 | HAT_iwslt14deen_titanxp@45.6ms_bleu@33.4<br>HAT_iwslt14deen_titanxp@74.5ms_bleu@34.2<br>HAT_iwslt14deen_titanxp@109.0ms_bleu@34.5<br>HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7<br>HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8 | [link](https://www.dropbox.com/s/ntj1gfskish8vz3/HAT_iwslt14deen_titanxp@45.6ms_bleu@33.4.pt?dl=0)<br>[link](https://www.dropbox.com/s/gjq46181s3xbz0k/HAT_iwslt14deen_titanxp@74.5ms_bleu@34.2.pt?dl=0)<br>[link](https://www.dropbox.com/s/fg3r3tk2vjg0diq/HAT_iwslt14deen_titanxp@109.0ms_bleu@34.5.pt?dl=0)<br>[link](https://www.dropbox.com/s/3j5vu5jh71xwec1/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.pt?dl=0)<br>[link](https://www.dropbox.com/s/5xy9hdjuc5c6sw5/HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8.pt?dl=0) |
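Each `model_name` encodes the task, the target hardware, the measured latency, and the BLEU score. The sketch below parses that pattern as inferred from the table; `parse_model_name` and `ModelInfo` are hypothetical helpers, not part of this repo, and the regular expression may need adjusting if other naming variants exist.

```python
import re
from dataclasses import dataclass

# Pattern inferred from names such as "HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8".
_NAME_RE = re.compile(
    r"^HAT_(?P<task>[a-z0-9]+)_(?P<hardware>[a-z]+)"
    r"@(?P<latency>[0-9.]+)(?P<unit>ms|s)_bleu@(?P<bleu>[0-9.]+)$"
)

@dataclass
class ModelInfo:
    task: str
    hardware: str
    latency_ms: float
    bleu: float

def parse_model_name(name: str) -> ModelInfo:
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized model name: {name}")
    latency = float(m["latency"])
    if m["unit"] == "s":
        latency *= 1000.0  # normalize seconds to milliseconds
    return ModelInfo(m["task"], m["hardware"], latency, float(m["bleu"]))

# Usage: pick the fastest of a few candidate models.
names = [
    "HAT_wmt14ende_xeon@204.2ms_bleu@27.6",
    "HAT_wmt14ende_xeon@137.9ms_bleu@25.8",
]
fastest = min(names, key=lambda n: parse_model_name(n).latency_ms)
print(fastest)
```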

#### Download models:

```bash
python download_model.py --model-name=[model_name]

# for example
python download_model.py --model-name=HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8

# to download all models
python download_model.py --download-all
```

#### Test BLEU (SacreBLEU) score:

```bash
bash configs/[task_name]/test.sh \
    [model_file] \
    configs/[task_name]/subtransformer/[model_name].yml \
    [normal|sacre]

# for example
bash configs/wmt14.en-de/test.sh \
    ./downloaded_models/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.pt \
    configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.yml \
    normal

# another example
bash configs/iwslt14.de-en/test.sh \
    ./downloaded_models/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.pt \
    configs/iwslt14.de-en/subtransformer/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.yml \
    sacre
```

#### Test Latency, model size and FLOPs

To profile the latency, model size, and FLOPs (FLOPs profiling requires [torchprofile](https://github.com/mit-han-lab/torchprofile.git)), run the commands below. By default, only the model size is profiled; add `--latgpu`, `--latcpu`, or `--profile-flops` to also profile GPU latency, CPU latency, or FLOPs:

```bash
python train.py \
    --configs=configs/[task_name]/subtransformer/[model_name].yml \
    --sub-configs=configs/[task_name]/subtransformer/common.yml \
    [--latgpu|--latcpu|--profile-flops]

# for example
python train.py \
    --configs=configs/wmt14.en-de/subtransformer/HAT_wmt14ende_raspberrypi@3.5s_bleu@25.8.yml \
    --sub-configs=configs/wmt14.en-de/subtransformer/common.yml --latcpu

# another example
python train.py \
    --configs=configs/iwslt14.de-en/subtransformer/HAT_iwslt14deen_titanxp@137.8ms_bleu@34.7.yml \
    --sub-configs=configs/iwslt14.de-en/subtransformer/common.yml --profile-flops
```
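To sanity-check latency numbers independently of `train.py`, a common timing pattern is to discard a few warm-up runs and report the median of several timed runs. This sketch is not how `--latcpu`/`--latgpu` are implemented, and `measure_latency_ms` is a hypothetical helper. For GPU models you would also need to synchronize the device (e.g., with `torch.cuda.synchronize()`) before reading the clock.

```python
import statistics
import time
from typing import Callable

def measure_latency_ms(fn: Callable[[], object], warmup: int = 5, runs: int = 20) -> float:
    """Median wall-clock latency of fn() in milliseconds (generic sketch)."""
    for _ in range(warmup):  # warm-up runs are discarded (caches, lazy initialization)
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Usage with a stand-in workload instead of a real model forward pass.
print(measure_latency_ms(lambda: sum(range(100_000))))
```

The median is less sensitive than the mean to occasional slow runs caused by background activity.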

### Training

#### 1. Train a SuperTransformer

The SuperTransformer is a supernet that contains many SubTransformers with weight-sharing.
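One way to picture weight sharing: a SubTransformer layer with smaller dimensions reuses the front (top-left) block of the corresponding SuperTransformer weight matrix. The sketch below shows which entries are shared, using plain Python lists; it reflects our reading of the HAT paper rather than this repo's code, and `sub_linear_weight` is a hypothetical helper. List slicing copies values, whereas slicing a PyTorch tensor returns a view that shares storage with the original.

```python
from typing import List

Matrix = List[List[float]]

def sub_linear_weight(super_weight: Matrix, out_dim: int, in_dim: int) -> Matrix:
    """Return the front out_dim x in_dim block of a SuperTransformer weight matrix."""
    if out_dim > len(super_weight) or in_dim > len(super_weight[0]):
        raise ValueError("SubTransformer dims must not exceed SuperTransformer dims")
    return [row[:in_dim] for row in super_weight[:out_dim]]

# A 4x4 "super" weight; a 2x3 SubTransformer layer uses its top-left corner.
W = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(sub_linear_weight(W, out_dim=2, in_dim=3))
```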

By default, we train WMT tasks on 8 GPUs. Adjust `--update-freq` to your GPU count: use `128/x` for x GPUs (e.g., `--update-freq=32` on 4 GPUs, as in the second example below). For IWSLT, we train on a single GPU with `--update-freq=1`.
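The `128/x` rule keeps the product of the GPU count and `--update-freq` fixed at 128, so gradients are accumulated over the same number of per-GPU batches regardless of how many GPUs you use. A small helper (hypothetical, not part of the repo) that also rejects GPU counts that do not divide 128:

```python
def update_freq(num_gpus: int, total: int = 128) -> int:
    """--update-freq such that num_gpus * update_freq == total."""
    if num_gpus <= 0 or total % num_gpus != 0:
        raise ValueError(f"{num_gpus} GPUs does not divide {total} evenly")
    return total // num_gpus

for gpus in (8, 4, 2, 1):
    print(gpus, update_freq(gpus))
```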

```bash
python train.py --configs=configs/[task_name]/supertransformer/[search_space].yml

# for example
python train.py --configs=configs/wmt14.en-de/supertransformer/space0.yml

# another example
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --configs=configs/wmt14.en-fr/supertransformer/space0.yml --update-freq=32
```

The `--configs` file specifies the SuperTransformer architecture, the SubTransformer search space, and the training settings.

We also provide pre-trained SuperTransformers for the four tasks, listed below. To download one, run `python download_model.py --model-name=[model_name]`.

| Task | search_space | model_name | Link |
|:-----------:|:-----------:|:-----------|:-----------:|
| WMT'14 En-De | space0 | HAT_wmt14ende_super_space0 | [link](https://www.dropbox.com/s/pkdddxvvpw9a4vq/HAT_wmt14ende_super_space0.pt?dl=0) |
| WMT'14 En-Fr | space0 | HAT_wmt14enfr_super_space0 | [link](https://www.dropbox.com/s/asegvw9qzpxui6a/HAT_wmt14enfr_super_space0.pt?dl=0) |
| WMT'19 En-De | space0 | HAT_wmt19ende_super_space0 | [link](https://www.dropbox.com/s/uc0lw6jdep1vazc/HAT_wmt19ende_super_space0.pt?dl=0) |
| IWSLT'14 De-En | space1 | HAT_iwslt14deen_super_space1 | [link](https://www.dropbox.com/s/yv0mn8ns36gxkhs/HAT_iwslt14deen_super_space1.pt?dl=0) |

#### 2. Evolutionary Search

The second step of HAT is to perform an evolutionary search in the trained SuperTransformer with a hardware latency constraint in the loop. We train a latency predictor to get fast and accurate latency feedback.

##### 2.1 Generate a latency dataset

```bash
python latency_dataset.py --configs=configs/[task_name]/latency_dataset/[hardware_name].yml

# for example
python latency_dataset.py --configs=configs/wmt14.en-de/latency_dataset/cpu_raspberrypi.yml
```

`hardware_name` can be `cpu_raspberrypi`, `cpu_xeon`, or `gpu_titanxp`. The `--configs` file specifies the design space from which we sample models to collect (model_architecture, real_latency) data pairs.

We provide the datasets we collected in the [latency_dataset](./latency_dataset) folder.
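Conceptually, collecting a latency dataset means repeatedly sampling an architecture from the design space and timing it on the target device. The sketch below illustrates that loop; the dimension names and choices in `DESIGN_SPACE` are hypothetical (not copied from the `latency_dataset` configs), and `measure` stands in for real on-device timing.

```python
import random
from typing import Callable, Dict, List, Tuple

# Illustrative design space; names and choices are hypothetical.
DESIGN_SPACE: Dict[str, List[int]] = {
    "encoder_embed_dim": [512, 640],
    "encoder_ffn_dim": [1024, 2048, 3072],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
    "decoder_heads": [4, 8],
}

def sample_arch(rng: random.Random) -> Dict[str, int]:
    return {name: rng.choice(choices) for name, choices in DESIGN_SPACE.items()}

def build_latency_dataset(measure: Callable[[Dict[str, int]], float],
                          n: int, seed: int = 0) -> List[Tuple[Dict[str, int], float]]:
    """Collect n (architecture, latency) pairs."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        arch = sample_arch(rng)
        dataset.append((arch, measure(arch)))
    return dataset

# Stand-in for on-device timing: a made-up latency formula in ms.
def fake_measure(arch: Dict[str, int]) -> float:
    return 0.01 * arch["encoder_ffn_dim"] + 5.0 * arch["decoder_layer_num"]

data = build_latency_dataset(fake_measure, n=10)
print(data[0])
```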

##### 2.2 Train a latency predictor

Then train a predictor on the collected dataset:

```bash
python latency_predictor.py --configs=configs/[task_name]/latency_predictor/[hardware_name].yml

# for example
python latency_predictor.py --configs=configs/wmt14.en-de/latency_predictor/cpu_raspberrypi.yml
```

The `--configs` file contains the predictor's model architecture and training settings.
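The predictor maps an architecture's feature vector to a latency. As a simplified stand-in, the sketch below fits a linear least-squares model with NumPy instead of the predictor configured by the `--configs` file. It also shows why normalization matters: features are divided by `feature_norm` and latencies by `lat_norm` before fitting, so the same values must be reused whenever the predictor is queried. `LinearLatencyPredictor` and its parameters are hypothetical names.

```python
import numpy as np

class LinearLatencyPredictor:
    """Linear stand-in for a latency predictor trained on normalized data."""

    def __init__(self, feature_norm, lat_norm: float):
        self.feature_norm = np.asarray(feature_norm, dtype=float)
        self.lat_norm = float(lat_norm)
        self.coef = None

    def _design(self, X):
        Xn = np.asarray(X, dtype=float) / self.feature_norm  # normalize features
        return np.hstack([Xn, np.ones((Xn.shape[0], 1))])   # append a bias column

    def fit(self, X, latencies):
        y = np.asarray(latencies, dtype=float) / self.lat_norm  # normalize targets
        self.coef, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        # Undo the latency normalization to return values in the original unit.
        return self._design(X) @ self.coef * self.lat_norm

# Toy data: features = (embed_dim, num_layers), latency = 0.1*dim + 20*layers + 5.
X = [[512, 1], [640, 2], [512, 3], [640, 4], [576, 6]]
y = [0.1 * d + 20 * l + 5 for d, l in X]
predictor = LinearLatencyPredictor(feature_norm=[640, 6], lat_norm=200.0).fit(X, y)
print(predictor.predict([[600, 5]]))
```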

We provide pre-trained predictors in the [latency_dataset/predictors](./latency_dataset/predictors) folder.

##### 2.3 Run evolutionary search with a latency constraint

```bash
python evo_search.py --configs=[supertransformer_config_file].yml --evo-configs=[evo_settings].yml

# for example
python evo_search.py --configs=configs/wmt14.en-de/supertransformer/space0.yml --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_titanxp.yml
```

The `--configs` file points to the SuperTransformer training config file. The `--evo-configs` file contains the evolutionary search settings and specifies the target latency via `latency-constraint`. Note that `feature-norm` and `lat-norm` in this file must match the values used when training the latency predictor. `--write-config-path` specifies where the searched SubTransformer architecture is written.
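The search loop can be sketched as a generic evolutionary algorithm: keep the best candidates as parents, create children by mutation and crossover, and reject any child whose predicted latency exceeds the constraint. This is an illustrative sketch, not `evo_search.py`: `score` stands in for the SuperTransformer validation loss, `predict_latency` for the trained predictor, and all parameter names and defaults are hypothetical (they do not correspond to keys in the `--evo-configs` file).

```python
import random
from typing import Callable, Dict, List

Arch = Dict[str, int]

def evo_search(space: Dict[str, List[int]],
               score: Callable[[Arch], float],
               predict_latency: Callable[[Arch], float],
               latency_constraint: float,
               population: int = 20, parents: int = 5, iterations: int = 10,
               mutation_prob: float = 0.3, max_tries: int = 1000,
               seed: int = 0) -> Arch:
    """Return the lowest-score architecture whose predicted latency meets the constraint."""
    rng = random.Random(seed)

    def feasible_from(make: Callable[[], Arch]) -> Arch:
        # Rejection sampling: discard candidates that violate the latency constraint.
        for _ in range(max_tries):
            cand = make()
            if predict_latency(cand) <= latency_constraint:
                return cand
        raise RuntimeError("no candidate satisfied the latency constraint")

    def random_arch() -> Arch:
        return {k: rng.choice(v) for k, v in space.items()}

    def mutate(a: Arch) -> Arch:
        # Resample each gene with probability mutation_prob.
        return {k: rng.choice(space[k]) if rng.random() < mutation_prob else v
                for k, v in a.items()}

    def crossover(a: Arch, b: Arch) -> Arch:
        # Each gene is inherited from one of the two parents at random.
        return {k: rng.choice((a[k], b[k])) for k in a}

    pop = [feasible_from(random_arch) for _ in range(population)]
    n_mut = (population - parents) // 2
    n_cross = population - parents - n_mut
    for _ in range(iterations):
        top = sorted(pop, key=score)[:parents]  # best candidates become parents
        children = [feasible_from(lambda: mutate(rng.choice(top))) for _ in range(n_mut)]
        children += [feasible_from(lambda: crossover(rng.choice(top), rng.choice(top)))
                     for _ in range(n_cross)]
        pop = top + children
    return min(pop, key=score)

# Usage with toy stand-ins: larger models score better but are slower.
space = {"layers": [1, 2, 3, 4, 5, 6], "dim": [256, 512, 768]}
predict = lambda a: a["layers"] * a["dim"] / 100.0  # stand-in latency predictor (ms)
proxy_loss = lambda a: -a["layers"] * a["dim"]      # stand-in for validation loss
best = evo_search(space, proxy_loss, predict, latency_constraint=20.0)
print(best, predict(best))
```

Because every candidate is filtered through the predictor, no on-device measurement is needed inside the loop; the predictor's accuracy bounds how closely the final model meets the real latency target.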

#### 3. Train a Searched SubTransformer

Finally, we train the searched SubTransformer from scratch:

```bash
python train.py --configs=[subtransformer_architecture].yml --sub-configs=configs/[task_name]/subtransformer/common.yml

# for example
python train.py --configs=configs/wmt14.en-de/subtransformer/wmt14ende_titanxp@200ms.yml --sub-configs=configs/wmt14.en-de/subtransformer/common.yml
```

`--configs` points to the file written to `--write-config-path` in step 2.3. `--sub-configs` contains the training settings for the SubTransformer.

After training a SubTransformer, you can test its performance with the methods in [Testing](#testing) section.

### Dependencies

* Python >= 3.6

* [PyTorch](http://pytorch.org/) >= 1.0.0

* configargparse >= 0.14

* New model training requires NVIDIA GPUs and [NCCL](https://github.com/NVIDIA/nccl)

## Related works on efficient deep learning

[MicroNet for Efficient Language Modeling](https://arxiv.org/abs/2005.07877)

[Lite Transformer with Long-Short Range Attention](https://arxiv.org/abs/2004.11886)

[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/abs/1802.03494)

[Once-for-All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791)

[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332)

## Contact

If you have any questions, feel free to contact [Hanrui Wang](https://hanruiwang.me) by email ([hanrui@mit.edu](mailto:hanrui@mit.edu)) or through GitHub issues. Pull requests are welcome!

## Licence

This repository is released under the MIT license. See [LICENSE](./LICENSE) for more information.

## Acknowledgements

We thank [fairseq](https://github.com/pytorch/fairseq), which serves as the backbone of this repo.