# AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers

This repository contains code, data and pretrained models used in [AutoMoE (pre-print)](https://arxiv.org/abs/2210.07535). This repository builds on [Hardware Aware Transformer (HAT)'s repository](https://github.com/mit-han-lab/hardware-aware-transformers).

## AutoMoE Framework
![AutoMoE Framework](images/framework.png)

## AutoMoE Key Results

The following tables show the performance of AutoMoE vs. baselines on standard machine translation benchmarks: WMT'14 En-De, WMT'14 En-Fr, and WMT'19 En-De.

| WMT’14 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 28.4 | 184 |
| Evolved Transformer | NAS over Dense | 47 | 0 | 2.9 | 28.2 | 2,192,000 |
| HAT | NAS over Dense | 56 | 0 | 3.5 | 28.2 | 264 |
| AutoMoE (6 Experts) | NAS over Sparse | 45 | 62 | 2.9 | 28.2 | 224 |

| WMT’14 En-Fr | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 41.2 | 240 |
| Evolved Transformer | NAS over Dense | 175 | 0 | 10.8 | 41.3 | 2,192,000 |
| HAT | NAS over Dense | 57 | 0 | 3.6 | 41.5 | 248 |
| AutoMoE (6 Experts) | NAS over Sparse | 46 | 72 | 2.9 | 41.6 | 236 |
| AutoMoE (16 Experts) | NAS over Sparse | 135 | 65 | 3.0 | 41.9 | 236 |

| WMT’19 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 46.1 | 184 |
| HAT | NAS over Dense | 63 | 0 | 4.1 | 45.8 | 264 |
| AutoMoE (2 Experts) | NAS over Sparse | 45 | 41 | 2.8 | 45.5 | 248 |
| AutoMoE (16 Experts) | NAS over Sparse | 69 | 81 | 3.2 | 45.9 | 248 |

## Quick Setup

### (1) Install
Run the following commands to install AutoMoE:
```bash
git clone https://github.com/UBC-NLP/AutoMoE.git
cd AutoMoE
pip install --editable .
```
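Since AutoMoE builds on the HAT/fairseq codebase, the editable install should expose fairseq's Python package. An optional sanity check (assuming the package installed by `pip install --editable .` is named `fairseq`, as in HAT):

```bash
# Optional: verify the editable install is importable.
# Assumes the installed package is named `fairseq`, as in the HAT codebase
# this repository builds on.
python -c "import fairseq; print('fairseq import OK')"
```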

### (2) Prepare Data
Run the following command to download the preprocessed MT data:
```bash
bash configs/[task_name]/get_preprocessed.sh
```
where `[task_name]` is one of `wmt14.en-de`, `wmt14.en-fr`, or `wmt19.en-de`.
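For example, to fetch the preprocessed WMT'14 En-De data:

```bash
bash configs/wmt14.en-de/get_preprocessed.sh
```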

### (3) Run full AutoMoE pipeline
Run the following commands to start the AutoMoE pipeline:
```bash
python generate_script.py --task wmt14.en-de --output_dir /tmp --num_gpus 4 --trial_run 0 --hardware_spec gpu_titanxp --max_experts 6 --frac_experts 1 > automoe.sh
bash automoe.sh
```
where,
* `task` - MT dataset to use: `wmt14.en-de`, `wmt14.en-fr`, or `wmt19.en-de` (default: `wmt14.en-de`)
* `output_dir` - Output directory to write files generated during experiment (default: `/tmp`)
* `num_gpus` - Number of GPUs to use (default: `4`)
* `trial_run` - Whether to do a trial run (useful for quickly checking that everything runs without errors): `0` (final run) or `1` (dry/dummy/trial run) (default: `0`)
* `hardware_spec` - Hardware specification: `gpu_titanxp` (For GPU) (default: `gpu_titanxp`)
* `max_experts` - Maximum experts (for Supernet) to use (default: `6`)
* `frac_experts` - Fractional experts (varying FFN intermediate sizes): `0` (standard experts) or `1` (fractional experts) (default: `1`)
* `supernet_ckpt` - Skip supernet training by specifying a checkpoint from the [pretrained models](https://1drv.ms/u/s!AlflMXNPVy-wgb9w-aq0XZypZjqX3w?e=VmaK4n) (default: `None`); see the example invocation after this list
* `latency_compute` - Use (partially) gold (measured) latency or a latency predictor (default: `gold`)
* `latiter` - Number of latency measurements when using (partially) gold latency (default: `100`)
* `latency_constraint` - Latency constraint in terms of milliseconds (default: `200`)
* `evo_iter` - Number of iterations for evolutionary search (default: `10`)
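As an illustration, the options above can be combined for a quick dry run on a single GPU that reuses a pretrained supernet checkpoint instead of training one from scratch. This is a sketch: the checkpoint path below is a placeholder, not a file shipped with the repository.

```bash
# Dry run on 1 GPU; skips supernet training by pointing --supernet_ckpt at a
# downloaded pretrained supernet (placeholder path shown for illustration).
python generate_script.py \
  --task wmt14.en-de \
  --output_dir /tmp/automoe_trial \
  --num_gpus 1 \
  --trial_run 1 \
  --hardware_spec gpu_titanxp \
  --max_experts 6 \
  --frac_experts 1 \
  --supernet_ckpt /path/to/pretrained_supernet_checkpoint.pt \
  --latency_constraint 200 \
  --evo_iter 10 > automoe_trial.sh
bash automoe_trial.sh
```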

## Contact
If you have questions, contact Ganesh (`[email protected]`), Subho (`[email protected]`), and/or create a GitHub issue.

## Citation
If you use this code, please cite:
```
@misc{jawahar2022automoe,
title={AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers},
author={Ganesh Jawahar and Subhabrata Mukherjee and Xiaodong Liu and Young Jin Kim and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Ahmed Hassan Awadallah and Sebastien Bubeck and Jianfeng Gao},
year={2022},
eprint={2210.07535},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## License
See LICENSE.txt for license information.

## Acknowledgements
* [Hardware Aware Transformer](https://github.com/mit-han-lab/hardware-aware-transformers) from `mit-han-lab`
* [fairseq](https://github.com/facebookresearch/fairseq) from `facebookresearch`

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.