Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
https://github.com/MachineLearningSystem/AutoMoE
- Host: GitHub
- URL: https://github.com/MachineLearningSystem/AutoMoE
- Owner: MachineLearningSystem
- License: mit
- Fork: true (microsoft/AutoMoE)
- Created: 2023-03-27T02:06:17.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2022-10-21T06:01:38.000Z (about 2 years ago)
- Last Synced: 2024-08-02T19:35:25.511Z (5 months ago)
- Homepage:
- Size: 15.6 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-AI-system - AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ICLR'23
README
# AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers
This repository contains the code, data, and pretrained models used in [AutoMoE (pre-print)](https://arxiv.org/abs/2210.07535). It builds on the [Hardware Aware Transformer (HAT)](https://github.com/mit-han-lab/hardware-aware-transformers) repository.
## AutoMoE Framework
![AutoMoE Framework](images/framework.png)

## AutoMoE Key Results
The following tables show the performance of AutoMoE versus baselines on standard machine translation benchmarks: WMT'14 En-De, WMT'14 En-Fr, and WMT'19 En-De.
| WMT’14 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 28.4 | 184 |
| Evolved Transformer | NAS over Dense | 47 | 0 | 2.9 | 28.2 | 2,192,000 |
| HAT | NAS over Dense | 56 | 0 | 3.5 | 28.2 | 264 |
| AutoMoE (6 Experts) | NAS over Sparse | 45 | 62 | 2.9 | 28.2 | 224 |

| WMT’14 En-Fr | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 41.2 | 240 |
| Evolved Transformer | NAS over Dense | 175 | 0 | 10.8 | 41.3 | 2,192,000 |
| HAT | NAS over Dense | 57 | 0 | 3.6 | 41.5 | 248 |
| AutoMoE (6 Experts) | NAS over Sparse | 46 | 72 | 2.9 | 41.6 | 236 |
| AutoMoE (16 Experts) | NAS over Sparse | 135 | 65 | 3.0 | 41.9 | 236 |

| WMT’19 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 46.1 | 184 |
| HAT | NAS over Dense | 63 | 0 | 4.1 | 45.8 | 264 |
| AutoMoE (2 Experts) | NAS over Sparse | 45 | 41 | 2.8 | 45.5 | 248 |
| AutoMoE (16 Experts) | NAS over Sparse | 69 | 81 | 3.2 | 45.9 | 248 |

## Quick Setup
### (1) Install
Run the following commands to install AutoMoE:
```bash
git clone https://github.com/UBC-NLP/AutoMoE.git
cd AutoMoE
pip install --editable .
```
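Optionally, you can sanity-check the editable install before moving on. This is a minimal sketch and assumes the package keeps the fairseq/HAT layout it builds on; the module name is an assumption, not something this README specifies:
```bash
# Sanity check (assumption: the editable install exposes the fairseq package,
# as in the upstream HAT/fairseq codebase this repository builds on).
python -c "import fairseq; print('fairseq import OK')"
```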
### (2) Prepare Data
Run the following command to download the preprocessed MT data:
```bash
bash configs/[task_name]/get_preprocessed.sh
```
where `[task_name]` is one of `wmt14.en-de`, `wmt14.en-fr`, or `wmt19.en-de`.
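To fetch all three benchmarks in one pass, a small loop works as well (a sketch, assuming each `get_preprocessed.sh` script is self-contained and run from the repository root):
```bash
# Download the preprocessed data for every supported benchmark.
for task in wmt14.en-de wmt14.en-fr wmt19.en-de; do
    bash "configs/${task}/get_preprocessed.sh"
done
```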
### (3) Run full AutoMoE pipeline
Run the following commands to start the AutoMoE pipeline:
```bash
python generate_script.py --task wmt14.en-de --output_dir /tmp --num_gpus 4 --trial_run 0 --hardware_spec gpu_titanxp --max_experts 6 --frac_experts 1 > automoe.sh
bash automoe.sh
```
where,
* `task` - MT dataset to use: `wmt14.en-de` or `wmt14.en-fr` or `wmt19.en-de` (default: `wmt14.en-de`)
* `output_dir` - Output directory to write files generated during experiment (default: `/tmp`)
* `num_gpus` - Number of GPUs to use (default: `4`)
* `trial_run` - Whether to do a trial run (useful for quickly checking that everything runs without errors): `0` (final run) or `1` (dry/trial run) (default: `0`)
* `hardware_spec` - Hardware specification: `gpu_titanxp` (For GPU) (default: `gpu_titanxp`)
* `max_experts` - Maximum experts (for Supernet) to use (default: `6`)
* `frac_experts` - Fractional experts (varying FFN intermediate size): `0` (standard experts) or `1` (fractional experts) (default: `1`)
* `supernet_ckpt` - Skip supernet training by specifying a checkpoint from the [pretrained models](https://1drv.ms/u/s!AlflMXNPVy-wgb9w-aq0XZypZjqX3w?e=VmaK4n); an example invocation using this option follows this list (default: `None`)
* `latency_compute` - Use (partially) gold or predictor latency (default: `gold`)
* `latiter` - Number of latency measurements for using (partially) gold latency (default: `100`)
* `latency_constraint` - Latency constraint in terms of milliseconds (default: `200`)
* `evo_iter` - Number of iterations for evolutionary search (default: `10`)
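As an example of combining the options above, the sketch below reuses a pretrained supernet checkpoint (the path is a placeholder) and searches under a tighter latency budget; the flag names follow the list above, but the exact values are illustrative:
```bash
# Generate and run a pipeline script that skips supernet training by loading
# a pretrained checkpoint (placeholder path) and searches with a 150 ms
# latency constraint and more evolutionary-search iterations.
python generate_script.py \
    --task wmt14.en-fr \
    --output_dir /tmp/automoe_enfr \
    --num_gpus 4 \
    --max_experts 16 \
    --frac_experts 1 \
    --supernet_ckpt /path/to/pretrained_supernet.pt \
    --latency_constraint 150 \
    --evo_iter 15 > automoe_enfr.sh
bash automoe_enfr.sh
```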
## Contact
If you have questions, contact Ganesh (`[email protected]`), Subho (`[email protected]`), and/or create a GitHub issue.

## Citation
If you use this code, please cite:
```
@misc{jawahar2022automoe,
title={AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers},
author={Ganesh Jawahar and Subhabrata Mukherjee and Xiaodong Liu and Young Jin Kim and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Ahmed Hassan Awadallah and Sebastien Bubeck and Jianfeng Gao},
year={2022},
eprint={2210.07535},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## License
See LICENSE.txt for license information.

## Acknowledgements
* [Hardware Aware Transformer](https://github.com/mit-han-lab/hardware-aware-transformers) from `mit-han-lab`
* [fairseq](https://github.com/facebookresearch/fairseq) from `facebookresearch`

## Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.