# AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers

This repository contains code, data, and pretrained models used in [AutoMoE (pre-print)](https://arxiv.org/abs/2210.07535). This repository builds on [Hardware Aware Transformer (HAT)'s repository](https://github.com/mit-han-lab/hardware-aware-transformers).

## AutoMoE Framework
![AutoMoE Framework](images/framework.png)

## AutoMoE Key Result

The following tables show the performance of AutoMoE vs.
baselines on standard machine translation benchmarks: WMT'14 En-De, WMT'14 En-Fr, and WMT'19 En-De.

| WMT’14 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 28.4 | 184 |
| Evolved Transformer | NAS over Dense | 47 | 0 | 2.9 | 28.2 | 2,192,000 |
| HAT | NAS over Dense | 56 | 0 | 3.5 | 28.2 | 264 |
| AutoMoE (6 Experts) | NAS over Sparse | 45 | 62 | 2.9 | 28.2 | 224 |

| WMT’14 En-Fr | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 41.2 | 240 |
| Evolved Transformer | NAS over Dense | 175 | 0 | 10.8 | 41.3 | 2,192,000 |
| HAT | NAS over Dense | 57 | 0 | 3.6 | 41.5 | 248 |
| AutoMoE (6 Experts) | NAS over Sparse | 46 | 72 | 2.9 | 41.6 | 236 |
| AutoMoE (16 Experts) | NAS over Sparse | 135 | 65 | 3.0 | 41.9 | 236 |

| WMT’19 En-De | Network | \# Active Params (M) | Sparsity (%) | FLOPs (G) | BLEU | GPU Hours |
|----------------|--------|---------|------|------|------|------|
| Transformer | Dense | 176 | 0 | 10.6 | 46.1 | 184 |
| HAT | NAS over Dense | 63 | 0 | 4.1 | 45.8 | 264 |
| AutoMoE (2 Experts) | NAS over Sparse | 45 | 41 | 2.8 | 45.5 | 248 |
| AutoMoE (16 Experts) | NAS over Sparse | 69 | 81 | 3.2 | 45.9 | 248 |

## Quick Setup

### (1) Install
Run the following commands to install AutoMoE:
```bash
git clone https://github.com/UBC-NLP/AutoMoE.git
cd AutoMoE
pip install --editable .
```

### (2) Prepare Data
Run the following command to download preprocessed MT data:
```bash
bash configs/[task_name]/get_preprocessed.sh
```
where `[task_name]` can be `wmt14.en-de`, `wmt14.en-fr`, or `wmt19.en-de`.

### (3) Run full AutoMoE pipeline
Run the following commands to start the AutoMoE
pipeline:
```bash
python generate_script.py --task wmt14.en-de --output_dir /tmp --num_gpus 4 --trial_run 0 --hardware_spec gpu_titanxp --max_experts 6 --frac_experts 1 > automoe.sh
bash automoe.sh
```
where,
* `task` - MT dataset to use: `wmt14.en-de`, `wmt14.en-fr`, or `wmt19.en-de` (default: `wmt14.en-de`)
* `output_dir` - Output directory for files generated during the experiment (default: `/tmp`)
* `num_gpus` - Number of GPUs to use (default: `4`)
* `trial_run` - Trial run, useful to quickly check that everything runs without errors: `0` (final run) or `1` (dry/trial run) (default: `0`)
* `hardware_spec` - Hardware specification: `gpu_titanxp` (for GPU) (default: `gpu_titanxp`)
* `max_experts` - Maximum number of experts (for the Supernet) to use (default: `6`)
* `frac_experts` - Fractional experts (varying FFN intermediate size): `0` (standard experts) or `1` (fractional) (default: `1`)
* `supernet_ckpt` - Skip Supernet training by specifying a checkpoint from [pretrained models](https://1drv.ms/u/s!AlflMXNPVy-wgb9w-aq0XZypZjqX3w?e=VmaK4n) (default: `None`)
* `latency_compute` - Use (partially) gold or predictor latency (default: `gold`)
* `latiter` - Number of latency measurements when using (partially) gold latency (default: `100`)
* `latency_constraint` - Latency constraint in milliseconds (default: `200`)
* `evo_iter` - Number of iterations for evolutionary search (default: `10`)

## Contact
If you have questions, contact Ganesh (`ganeshjwhr@gmail.com`) or Subho (`Subhabrata.Mukherjee@microsoft.com`), and/or open a GitHub issue.

## Citation
If you use this code, please cite:
```
@misc{jawahar2022automoe,
      title={AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers},
      author={Ganesh Jawahar and Subhabrata Mukherjee and Xiaodong Liu and Young Jin Kim and Muhammad Abdul-Mageed and Laks V. S.
Lakshmanan and Ahmed Hassan Awadallah and Sebastien Bubeck and Jianfeng Gao},
      year={2022},
      eprint={2210.07535},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## License
See LICENSE.txt for license information.

## Acknowledgements
* [Hardware Aware Transformer](https://github.com/mit-han-lab/hardware-aware-transformers) from `mit-han-lab`
* [fairseq](https://github.com/facebookresearch/fairseq) from `facebookresearch`

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
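## Appendix: Sketch of the search loop

For intuition, the latency-constrained evolutionary search that the `evo_iter`, `latency_constraint`, and `max_experts` options control can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the architecture encoding (experts per FFN layer), `fitness`, and `latency_ms` are hypothetical stand-ins for the real Supernet validation score and the gold/predicted latency measurements.

```python
import random

# Hypothetical gene: number of experts in each FFN layer of a subnetwork
# sampled from a Supernet with at most MAX_EXPERTS experts per layer
# (mirrors the --max_experts option; NUM_LAYERS is an illustrative choice).
MAX_EXPERTS = 6
NUM_LAYERS = 6

def random_arch(rng):
    return [rng.randint(1, MAX_EXPERTS) for _ in range(NUM_LAYERS)]

def mutate(arch, rng):
    # Resample the expert count of one randomly chosen layer.
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = rng.randint(1, MAX_EXPERTS)
    return child

def fitness(arch):
    # Stand-in quality proxy; AutoMoE would evaluate the subnetwork
    # inherited from the trained Supernet instead.
    return sum(arch)

def latency_ms(arch):
    # Stand-in latency model; AutoMoE measures (gold) or predicts latency.
    return 30 * max(arch) + 2 * sum(arch)

def evolutionary_search(evo_iter=10, latency_constraint=200, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [random_arch(rng) for _ in range(pop_size)]
    best = None
    for _ in range(evo_iter):
        # Keep only candidates that satisfy the latency constraint.
        feasible = [a for a in pop if latency_ms(a) <= latency_constraint]
        if not feasible:
            pop = [random_arch(rng) for _ in range(pop_size)]
            continue
        feasible.sort(key=fitness, reverse=True)
        if best is None or fitness(feasible[0]) > fitness(best):
            best = feasible[0]
        # Next generation: mutations of the top half of feasible candidates.
        parents = feasible[: max(2, len(feasible) // 2)]
        pop = [mutate(rng.choice(parents), rng) for _ in range(pop_size)]
    return best

if __name__ == "__main__":
    best = evolutionary_search()
    print("best arch:", best, "latency (ms):", latency_ms(best))
```

The defaults above deliberately echo the pipeline flags (`evo_iter=10`, `latency_constraint=200`); in the real pipeline the search runs over the trained Supernet rather than a toy scoring function.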