moe-paper-models
A summary of MoE experimental setups across a number of different papers.
https://github.com/AdamG012/moe-paper-models
Datasets, Citations and Open Source
- Memory Efficient NLLB - 200(Eval) | 16K | N | 0 | Releases some results, such as pruned experts. Every fourth FFN sublayer is replaced with an MoE layer (see the sketch after this list). NLLB-200 requires 4x32 V100s to run. This uses the 80% pruned model. |
- Megablocks
- GShard
- NLLB - 200(Eval)/LID curated data/Paracrawl and CommonCrawl (Monolingual) | 16K | Y | 26/49 | Every fourth layer is an MoE layer. |
- Task-Level MoE
- Switch Transformer
- Evo MoE
- Random Routing
- Gating Dropout - 50 | 435K | N | 1/5 | |
- BASE Layers
- Stable-MoE
- Outrageously Large MoEs
- GLaM
- ST-MoE
- Expert Choice Routing
- Hash Layers (vs BASE) - 103/BST | 2 | Y (partly) | 43 | |
- Deepspeed-MoE - h/Trivia-QA/WebQS | 256/512 | Y | 15/36 | |
- M6-T Sparse Experts
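Several of the rows above (e.g. the NLLB-200 entries) note that every fourth FFN sublayer is replaced with an MoE layer. The snippet below is a minimal PyTorch sketch of that placement pattern only; the expert count, hidden sizes, and top-1 gating are illustrative assumptions, not settings taken from any of the listed papers.

```python
# Minimal sketch: a Transformer FFN stack where every fourth sublayer is an
# MoE layer. Expert count, dimensions and top-1 routing are assumptions made
# for illustration, not values from any specific paper in this list.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFFN(nn.Module):
    """Sparsely gated FFN: a linear router sends each token to its top-1 expert."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its highest-scoring expert.
        gate = F.softmax(self.router(x), dim=-1)   # (tokens, num_experts)
        weight, idx = gate.max(dim=-1)             # top-1 gate value and expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out


def build_ffn_stack(n_layers: int, d_model: int = 512, d_ff: int = 2048) -> nn.ModuleList:
    """Dense FFN sublayers, except every fourth one is replaced with an MoE layer."""
    layers = nn.ModuleList()
    for i in range(n_layers):
        if (i + 1) % 4 == 0:  # sublayers 4, 8, 12, ... become MoE layers
            layers.append(MoEFFN(d_model, d_ff))
        else:
            layers.append(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            )
    return layers


if __name__ == "__main__":
    stack = build_ffn_stack(n_layers=12)
    tokens = torch.randn(16, 512)        # 16 tokens, d_model = 512
    for layer in stack:
        tokens = tokens + layer(tokens)  # residual connection around each sublayer
    print(tokens.shape)                  # torch.Size([16, 512])
```

Real implementations additionally use layer normalization, expert capacity limits, and an auxiliary load-balancing loss, all of which this sketch omits.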