Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/sinkhorn-router-pytorch
Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise
artificial-intelligence deep-learning mixture-of-experts routing
Last synced: 21 days ago
- Host: GitHub
- URL: https://github.com/lucidrains/sinkhorn-router-pytorch
- Owner: lucidrains
- License: mit
- Created: 2024-08-23T16:51:49.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-29T16:57:58.000Z (4 months ago)
- Last Synced: 2024-12-17T03:55:47.277Z (23 days ago)
- Topics: artificial-intelligence, deep-learning, mixture-of-experts, routing
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 32
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Sinkhorn Router - Pytorch (wip)
Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise. Will contain both a causal and non-causal variant. The causal variant will follow the example used in Megatron.
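The README does not spell out the routing math, but a Sinkhorn-based router typically turns token-to-expert affinities into a balanced assignment by alternately normalizing the affinity matrix over the expert and token axes. Below is a minimal, illustrative sketch of that idea; the function and variable names are hypothetical and not part of this package's API.

```python
import torch

def sinkhorn(logits, n_iters = 8):
    # logits: (num_tokens, num_experts) raw token-expert affinities.
    # Alternate normalization over the expert and token axes in log space;
    # after a few iterations each expert receives a roughly equal share of
    # probability mass, which is what balances the routing.
    log_p = logits
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim = -1, keepdim = True)  # normalize over experts
        log_p = log_p - torch.logsumexp(log_p, dim = 0, keepdim = True)   # normalize over tokens
    return log_p.exp()

# hypothetical gating: project tokens to expert logits, balance them, then
# pick the highest-scoring expert per token
tokens = torch.randn(1017, 512)          # (seq, dim)
gate = torch.randn(512, 8)               # (dim, num experts), illustrative gating weight
scores = sinkhorn(tokens @ gate)         # (seq, num experts), balanced scores
expert_index = scores.argmax(dim = -1)   # hard assignment of each token to an expert
```

In the actual `SinkhornRouter`, the `competitive` flag shown in the usage example below presumably toggles this kind of balanced, competitive assignment.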
## Install
```bash
$ pip install sinkhorn-router-pytorch
```

## Usage
```python
import torch
from torch import nn
from sinkhorn_router_pytorch import SinkhornRouter

experts = nn.Parameter(torch.randn(8, 8, 512, 256)) # (experts, heads, dim [in], dim [out])
router = SinkhornRouter(
    dim = 512,
    experts = experts,
    competitive = True,
    causal = False,
)

x = torch.randn(1, 8, 1017, 512)
out = router(x) # (1, 8, 1017, 256)
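# note: the experts tensor above defines 8 experts with 8 heads each, projecting
# dim 512 -> 256, so the routed output keeps the input's (batch, heads, seq) shape
# and ends in the expert output dim; presumably setting causal = True selects the
# Megatron-style causal variant mentioned in the README intro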
```

## Citations
```bibtex
@article{Shoeybi2019MegatronLMTM,
title = {Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},
author = {Mohammad Shoeybi and Mostofa Patwary and Raul Puri and Patrick LeGresley and Jared Casper and Bryan Catanzaro},
journal = {ArXiv},
year = {2019},
volume = {abs/1909.08053},
url = {https://api.semanticscholar.org/CorpusID:202660670}
}
```

```bibtex
@article{Anthony2024BlackMambaMO,
title = {BlackMamba: Mixture of Experts for State-Space Models},
author = {Quentin Anthony and Yury Tokpanov and Paolo Glorioso and Beren Millidge},
journal = {ArXiv},
year = {2024},
volume = {abs/2402.01771},
url = {https://api.semanticscholar.org/CorpusID:267413070}
}
```

```bibtex
@article{Csordas2023SwitchHeadAT,
title = {SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention},
author = {R{\'o}bert Csord{\'a}s and Piotr Piekos and Kazuki Irie and J{\"u}rgen Schmidhuber},
journal = {ArXiv},
year = {2023},
volume = {abs/2312.07987},
url = {https://api.semanticscholar.org/CorpusID:266191825}
}
```