Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidrains/sinkhorn-router-pytorch
Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise
artificial-intelligence deep-learning mixture-of-experts routing
Last synced: 21 days ago
- Host: GitHub
- URL: https://github.com/lucidrains/sinkhorn-router-pytorch
- Owner: lucidrains
- License: mit
- Created: 2024-08-23T16:51:49.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-29T16:57:58.000Z (4 months ago)
- Last Synced: 2024-12-17T03:55:47.277Z (23 days ago)
- Topics: artificial-intelligence, deep-learning, mixture-of-experts, routing
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 32
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## Sinkhorn Router - Pytorch (wip)
Self-contained PyTorch implementation of a Sinkhorn-based router, for mixture of experts or otherwise. Will contain both a causal and non-causal variant. The causal variant will follow the example used in Megatron.
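The README does not spell out the routing math, but a Sinkhorn-based router typically turns token-to-expert affinities into a balanced assignment by alternately normalizing the affinity matrix over the expert and token axes. Below is a minimal, illustrative sketch of that idea; the function and variable names are hypothetical and not part of this package's API.

```python
import torch

def sinkhorn(logits, n_iters = 8):
    # logits: (num_tokens, num_experts) raw token-expert affinities.
    # Alternate normalization over the expert and token axes in log space;
    # after a few iterations each expert receives a roughly equal share of
    # probability mass, which is what balances the routing.
    log_p = logits
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim = -1, keepdim = True)  # normalize over experts
        log_p = log_p - torch.logsumexp(log_p, dim = 0, keepdim = True)   # normalize over tokens
    return log_p.exp()

# hypothetical gating: project tokens to expert logits, balance them, then
# pick the highest-scoring expert per token
tokens = torch.randn(1017, 512)          # (seq, dim)
gate = torch.randn(512, 8)               # (dim, num experts), illustrative gating weight
scores = sinkhorn(tokens @ gate)         # (seq, num experts), balanced scores
expert_index = scores.argmax(dim = -1)   # hard assignment of each token to an expert
```

In the actual `SinkhornRouter`, the `competitive` flag shown in the usage example below presumably toggles this kind of balanced, competitive assignment.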
## Install
```bash
$ pip install sinkhorn-router-pytorch
```

## Usage
```python
import torch
from torch import nn
from sinkhorn_router_pytorch import SinkhornRouter

experts = nn.Parameter(torch.randn(8, 8, 512, 256)) # (experts, heads, dim [in], dim [out])
router = SinkhornRouter(
    dim = 512,
    experts = experts,
    competitive = True,
    causal = False,
)

x = torch.randn(1, 8, 1017, 512)
out = router(x) # (1, 8, 1017, 256)
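# note: the experts tensor above defines 8 experts with 8 heads each, projecting
# dim 512 -> 256, so the routed output keeps the input's (batch, heads, seq) shape
# and ends in the expert output dim; presumably setting causal = True selects the
# Megatron-style causal variant mentioned in the README intro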
```

## Citations
```bibtex
@article{Shoeybi2019MegatronLMTM,
title = {Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},
author = {Mohammad Shoeybi and Mostofa Patwary and Raul Puri and Patrick LeGresley and Jared Casper and Bryan Catanzaro},
journal = {ArXiv},
year = {2019},
volume = {abs/1909.08053},
url = {https://api.semanticscholar.org/CorpusID:202660670}
}
```

```bibtex
@article{Anthony2024BlackMambaMO,
title = {BlackMamba: Mixture of Experts for State-Space Models},
author = {Quentin Anthony and Yury Tokpanov and Paolo Glorioso and Beren Millidge},
journal = {ArXiv},
year = {2024},
volume = {abs/2402.01771},
url = {https://api.semanticscholar.org/CorpusID:267413070}
}
```

```bibtex
@article{Csordas2023SwitchHeadAT,
title = {SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention},
author = {R{\'o}bert Csord{\'a}s and Piotr Piekos and Kazuki Irie and J{\"u}rgen Schmidhuber},
journal = {ArXiv},
year = {2023},
volume = {abs/2312.07987},
url = {https://api.semanticscholar.org/CorpusID:266191825}
}
```