https://github.com/kyegomez/moe-mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta
- Host: GitHub
- URL: https://github.com/kyegomez/moe-mamba
- Owner: kyegomez
- License: mit
- Created: 2024-01-21T23:30:45.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-06T12:53:33.000Z (7 months ago)
- Last Synced: 2025-04-19T20:17:06.178Z (7 months ago)
- Topics: ai, ml, moe, multi-modal-fusion, multi-modality, swarms
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.17 MB
- Stars: 102
- Watchers: 5
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Agora Discord](https://discord.gg/qUtxnK2NMf)
# MoE Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta. The `SwitchMoE` architecture is from the Switch Transformer paper. I still need help with it; if you want to help, please join the Agora Discord server and help in the MoE Mamba channel.
[PAPER LINK](https://arxiv.org/abs/2401.04081)
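For background, Switch-style MoE routing sends each token to a single expert chosen by a learned gate (top-1 routing). Below is a minimal, self-contained sketch of that idea in plain PyTorch; it is not the library's `SwitchMoE` class, and the class and parameter names (`TinySwitchMoE`, `hidden_mult`) are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySwitchMoE(nn.Module):
    """Illustrative top-1 (switch) routing over a few feed-forward experts."""

    def __init__(self, dim: int, num_experts: int, hidden_mult: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router: token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, dim * hidden_mult),
                nn.GELU(),
                nn.Linear(dim * hidden_mult, dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        scores = F.softmax(self.gate(x), dim=-1)    # (batch, seq_len, num_experts)
        weight, expert_idx = scores.max(dim=-1)     # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                  # tokens routed to expert i
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Route a toy batch through the sketch
y = TinySwitchMoE(dim=512, num_experts=4)(torch.randn(1, 10, 512))
print(y.shape)  # torch.Size([1, 10, 512])
```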
## Install
```bash
pip install moe-mamba
```
## Usage
### `MoEMambaBlock`
```python
import torch
from moe_mamba import MoEMambaBlock
# Input tensor of shape (batch=1, seq_len=10, dim=512)
x = torch.randn(1, 10, 512)

model = MoEMambaBlock(
    dim=512,
    depth=6,
    d_state=128,
    expand=4,
    num_experts=4,
)
out = model(x)
print(out)
```
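Assuming `MoEMambaBlock` acts as a drop-in sequence block that preserves the shape of its input (an assumption, not something the README states), the output can be sanity-checked like so:

```python
# Assumes the block preserves the (batch, seq_len, dim) shape of its input
assert out.shape == x.shape  # expected: torch.Size([1, 10, 512])
```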
### `MoEMamba`
```python
import torch
from moe_mamba.model import MoEMamba
# Token IDs of shape (batch=1, seq_len=512), drawn from a vocabulary of 10000
x = torch.randint(0, 10000, (1, 512))

# Create a MoEMamba model
model = MoEMamba(
    num_tokens=10000,
    dim=512,
    depth=1,
    d_state=512,
    causal=True,
    shared_qk=True,
    exact_window_size=True,
    dim_head=64,
    m_expand=4,
    num_experts=4,
)

# Forward pass
out = model(x)

# Print the output tensor
print(out)
```
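If `MoEMamba` is a token-level language model over a vocabulary of `num_tokens`, the output is presumably a logits tensor with one row of vocabulary scores per input position; that shape is an assumption, not documented above. Under that assumption, a greedy per-position prediction looks like:

```python
# Assumes out has shape (batch, seq_len, num_tokens) of vocabulary logits
predicted_ids = out.argmax(dim=-1)  # greedy choice per position
print(predicted_ids.shape)          # expected: torch.Size([1, 512])
```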
## Code Quality 🧹
- `make style` to format the code
- `make check_code_quality` to check code quality (PEP8 basically)
- `black .`
- `ruff . --fix`
## Citation
```bibtex
@misc{pióro2024moemamba,
    title={MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts},
    author={Maciej Pióro and Kamil Ciebiera and Krystian Król and Jan Ludziejewski and Sebastian Jaszczur},
    year={2024},
    eprint={2401.04081},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
## License
MIT