https://github.com/kyegomez/moe-mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta
- Host: GitHub
- URL: https://github.com/kyegomez/moe-mamba
- Owner: kyegomez
- License: mit
- Created: 2024-01-21T23:30:45.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-06T12:53:33.000Z (7 months ago)
- Last Synced: 2025-04-19T20:17:06.178Z (7 months ago)
- Topics: ai, ml, moe, multi-modal-fusion, multi-modality, swarms
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.17 MB
- Stars: 102
- Watchers: 5
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Agora Discord](https://discord.gg/qUtxnK2NMf)
# MoE Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta. The `SwitchMoE` architecture is from the Switch Transformer paper. I still need help with it; if you want to help, please join the Agora Discord server and help in the MoE Mamba channel.
[PAPER LINK](https://arxiv.org/abs/2401.04081)
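For background, Switch-style MoE routing sends each token to a single expert chosen by a learned gate (top-1 routing). Below is a minimal, self-contained sketch of that idea in plain PyTorch; it is not the library's `SwitchMoE` class, and the class and parameter names (`TinySwitchMoE`, `hidden_mult`) are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySwitchMoE(nn.Module):
    """Illustrative top-1 (switch) routing over a few feed-forward experts."""

    def __init__(self, dim: int, num_experts: int, hidden_mult: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # router: token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, dim * hidden_mult),
                nn.GELU(),
                nn.Linear(dim * hidden_mult, dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        scores = F.softmax(self.gate(x), dim=-1)    # (batch, seq_len, num_experts)
        weight, expert_idx = scores.max(dim=-1)     # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                  # tokens routed to expert i
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Route a toy batch through the sketch
y = TinySwitchMoE(dim=512, num_experts=4)(torch.randn(1, 10, 512))
print(y.shape)  # torch.Size([1, 10, 512])
```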
## Install
```bash
pip install moe-mamba
```
## Usage
### `MoEMambaBlock`
```python
import torch
from moe_mamba import MoEMambaBlock
# Input tensor of shape (batch=1, seq_len=10, dim=512)
x = torch.randn(1, 10, 512)

model = MoEMambaBlock(
    dim=512,
    depth=6,
    d_state=128,
    expand=4,
    num_experts=4,
)
out = model(x)
print(out)
```
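Assuming `MoEMambaBlock` acts as a drop-in sequence block that preserves the shape of its input (an assumption, not something the README states), the output can be sanity-checked like so:

```python
# Assumes the block preserves the (batch, seq_len, dim) shape of its input
assert out.shape == x.shape  # expected: torch.Size([1, 10, 512])
```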
### `MoEMamba`
```python
import torch
from moe_mamba.model import MoEMamba
# Token IDs of shape (batch=1, seq_len=512), drawn from a vocabulary of 10000
x = torch.randint(0, 10000, (1, 512))

# Create a MoEMamba model
model = MoEMamba(
    num_tokens=10000,
    dim=512,
    depth=1,
    d_state=512,
    causal=True,
    shared_qk=True,
    exact_window_size=True,
    dim_head=64,
    m_expand=4,
    num_experts=4,
)

# Forward pass
out = model(x)

# Print the output tensor
print(out)
```
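If `MoEMamba` is a token-level language model over a vocabulary of `num_tokens`, the output is presumably a logits tensor with one row of vocabulary scores per input position; that shape is an assumption, not documented above. Under that assumption, a greedy per-position prediction looks like:

```python
# Assumes out has shape (batch, seq_len, num_tokens) of vocabulary logits
predicted_ids = out.argmax(dim=-1)  # greedy choice per position
print(predicted_ids.shape)          # expected: torch.Size([1, 512])
```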
## Code Quality 🧹
- `make style` to format the code
- `make check_code_quality` to check code quality (PEP8 basically)
- `black .`
- `ruff . --fix`
## Citation
```bibtex
@misc{pióro2024moemamba,
    title={MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts},
    author={Maciej Pióro and Kamil Ciebiera and Krystian Król and Jan Ludziejewski and Sebastian Jaszczur},
    year={2024},
    eprint={2401.04081},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
## License
MIT