https://github.com/kyegomez/mhmoe
Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch
ai artificial-intelligence attention chicken machine-learning ml moe transformers
- Host: GitHub
- URL: https://github.com/kyegomez/mhmoe
- Owner: kyegomez
- License: MIT
- Created: 2024-04-26T20:19:26.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-06T12:54:56.000Z (about 1 year ago)
- Last Synced: 2025-04-19T20:17:00.146Z (about 1 year ago)
- Topics: ai, artificial-intelligence, attention, chicken, machine-learning, ml, moe, transformers
- Language: Python
- Homepage: https://discord.gg/7VckQVxvKk
- Size: 2.16 MB
- Stars: 24
- Watchers: 1
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# Multi-Head Mixture of Experts (MHMoE)
MH-MoE lets a model collectively attend to information from various representation
spaces within different experts, which deepens context understanding while significantly enhancing expert activation.
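To make the idea concrete, here is a minimal sketch of one multi-head MoE layer in plain PyTorch. It is not the code shipped in this package: the class name `ToyMHMoE`, the `top_k` parameter, the expert MLP shape, and the routing details are all assumptions chosen for illustration, and masking is left out. Each token is projected and split into `heads` sub-tokens, every sub-token is routed to its top-k experts, and the expert outputs are merged back into a full-width token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMHMoE(nn.Module):
    """Illustrative multi-head MoE layer (hypothetical, not the mh-moe API)."""

    def __init__(self, dim: int, heads: int, num_experts: int, top_k: int = 2):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.top_k = top_k
        sub_dim = dim // heads
        # Projection applied before each token is split into sub-tokens.
        self.split_proj = nn.Linear(dim, dim)
        # The router scores every sub-token against every expert.
        self.router = nn.Linear(sub_dim, num_experts)
        # Each expert is a small MLP operating on sub-tokens (assumed shape).
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(sub_dim, 4 * sub_dim),
                    nn.GELU(),
                    nn.Linear(4 * sub_dim, sub_dim),
                )
                for _ in range(num_experts)
            ]
        )
        # Projection applied after sub-tokens are merged back together.
        self.merge_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        # Split every token into `heads` sub-tokens: (b * n * heads, d / heads).
        sub = self.split_proj(x).reshape(b * n * self.heads, d // self.heads)

        # Pick the top-k experts per sub-token and renormalize their weights.
        probs = F.softmax(self.router(sub), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(sub)
        for e, expert in enumerate(self.experts):
            # Rows routed to expert e, and which top-k slot selected it.
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            gated = weights[rows, slots].unsqueeze(-1) * expert(sub[rows])
            out.index_add_(0, rows, gated)

        # Merge sub-tokens back into full tokens.
        return self.merge_proj(out.reshape(b, n, d))


torch.manual_seed(0)
layer = ToyMHMoE(dim=64, heads=4, num_experts=4)
x = torch.rand(2, 5, 64)
y = layer(x)
print(y.shape)  # torch.Size([2, 5, 64])
```

Because routing happens per sub-token rather than per token, a single token can reach up to `heads * top_k` experts in one layer, which is the intuition behind the denser expert activation mentioned above. The Python loop over experts is kept for readability; a real implementation would batch the dispatch.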
## install
`pip3 install mh-moe`
## usage
```python
import torch
from mh_moe.main import MHMoE
# Define model parameters
dim = 512
heads = 8
num_experts = 4
num_layers = 3
# Create MHMoE model instance
model = MHMoE(dim, heads, num_experts, num_layers)
# Generate dummy input
batch_size = 10
seq_length = 20
dummy_input = torch.rand(batch_size, seq_length, dim)
dummy_mask = torch.ones(batch_size, seq_length) # Example mask
# Forward pass through the model
output = model(dummy_input, dummy_mask)
# Print output and its shape
print(output)
print(output.shape)
```