https://github.com/kyegomez/mhmoe

Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch

Host: GitHub
URL: https://github.com/kyegomez/mhmoe
Owner: kyegomez
License: mit
Created: 2024-04-26T20:19:26.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-04-06T12:54:56.000Z (6 months ago)
Last Synced: 2025-04-19T20:17:00.146Z (6 months ago)
Topics: ai, artificial-intelligence, attention, chicken, machine-learning, ml, moe, transformers
Language: Python
Homepage: https://discord.gg/7VckQVxvKk
Size: 2.16 MB
Stars: 24
Watchers: 1
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE


README

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Multi-Head Mixture of Experts (MHMoE)

MH-MoE lets a model collectively attend to information from various representation spaces within different experts, deepening context understanding while significantly increasing expert activation.

## install

`pip3 install mh-moe`

## usage

```python
import torch
from mh_moe.main import MHMoE

# Define model parameters
dim = 512
heads = 8
num_experts = 4
num_layers = 3

# Create MHMoE model instance
model = MHMoE(dim, heads, num_experts, num_layers)

# Generate dummy input
batch_size = 10
seq_length = 20
dummy_input = torch.rand(batch_size, seq_length, dim)
dummy_mask = torch.ones(batch_size, seq_length)  # Example mask

# Forward pass through the model
output = model(dummy_input, dummy_mask)

# Print output and its shape
print(output)
print(output.shape)
```
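
To make the idea in the description more concrete, here is a minimal sketch of a single multi-head MoE layer in plain PyTorch. It is not the code shipped in `mh_moe`; the class name `TinyMHMoELayer`, the 4x expert hidden width, and the top-1 routing are assumptions chosen for brevity (the paper routes each sub-token to its top-k experts). Each token is projected, split into `heads` sub-tokens, every sub-token is routed to an expert independently, and the results are merged back into full-width tokens.

```python
import torch
import torch.nn as nn


class TinyMHMoELayer(nn.Module):
    """Illustrative multi-head MoE layer (hypothetical, not the mh_moe API)."""

    def __init__(self, dim: int, heads: int, num_experts: int):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.sub_dim = dim // heads
        # Projection applied before tokens are split into sub-tokens
        self.split_proj = nn.Linear(dim, dim)
        # Router scores each sub-token against every expert
        self.router = nn.Linear(self.sub_dim, num_experts)
        # Each expert is a small feed-forward network over sub-tokens
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(self.sub_dim, 4 * self.sub_dim),
                    nn.GELU(),
                    nn.Linear(4 * self.sub_dim, self.sub_dim),
                )
                for _ in range(num_experts)
            ]
        )
        # Projection applied after sub-tokens are merged back
        self.merge_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        # Split: (b, n, d) -> (b * n * heads, sub_dim)
        sub = self.split_proj(x).reshape(b * n * self.heads, self.sub_dim)
        # Top-1 routing (an assumption): pick the highest-scoring expert
        gate = self.router(sub).softmax(dim=-1)
        weight, expert_idx = gate.max(dim=-1)
        out = torch.zeros_like(sub)
        for i, expert in enumerate(self.experts):
            chosen = expert_idx == i
            if chosen.any():
                out[chosen] = weight[chosen].unsqueeze(-1) * expert(sub[chosen])
        # Merge: (b * n * heads, sub_dim) -> (b, n, d)
        return self.merge_proj(out.reshape(b, n, d))


layer = TinyMHMoELayer(dim=512, heads=8, num_experts=4)
x = torch.rand(10, 20, 512)
y = layer(x)
print(y.shape)  # torch.Size([10, 20, 512])
```

With `dim = 512` and `heads = 8`, each token becomes eight 64-dimensional sub-tokens, so a batch of 10 x 20 tokens produces 1600 routing decisions instead of 200. That larger number of independent routing decisions is the intuition behind the higher expert activation mentioned above.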
