https://github.com/kyegomez/mm1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
ai artificial-intelligence deep-learning gpt4 machine-learning ml mm1 multi-modal multi-modal-revolution multi-modality
- Host: GitHub
- URL: https://github.com/kyegomez/mm1
- Owner: kyegomez
- License: MIT
- Created: 2024-03-19T02:51:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-12T12:53:01.000Z (6 months ago)
- Last Synced: 2025-04-15T11:13:20.169Z (6 months ago)
- Topics: ai, artificial-intelligence, deep-learning, gpt4, machine-learning, ml, mm1, multi-modal, multi-modal-revolution, multi-modality
- Language: Python
- Homepage: https://discord.gg/7VckQVxvKk
- Size: 2.17 MB
- Stars: 23
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training".

`img -> encoder -> connector -> llm -> tokens`
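The `img -> encoder -> connector -> llm -> tokens` flow can be sketched with plain PyTorch modules. This is a conceptual sketch only: the stand-in modules, the toy vocabulary size, and the shapes are illustrative, not the actual `mm1_torch` internals.

```python
import torch
import torch.nn as nn

dim = 512  # illustrative embedding dimension

# encoder: turns an image into a sequence of patch embeddings
encoder = nn.Sequential(
    nn.Conv2d(3, dim, kernel_size=16, stride=16),  # 224x224 -> 14x14 patches
    nn.Flatten(2),                                 # (B, dim, 196)
)

# connector: projects visual tokens into the LLM's embedding space
connector = nn.Linear(dim, dim)

# llm: a single transformer layer standing in for the language model
llm = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

# tokens: project hidden states to vocabulary logits (toy vocab of 100)
to_logits = nn.Linear(dim, 100)

img = torch.randn(1, 3, 224, 224)
visual = encoder(img).transpose(1, 2)  # (1, 196, dim)
visual = connector(visual)             # (1, 196, dim)
hidden = llm(visual)                   # (1, 196, dim)
logits = to_logits(hidden)             # (1, 196, 100)
print(logits.shape)                    # torch.Size([1, 196, 100])
```

Each stage only reshapes or transforms the token sequence, which is why the connector can be swapped (e.g. for `CAbstractor` below) without touching the encoder or the LLM.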
## install
`pip3 install mm1-torch`

## usage
```python
import torch
from mm1_torch.main import MM1

# Tensors
x = torch.randint(0, 100, (1, 512)) # Create a random tensor of shape (1, 512)
img = torch.randn(1, 3, 224, 224)  # Create a random image tensor of shape (1, 3, 224, 224)

# Create a model
model = MM1(
dim=512, # Dimension of the input tensor
depth=12, # Number of transformer layers
heads=8, # Number of attention heads
dim_head=64, # Dimension of each attention head
dropout=0.1, # Dropout rate
num_experts=4, # Number of experts in mixture-of-experts
num_experts_per_tok=2, # Number of experts per token in mixture-of-experts
encoder_dim=512, # Dimension of the encoder output
encoder_depth=12, # Number of encoder transformer layers
encoder_heads=8, # Number of encoder attention heads
use_moe=True, # Whether to use mixture-of-experts
return_logits=True # Whether to return logits or probabilities
)

# Forward
out = model(x, img) # Forward pass through the model
print(out.shape)  # Print the shape of the output tensor
print(out) # Print the output tensor
```

### `CAbstractor`
```python
import torch
from mm1_torch.main import CAbstractor

# Tensors
x = torch.randn(1, 100, 512)

# Create a model
model = CAbstractor(
dim=512,
depth=12,
heads=8,
)

# Forward
out = model(x)
print(out.shape)
```
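A C-Abstractor-style connector compresses a sequence of visual tokens into a shorter, fixed-length one. The following is a minimal sketch of that idea, assuming the common design of convolutional blocks around adaptive pooling; the class name, layer choices, and output length are illustrative, not the `mm1_torch` implementation.

```python
import torch
import torch.nn as nn

class TinyCAbstractor(nn.Module):
    """Illustrative C-Abstractor-style connector: 1D conv blocks around
    adaptive pooling that shrinks the token sequence to a fixed length."""

    def __init__(self, dim: int, out_tokens: int):
        super().__init__()
        self.conv_in = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(out_tokens)  # any seq_len -> out_tokens
        self.conv_out = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        h = self.conv_out(self.pool(self.conv_in(h)))
        return h.transpose(1, 2)

x = torch.randn(1, 100, 512)          # 100 visual tokens of width 512
model = TinyCAbstractor(dim=512, out_tokens=36)
out = model(x)
print(out.shape)                      # torch.Size([1, 36, 512])
```

Pooling over the sequence axis is what lets the connector hand the LLM far fewer visual tokens than the encoder produces, while the surrounding convolutions preserve local spatial structure.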
# License
MIT

## Todo
- [x] Implement the deformable attention
- [ ] Create a training script for Huggingface datasets
- [ ] Create unit tests for every module