https://github.com/kyegomez/mm1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"
ai artificial-intelligence deep-learning gpt4 machine-learning ml mm1 multi-modal multi-modal-revolution multi-modality
- Host: GitHub
- URL: https://github.com/kyegomez/mm1
- Owner: kyegomez
- License: MIT
- Created: 2024-03-19T02:51:44.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-12T12:53:01.000Z (6 months ago)
- Last Synced: 2025-04-15T11:13:20.169Z (6 months ago)
- Topics: ai, artificial-intelligence, deep-learning, gpt4, machine-learning, ml, mm1, multi-modal, multi-modal-revolution, multi-modality
- Language: Python
- Homepage: https://discord.gg/7VckQVxvKk
- Size: 2.17 MB
- Stars: 23
- Watchers: 2
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# MM1
PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training".

`img -> encoder -> connector -> llm -> tokens`
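The `img -> encoder -> connector -> llm -> tokens` flow can be sketched with plain PyTorch modules. This is a conceptual sketch only: the stand-in modules, the toy vocabulary size, and the shapes are illustrative, not the actual `mm1_torch` internals.

```python
import torch
import torch.nn as nn

dim = 512  # illustrative embedding dimension

# encoder: turns an image into a sequence of patch embeddings
encoder = nn.Sequential(
    nn.Conv2d(3, dim, kernel_size=16, stride=16),  # 224x224 -> 14x14 patches
    nn.Flatten(2),                                 # (B, dim, 196)
)

# connector: projects visual tokens into the LLM's embedding space
connector = nn.Linear(dim, dim)

# llm: a single transformer layer standing in for the language model
llm = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

# tokens: project hidden states to vocabulary logits (toy vocab of 100)
to_logits = nn.Linear(dim, 100)

img = torch.randn(1, 3, 224, 224)
visual = encoder(img).transpose(1, 2)  # (1, 196, dim)
visual = connector(visual)             # (1, 196, dim)
hidden = llm(visual)                   # (1, 196, dim)
logits = to_logits(hidden)             # (1, 196, 100)
print(logits.shape)                    # torch.Size([1, 196, 100])
```

Each stage only reshapes or transforms the token sequence, which is why the connector can be swapped (e.g. for `CAbstractor` below) without touching the encoder or the LLM.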
## install
`pip3 install mm1-torch`

## usage
```python
import torch
from mm1_torch.main import MM1

# Tensors
x = torch.randint(0, 100, (1, 512)) # Create a random tensor of shape (1, 512)
img = torch.randn(1, 3, 224, 224)  # Create a random image tensor of shape (1, 3, 224, 224)

# Create a model
model = MM1(
dim=512, # Dimension of the input tensor
depth=12, # Number of transformer layers
heads=8, # Number of attention heads
dim_head=64, # Dimension of each attention head
dropout=0.1, # Dropout rate
num_experts=4, # Number of experts in mixture-of-experts
num_experts_per_tok=2, # Number of experts per token in mixture-of-experts
encoder_dim=512, # Dimension of the encoder output
encoder_depth=12, # Number of encoder transformer layers
encoder_heads=8, # Number of encoder attention heads
use_moe=True, # Whether to use mixture-of-experts
return_logits=True # Whether to return logits or probabilities
)

# Forward
out = model(x, img) # Forward pass through the model
print(out.shape)  # Print the shape of the output tensor
print(out) # Print the output tensor
```

### `CAbstractor`
```python
import torch
from mm1_torch.main import CAbstractor

# Tensors
x = torch.randn(1, 100, 512)

# Create a model
model = CAbstractor(
dim=512,
depth=12,
heads=8,
)

# Forward
out = model(x)
print(out.shape)
```
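A C-Abstractor-style connector compresses a sequence of visual tokens into a shorter, fixed-length one. The following is a minimal sketch of that idea, assuming the common design of convolutional blocks around adaptive pooling; the class name, layer choices, and output length are illustrative, not the `mm1_torch` implementation.

```python
import torch
import torch.nn as nn

class TinyCAbstractor(nn.Module):
    """Illustrative C-Abstractor-style connector: 1D conv blocks around
    adaptive pooling that shrinks the token sequence to a fixed length."""

    def __init__(self, dim: int, out_tokens: int):
        super().__init__()
        self.conv_in = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(out_tokens)  # any seq_len -> out_tokens
        self.conv_out = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        h = x.transpose(1, 2)
        h = self.conv_out(self.pool(self.conv_in(h)))
        return h.transpose(1, 2)

x = torch.randn(1, 100, 512)          # 100 visual tokens of width 512
model = TinyCAbstractor(dim=512, out_tokens=36)
out = model(x)
print(out.shape)                      # torch.Size([1, 36, 512])
```

Pooling over the sequence axis is what lets the connector hand the LLM far fewer visual tokens than the encoder produces, while the surrounding convolutions preserve local spatial structure.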
# License
MIT

## Todo
- [x] Implement the deformable attention
- [ ] Create a training script for Huggingface datasets
- [ ] Create unit tests for every module