Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kyegomez/mc-vit
Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding"
- Host: GitHub
- URL: https://github.com/kyegomez/mc-vit
- Owner: kyegomez
- License: mit
- Created: 2024-02-09T04:10:38.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-10-07T07:19:47.000Z (about 1 month ago)
- Last Synced: 2024-10-29T07:09:50.359Z (22 days ago)
- Topics: ai, multi-modal, multi-modal-transformers, multi-modality, open-source, transformer, transformers, vit
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.17 MB
- Stars: 16
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# MC-ViT
Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding".

## Install
`$ pip install mcvit`

## Usage
```python
import torch
from mcvit.model import MCViT

# Initialize the MCViT model
mcvit = MCViT(
    dim=512,
    attn_seq_len=256,
    dim_head=64,
    dropout=0.1,
    chunks=16,
    depth=12,
    cross_attn_heads=8,
)

# Create a random tensor to represent a video
x = torch.randn(1, 3, 256, 256, 256)  # (batch, channels, frames, height, width)

# Pass the tensor through the model
output = mcvit(x)
print(output.shape)  # Shape of the tensor after passing through the model
```
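For intuition, here is a minimal, hypothetical sketch of the memory-consolidation idea the paper's title refers to: a long video is processed chunk by chunk, and activations from past chunks are compressed into a fixed-size memory that later chunks can attend to. The `consolidate` helper, the random-selection strategy, and all shapes below are illustrative assumptions, not this repository's API.

```python
import torch

def consolidate(memory: torch.Tensor, budget: int) -> torch.Tensor:
    """Shrink past activations to a fixed token budget by random selection
    (a simple non-parametric consolidation strategy; illustrative only)."""
    n = memory.shape[1]
    if n <= budget:
        return memory
    idx = torch.randperm(n)[:budget]
    return memory[:, idx]

# Hypothetical stream of video tokens: (batch, sequence, dim)
tokens = torch.randn(1, 1024, 512)
chunk_size, budget = 256, 128

memory = torch.empty(1, 0, 512)  # consolidated memory starts empty
for chunk in tokens.split(chunk_size, dim=1):
    # In the full model, attention over this chunk would also cross-attend
    # to the consolidated memory of past chunks; here we only maintain it.
    memory = consolidate(torch.cat([memory, chunk], dim=1), budget)

print(memory.shape)  # stays bounded: torch.Size([1, 128, 512])
```

The point of the sketch is that the memory never grows beyond `budget` tokens, so attention cost stays constant no matter how long the video is.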
# License
MIT