https://github.com/kyegomez/palm2-vadapter

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Palm2 Adapter
Implementation of PaLM2-VAdapter from the multi-modal paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter".

The adapter combines a Perceiver resampler of depth 1 with a tiny PaLM model. Together they learn image features efficiently and map them into the same embedding space as the large language model.
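To make the resampler idea concrete, here is a minimal sketch of a single-layer (depth 1) Perceiver-style resampler in PyTorch. It is not the code in this repository: the class name `PerceiverResamplerSketch`, the `num_latents` parameter, and the layer layout are assumptions chosen for illustration. It is also a simplified variant in which a fixed set of learned latent vectors cross-attends to the image features.

```python
import torch
from torch import nn


class PerceiverResamplerSketch(nn.Module):
    """Hypothetical depth-1 resampler: learned latents cross-attend to image features."""

    def __init__(self, dim: int = 512, num_latents: int = 64, heads: int = 8):
        super().__init__()
        # Learned query vectors; their count fixes the number of output tokens.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.norm_latents = nn.LayerNorm(dim)
        self.norm_feats = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_patches, dim) features from a vision encoder
        batch = feats.shape[0]
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        keys_values = self.norm_feats(feats)
        attended, _ = self.attn(self.norm_latents(queries), keys_values, keys_values)
        x = queries + attended   # residual around cross-attention
        return x + self.ff(x)    # residual around the feed-forward block


resampler = PerceiverResamplerSketch(dim=512, num_latents=64, heads=8)
patch_feats = torch.randn(2, 196, 512)  # stand-in for encoder output
visual_tokens = resampler(patch_feats)
print(visual_tokens.shape)
```

Whatever the number of input patches, the output always has `num_latents` tokens, which is what lets a small language model consume the visual features at a fixed cost.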

## Install
`$ pip install palm-vadapter`

## Usage
```python
import torch
from palm_vadapter.main import PaLM2VAdapter

# Random token IDs standing in for a text batch: (batch, seq_length)
text = torch.randint(0, 1000, (1, 32), dtype=torch.long)

# Random image tensor: (batch, channels, height, width)
img = torch.randn(1, 3, 224, 224)

# Initialize PaLM2VAdapter model
model = PaLM2VAdapter(
    tiny_dim=512,
    dim=512,
    num_tokens=10000,
    seq_length=32,
    depth=6,
    heads=8,
    image_size=224,
    patch_size=16,
)

# Forward pass through the model
out = model(text, img)

# Print the shape of the output
print(out.shape)
```
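The `image_size` and `patch_size` arguments above suggest ViT-style patching, though the README does not confirm how the model splits images. Assuming non-overlapping square patches, the following sketch shows how many patch tokens a 224×224 image would produce. The helper `num_patches` is hypothetical and not part of the package.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square image (assumed ViT-style)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


print(num_patches(224, 16))  # 196
```

Under that assumption, the settings in the usage example give 14 patches per side, or 196 patch tokens per image.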

## License
MIT

## Citation
```bibtex
@misc{xiao2024palm2vadapter,
  title={PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter},
  author={Junfei Xiao and Zheng Xu and Alan Yuille and Shen Yan and Boyu Wang},
  year={2024},
  eprint={2402.10896},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Todo
- [ ] Add video processing for every frame