Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kyegomez/palm2-vadapter
Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter"
- Host: GitHub
- URL: https://github.com/kyegomez/palm2-vadapter
- Owner: kyegomez
- License: MIT
- Created: 2024-02-19T18:32:10.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-09-09T14:50:48.000Z (2 months ago)
- Last Synced: 2024-09-16T06:18:11.213Z (2 months ago)
- Topics: ai, attention, attention-is-all-you-need, attention-mechanisms, deeplearning, ml, models, multi-modal, neural-nets, transformers
- Language: Python
- Homepage: https://discord.gg/GYbXvDGevY
- Size: 2.17 MB
- Stars: 17
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
  - Funding: .github/FUNDING.yml
  - License: LICENSE
README
[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# Palm2 Adapter
Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter".This model uses a perceiver resampler with a depth of 1 + a tiny palm to efficiently learn the features behind the images and then map them to the same space as the big model.
## Install

`$ pip install palm-vadapter`

## Usage
```python
import torch
from palm_vadapter.main import PaLM2VAdapter

# Random text tensor (token ids)
text = torch.randint(0, 1000, (1, 32), dtype=torch.long)

# Random image tensor
img = torch.randn(1, 3, 224, 224)

# Initialize PaLM2VAdapter model
model = PaLM2VAdapter(
tiny_dim=512,
dim=512,
num_tokens=10000,
seq_length=32,
depth=6,
heads=8,
image_size=224,
patch_size=16,
)

# Forward pass through the model
out = model(text, img)

# Print the shape of the output
print(out.shape)
```
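A note on the constructor arguments, reading from their names (the repository itself is the authority here): `tiny_dim` appears to set the width of the tiny PaLM used for alignment, `dim` the shared embedding width, `num_tokens` the vocabulary size, and `seq_length` the text length; with `image_size=224` and `patch_size=16`, the image is split ViT-style into (224/16)² = 196 patches.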
# License

MIT

## Citation
```bibtex
@misc{xiao2024palm2vadapter,
title={PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter},
author={Junfei Xiao and Zheng Xu and Alan Yuille and Shen Yan and Boyu Wang},
year={2024},
eprint={2402.10896},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

## Todo
- [ ] Add video processing for every frame
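
Until that lands, one hypothetical workaround is to push a clip through the existing image pathway one frame at a time; the sketch below assumes a decoded `frames` tensor and the same constructor shown in the usage example above.

```python
import torch
from palm_vadapter.main import PaLM2VAdapter

# Hypothetical sketch: process a video frame by frame through the
# existing image interface. `frames` stands in for a decoded clip.
model = PaLM2VAdapter(
    tiny_dim=512,
    dim=512,
    num_tokens=10000,
    seq_length=32,
    depth=6,
    heads=8,
    image_size=224,
    patch_size=16,
)

text = torch.randint(0, 1000, (1, 32), dtype=torch.long)
frames = torch.randn(8, 3, 224, 224)  # 8 frames of a dummy clip

# Run the model once per frame and stack the per-frame outputs
# along a new time dimension.
outs = [model(text, frame.unsqueeze(0)) for frame in frames]
video_out = torch.stack(outs, dim=1)  # (batch, frames, ...)
print(video_out.shape)
```

Stacking the per-frame outputs keeps them aligned along a time axis, at the cost of running the adapter once per frame.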