https://github.com/kyegomez/vitar
Implementation of ViTAR: Vision Transformer with Any Resolution, in PyTorch
- Host: GitHub
- URL: https://github.com/kyegomez/vitar
- Owner: kyegomez
- License: mit
- Created: 2024-03-28T22:26:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-11T15:49:45.000Z (11 months ago)
- Last Synced: 2025-03-28T23:05:39.888Z (7 months ago)
- Language: Python
- Size: 2.16 MB
- Stars: 33
- Watchers: 4
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Discord](https://discord.gg/qUtxnK2NMf)
# Vitar Implementation
Implementation of the paper "ViTAR: Vision Transformer with Any Resolution" ([arXiv:2403.18361](https://arxiv.org/abs/2403.18361)).

## Install
```bash
pip3 install -U vitar
```

## Example
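The example passes a tensor of shape `(1, 3, 224, 224)`, i.e. (batch, channels, height, width), with `image_size=224` and `patch_size=16`. As a rough guide to the sequence length involved, here is a minimal sketch that counts patch tokens. It assumes a standard ViT-style embedding with non-overlapping square patches; the helper `patch_grid` is hypothetical and is not taken from the vitar source.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image.

    Hypothetical helper: assumes non-overlapping square patches as in a
    standard ViT patch embedding, not the vitar implementation itself.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Configuration from the example: image_size=224, patch_size=16
print(patch_grid(224, 16))  # (14, 196)

# Doubling the side length under the same assumption quadruples the
# number of tokens the transformer has to process.
print(patch_grid(448, 16))  # (28, 784)
```

Under this assumption, the token count grows with the square of the image side. This quadratic growth makes handling many input resolutions costly for a plain ViT.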
```python
import torch
from vitar.main import Vitar

# Create a random input tensor
x = torch.randn(1, 3, 224, 224)

# Initialize the Vitar model with specified parameters
model = Vitar(
    512,
    8,
    depth=12,
    patch_size=16,
    image_size=224,
    channels=3,
    ffn_dim=2048,
    num_classes=1000,
)

# Pass the input tensor through the model
out = model(x)

# Print the output tensor
print(out)
```

# Todo
- [ ] Implement a training script to train this model on COCO and report results
- [ ] Write a large-scale pre-training script across multiple datasets to test the architecture