https://github.com/kyegomez/vitar

Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch

Host: GitHub
URL: https://github.com/kyegomez/vitar
Owner: kyegomez
License: MIT
Created: 2024-03-28T22:26:58.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-11T15:49:45.000Z (11 months ago)
Last Synced: 2025-03-28T23:05:39.888Z (7 months ago)
Language: Python
Size: 2.16 MB
Stars: 33
Watchers: 4
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE


[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Vitar Implementation

Implementation of the paper ["ViTAR: Vision Transformer with Any Resolution"](https://arxiv.org/abs/2403.18361) in PyTorch.

## Install

```
$ pip3 install -U vitar
```

## Example

```python
import torch
from vitar.main import Vitar

# Create a random input tensor
x = torch.randn(1, 3, 224, 224)

# Initialize the Vitar model with specified parameters
model = Vitar(
    512,
    8,
    depth=12,
    patch_size=16,
    image_size=224,
    channels=3,
    ffn_dim=2048,
    num_classes=1000,
)

# Pass the input tensor through the model
out = model(x)

# Print the output tensor
print(out)
```
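To give a feel for what "any resolution" means in practice, here is a minimal, self-contained sketch in plain PyTorch. It is not this package's API and not the paper's method: `FixedGridPatchEmbed` and its `grid_size` parameter are hypothetical names made up for illustration, and simple average pooling stands in for whatever token-merging module the real model uses. It only shows how inputs of different sizes can be reduced to the same number of tokens before reaching the transformer blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedGridPatchEmbed(nn.Module):
    """Hypothetical helper: patchify an image, then pool the tokens to a fixed grid."""

    def __init__(self, dim=512, patch_size=16, channels=3, grid_size=14):
        super().__init__()
        self.grid_size = grid_size
        # Non-overlapping patches via a strided convolution
        self.proj = nn.Conv2d(channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        tokens = self.proj(x)  # (B, dim, H / patch_size, W / patch_size)
        # Assumption: average pooling as a stand-in for a learned merging step
        tokens = F.adaptive_avg_pool2d(tokens, self.grid_size)  # (B, dim, G, G)
        return tokens.flatten(2).transpose(1, 2)  # (B, G * G, dim)


embed = FixedGridPatchEmbed()
shapes = {}
for size in (224, 384, 512):
    x = torch.randn(1, 3, size, size)
    shapes[size] = tuple(embed(x).shape)
    print(size, shapes[size])
```

Every input size ends up as a 14 x 14 grid of 512-dimensional tokens, so the layers that follow see the same sequence length regardless of the image resolution.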

## Todo

- [ ] Implement a training script to train this model on COCO and report results
- [ ] Write a large-scale pre-training script covering multiple datasets to test the architecture
