Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/SforAiDl/vformer
A modular PyTorch library for vision transformer models
- Host: GitHub
- URL: https://github.com/SforAiDl/vformer
- Owner: SforAiDl
- License: MIT
- Created: 2021-09-08T05:14:02.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-28T12:48:35.000Z (over 1 year ago)
- Last Synced: 2024-10-03T06:54:26.361Z (5 months ago)
- Topics: pytorch, vision-transformer
- Language: Python
- Homepage: https://vformer.readthedocs.io/
- Size: 178 KB
- Stars: 163
- Watchers: 6
- Forks: 22
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.rst
- License: LICENSE
- Citation: CITATION.cff
- Authors: AUTHORS.rst
README
# VFormer
A modular PyTorch library for vision transformer models
[Tests](https://github.com/SforAiDl/vformer/actions/workflows/package-test.yml) [Docs](https://vformer.readthedocs.io/en/latest/?badge=latest) [Coverage](https://codecov.io/gh/SforAiDl/vformer) [Downloads](https://pepy.tech/project/vformer)

**[Documentation](https://vformer.readthedocs.io/en/latest/)**
## Library Features
- Contains implementations of prominent ViT architectures broken down into modular components like encoder, attention mechanism, and decoder
- Makes it easy to develop custom models by composing components of different architectures
- Contains utilities for visualizing attention maps of models using techniques such as gradient rollout (see the sketch below)
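The attention-map utilities are easiest to understand via the underlying technique. The following is a minimal, library-agnostic sketch of plain attention rollout (gradient rollout additionally weights each layer's attention by its gradients); it illustrates the idea and is not `VFormer`'s own API.

```python
import torch

def attention_rollout(attentions):
    """Aggregate per-layer attention maps into one token-to-token map.

    `attentions` is a list of tensors of shape (batch, heads, tokens, tokens),
    one per transformer layer (e.g. collected with forward hooks).
    """
    rollout = None
    for attn in attentions:
        attn = attn.mean(dim=1)                       # average over heads
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = attn + eye                             # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalize rows
        rollout = attn if rollout is None else attn @ rollout
    return rollout  # (batch, tokens, tokens)
```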
## Installation

### From source (recommended)
```shell
git clone https://github.com/SforAiDl/vformer.git
cd vformer/
python setup.py install
```
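If you prefer a pip-managed install from the cloned checkout, the following is a commonly used alternative (an assumption about your workflow, not a step from the upstream instructions):

```shell
# Install the cloned source tree with pip
pip install .
```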
### From PyPI
```shell
pip install vformer
```
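To confirm the installation succeeded, a quick import check (an addition, not part of the upstream instructions) is:

```shell
# The import fails if the package is not installed correctly
python -c "import vformer; print(vformer.__name__)"
```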
## Models supported
- [x] [Vanilla ViT](https://arxiv.org/abs/2010.11929)
- [x] [Swin Transformer](https://arxiv.org/abs/2103.14030)
- [x] [Pyramid Vision Transformer](https://arxiv.org/abs/2102.12122)
- [x] [CrossViT](https://arxiv.org/abs/2103.14899)
- [x] [Compact Vision Transformer](https://arxiv.org/abs/2104.05704)
- [x] [Compact Convolutional Transformer](https://arxiv.org/abs/2104.05704)
- [x] [Visformer](https://arxiv.org/abs/2104.12533)
- [x] [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413)
- [x] [CvT](https://arxiv.org/abs/2103.15808)
- [x] [ConViT](https://arxiv.org/abs/2103.10697)
- [x] [ViViT](https://arxiv.org/abs/2103.15691)
- [x] [Perceiver IO](https://arxiv.org/abs/2107.14795)
- [x] [Memory Efficient Attention](https://arxiv.org/abs/2112.05682)

## Example usage
To instantiate and use a Swin Transformer model -
```python
import torch
from vformer.models.classification import SwinTransformer

image = torch.randn(1, 3, 224, 224)  # Example data
model = SwinTransformer(
    img_size=224,
    patch_size=4,
    in_channels=3,
    n_classes=10,
    embed_dim=96,
    depths=[2, 2, 6, 2],
    num_heads=[3, 6, 12, 24],
    window_size=7,
    drop_rate=0.2,
)
logits = model(image)
```
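Because the model is a regular `torch.nn.Module`, it drops into an ordinary PyTorch training step. The snippet below is a minimal sketch with toy labels and an arbitrary learning rate (both assumptions, not part of the upstream example), reusing `model` and `image` from above.

```python
import torch
import torch.nn.functional as F

labels = torch.randint(0, 10, (1,))  # toy targets matching n_classes=10
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

loss = F.cross_entropy(model(image), labels)  # forward pass and loss
loss.backward()                               # backpropagate
optimizer.step()                              # update parameters
optimizer.zero_grad()
```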
`VFormer` has a modular design and allows for easy experimentation using blocks/modules of different architectures. For example, if desired, you can use just the encoder or the windowed attention layer of the Swin Transformer model.

```python
from vformer.attention import WindowAttention
window_attn = WindowAttention(
    dim=128,
    window_size=7,
    num_heads=2,
    # additional optional arguments can be passed as keyword arguments
)
```
```python
from vformer.encoder import SwinEncoder
swin_encoder = SwinEncoder(
    dim=128,
    input_resolution=(224, 224),
    depth=2,
    num_heads=2,
    window_size=7,
    # additional optional arguments can be passed as keyword arguments
)
```
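A hedged usage sketch for the standalone attention layer: the input shape follows the original Swin Transformer formulation, where window attention operates on `(num_windows * batch, window_size**2, dim)` token tensors. The exact forward signature in `VFormer` may differ, so treat this as an assumption rather than documented API.

```python
import torch

tokens = torch.randn(4, 7 * 7, 128)  # 4 windows of 49 tokens each, dim=128
out = window_attn(tokens)            # expected to return a tensor of the same shape
print(out.shape)
```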
Please refer to our [documentation](https://vformer.readthedocs.io/en/latest/) to learn more.
### References
- [vit-pytorch](https://github.com/lucidrains/vit-pytorch)
- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
- [PVT](https://github.com/whai362/PVT)
- [vit-explain](https://github.com/jacobgil/vit-explain)
- [CrossViT](https://github.com/IBM/CrossViT)
- [Compact-Transformers](https://github.com/SHI-Labs/Compact-Transformers)
- [Visformer](https://github.com/danczs/Visformer)
- [DPT](https://github.com/isl-org/DPT)
- [CvT](https://github.com/microsoft/CvT)
- [convit](https://github.com/facebookresearch/convit)
- [ViViT-pytorch](https://github.com/rishikksh20/ViViT-pytorch)
- [perceiver-pytorch](https://github.com/lucidrains/perceiver-pytorch)
- [memory-efficient-attention](https://github.com/AminRezaei0x443/memory-efficient-attention)