Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/SforAiDl/vformer
A modular PyTorch library for vision transformer models
- Host: GitHub
- URL: https://github.com/SforAiDl/vformer
- Owner: SforAiDl
- License: MIT
- Created: 2021-09-08T05:14:02.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-28T12:48:35.000Z (over 1 year ago)
- Last Synced: 2024-10-03T06:54:26.361Z (5 months ago)
- Topics: pytorch, vision-transformer
- Language: Python
- Homepage: https://vformer.readthedocs.io/
- Size: 178 KB
- Stars: 163
- Watchers: 6
- Forks: 22
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.rst
- License: LICENSE
- Citation: CITATION.cff
- Authors: AUTHORS.rst
README
# VFormer
A modular PyTorch library for vision transformer models
[Tests](https://github.com/SforAiDl/vformer/actions/workflows/package-test.yml) [Docs](https://vformer.readthedocs.io/en/latest/?badge=latest) [Coverage](https://codecov.io/gh/SforAiDl/vformer) [Downloads](https://pepy.tech/project/vformer)

**[Documentation](https://vformer.readthedocs.io/en/latest/)**
## Library Features
- Contains implementations of prominent ViT architectures broken down into modular components like encoder, attention mechanism, and decoder
- Makes it easy to develop custom models by composing components of different architectures
- Contains utilities for visualizing attention maps of models using techniques such as gradient rollout (see the sketch below)
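The attention-map utilities are easiest to understand via the underlying technique. The following is a minimal, library-agnostic sketch of plain attention rollout (gradient rollout additionally weights each layer's attention by its gradients); it illustrates the idea and is not `VFormer`'s own API.

```python
import torch

def attention_rollout(attentions):
    """Aggregate per-layer attention maps into one token-to-token map.

    `attentions` is a list of tensors of shape (batch, heads, tokens, tokens),
    one per transformer layer (e.g. collected with forward hooks).
    """
    rollout = None
    for attn in attentions:
        attn = attn.mean(dim=1)                       # average over heads
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = attn + eye                             # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalize rows
        rollout = attn if rollout is None else attn @ rollout
    return rollout  # (batch, tokens, tokens)
```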
## Installation

### From source (recommended)
```shell
git clone https://github.com/SforAiDl/vformer.git
cd vformer/
python setup.py install
```
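If you prefer a pip-managed install from the cloned checkout, the following is a commonly used alternative (an assumption about your workflow, not a step from the upstream instructions):

```shell
# Install the cloned source tree with pip
pip install .
```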
### From PyPI
```shell
pip install vformer
```
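To confirm the installation succeeded, a quick import check (an addition, not part of the upstream instructions) is:

```shell
# The import fails if the package is not installed correctly
python -c "import vformer; print(vformer.__name__)"
```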
## Models supported
- [x] [Vanilla ViT](https://arxiv.org/abs/2010.11929)
- [x] [Swin Transformer](https://arxiv.org/abs/2103.14030)
- [x] [Pyramid Vision Transformer](https://arxiv.org/abs/2102.12122)
- [x] [CrossViT](https://arxiv.org/abs/2103.14899)
- [x] [Compact Vision Transformer](https://arxiv.org/abs/2104.05704)
- [x] [Compact Convolutional Transformer](https://arxiv.org/abs/2104.05704)
- [x] [Visformer](https://arxiv.org/abs/2104.12533)
- [x] [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413)
- [x] [CvT](https://arxiv.org/abs/2103.15808)
- [x] [ConViT](https://arxiv.org/abs/2103.10697)
- [x] [ViViT](https://arxiv.org/abs/2103.15691)
- [x] [Perceiver IO](https://arxiv.org/abs/2107.14795)
- [x] [Memory Efficient Attention](https://arxiv.org/abs/2112.05682)

## Example usage
To instantiate and use a Swin Transformer model -
```python
import torch
from vformer.models.classification import SwinTransformer

image = torch.randn(1, 3, 224, 224)  # Example data
model = SwinTransformer(
    img_size=224,
    patch_size=4,
    in_channels=3,
    n_classes=10,
    embed_dim=96,
    depths=[2, 2, 6, 2],
    num_heads=[3, 6, 12, 24],
    window_size=7,
    drop_rate=0.2,
)
logits = model(image)
```
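Because the model is a regular `torch.nn.Module`, it drops into an ordinary PyTorch training step. The snippet below is a minimal sketch with toy labels and an arbitrary learning rate (both assumptions, not part of the upstream example), reusing `model` and `image` from above.

```python
import torch
import torch.nn.functional as F

labels = torch.randint(0, 10, (1,))  # toy targets matching n_classes=10
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

loss = F.cross_entropy(model(image), labels)  # forward pass and loss
loss.backward()                               # backpropagate
optimizer.step()                              # update parameters
optimizer.zero_grad()
```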
`VFormer` has a modular design and allows for easy experimentation using blocks/modules of different architectures. For example, if desired, you can use just the encoder or the windowed attention layer of the Swin Transformer model.

```python
from vformer.attention import WindowAttention
window_attn = WindowAttention(
    dim=128,
    window_size=7,
    num_heads=2,
    # additional optional arguments can be passed as keyword arguments
)
```
```python
from vformer.encoder import SwinEncoder
swin_encoder = SwinEncoder(
    dim=128,
    input_resolution=(224, 224),
    depth=2,
    num_heads=2,
    window_size=7,
    # additional optional arguments can be passed as keyword arguments
)
```
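A hedged usage sketch for the standalone attention layer: the input shape follows the original Swin Transformer formulation, where window attention operates on `(num_windows * batch, window_size**2, dim)` token tensors. The exact forward signature in `VFormer` may differ, so treat this as an assumption rather than documented API.

```python
import torch

tokens = torch.randn(4, 7 * 7, 128)  # 4 windows of 49 tokens each, dim=128
out = window_attn(tokens)            # expected to return a tensor of the same shape
print(out.shape)
```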
Please refer to our [documentation](https://vformer.readthedocs.io/en/latest/) to learn more.
### References
- [vit-pytorch](https://github.com/lucidrains/vit-pytorch)
- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
- [PVT](https://github.com/whai362/PVT)
- [vit-explain](https://github.com/jacobgil/vit-explain)
- [CrossViT](https://github.com/IBM/CrossViT)
- [Compact-Transformers](https://github.com/SHI-Labs/Compact-Transformers)
- [Visformer](https://github.com/danczs/Visformer)
- [DPT](https://github.com/isl-org/DPT)
- [CvT](https://github.com/microsoft/CvT)
- [convit](https://github.com/facebookresearch/convit)
- [ViViT-pytorch](https://github.com/rishikksh20/ViViT-pytorch)
- [perceiver-pytorch](https://github.com/lucidrains/perceiver-pytorch)
- [memory-efficient-attention](https://github.com/AminRezaei0x443/memory-efficient-attention)