## [NeurIPS 2022] AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

### [Project Page](http://www.shoufachen.com/adaptformer-page/) | [arXiv](https://arxiv.org/abs/2205.13535)

![teaser](figs/teaser.gif)

This is a PyTorch implementation of the paper [AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition](https://arxiv.org/abs/2205.13535).

[Shoufa Chen](https://www.shoufachen.com/)<sup>1*</sup>,
[Chongjian Ge](https://chongjiange.github.io/)<sup>1*</sup>,
[Zhan Tong](https://scholar.google.com/citations?user=6FsgWBMAAAAJ)<sup>2</sup>,
[Jiangliu Wang](https://laura-wang.github.io/)<sup>2,3</sup>,
[Yibing Song](https://ybsong00.github.io/)<sup>2</sup>,
[Jue Wang](http://juewang725.github.io/)<sup>2</sup>,
[Ping Luo](http://luoping.me/)<sup>1</sup>

<sup>1</sup>The University of Hong Kong, <sup>2</sup>Tencent AI Lab, <sup>3</sup>The Chinese University of Hong Kong

<sup>*</sup>denotes equal contribution

### Catalog

- [x] Video code
- [x] Image code
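
For reference before diving into the commands below: the core idea of AdaptFormer is to keep the pre-trained ViT frozen and train only a lightweight bottleneck branch (AdaptMLP) added in parallel with each FFN block. The sketch below illustrates the design described in the paper; module and parameter names are illustrative, not the repository's exact API.

```python
import torch
import torch.nn as nn

class AdaptMLP(nn.Module):
    """Illustrative sketch of the parallel bottleneck adapter (AdaptMLP).

    A down-projection, ReLU, and up-projection run in parallel with the
    frozen FFN; the branch output is scaled by `s` and added back, so only
    a small number of parameters are trained.
    """

    def __init__(self, dim: int = 768, bottleneck: int = 64, s: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)
        self.s = s
        # Zero-init the up-projection so training starts from the frozen model.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.s * self.up(self.act(self.down(x)))

# Pseudo-wiring inside a Transformer block, following the paper's parallel design:
#   x = x + frozen_mlp(norm2(x)) + adaptmlp(norm2(x))
```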

### Usage

#### Install
Tested environment (a quick version check is sketched after this list):
* Tesla V100 (32 GB): CUDA 10.1 + PyTorch 1.6.0 + torchvision 0.7.0
* timm 0.4.8
* einops
* easydict
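
To confirm the environment matches the versions above, a minimal probe (nothing repository-specific):

```python
import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expected: 1.6.0
print("torchvision:", torchvision.__version__)  # expected: 0.7.0
print("timm:", timm.__version__)                # expected: 0.4.8
print("CUDA:", torch.version.cuda, "| available:", torch.cuda.is_available())
```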

#### Data Preparation
See [DATASET.md](DATASET.md).

#### Training
For video experiments, launch the following on each of the 8 nodes:
```bash
# video
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
--nproc_per_node=8 --nnodes=8 \
--node_rank=$1 --master_addr=$2 --master_port=22234 \
--use_env main_video.py \
--finetune /path/to/pre_trained/checkpoints \
--output_dir /path/to/output \
--batch_size 16 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
--data_path /path/to/SSV2 --data_set SSV2 \
--ffn_adapt
```
`--master_addr` is set to the IP address of node 0, and `--node_rank` is 0, 1, ..., 7 for the respective nodes.
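
In both the video and image commands, the `--ffn_adapt` flag enables the adapter branch in every FFN block while the pre-trained backbone stays frozen, so only the adapters (plus the task head) receive gradients. A hedged sketch of such freezing logic; the parameter-name patterns are illustrative, not the exact strings used in this repository:

```python
# Freeze everything except adapter and head parameters.
# NOTE: the substrings "adaptmlp" and "head" are illustrative name patterns.
for name, param in model.named_parameters():
    param.requires_grad = ("adaptmlp" in name.lower()) or ("head" in name)

n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable: {n_train / 1e6:.2f}M / {n_total / 1e6:.2f}M parameters")
```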

```bash
# image
python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main_image.py \
--batch_size 128 --cls_token \
--finetune /path/to/pre_trained/mae_pretrain_vit_b.pth \
--dist_eval --data_path /path/to/data \
--output_dir /path/to/output \
--drop_path 0.0 --blr 0.1 \
--dataset cifar100 --ffn_adapt
```
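
Note that `--blr` is a base learning rate. Since this code builds on MAE, the actual learning rate is presumably derived with MAE's linear scaling rule; the sketch below assumes that convention carries over:

```python
# MAE-style linear learning-rate scaling (assumed, since the repo builds on MAE).
def effective_lr(blr: float, batch_size: int, num_gpus: int, accum_iter: int = 1) -> float:
    eff_batch = batch_size * num_gpus * accum_iter
    return blr * eff_batch / 256

# For the image command above: blr=0.1, batch_size=128 per GPU, 8 GPUs.
print(effective_lr(0.1, 128, 8))  # -> 0.4
```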

To obtain the pre-trained checkpoint, see [PRETRAIN.md](PRETRAIN.md).
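
When a checkpoint is passed via `--finetune`, the adapter weights are absent from the pre-trained state dict and must be newly initialized, so the load is non-strict. A hedged sketch (the `"model"` key follows MAE's checkpoint layout; `model` is your instantiated network):

```python
import torch

ckpt = torch.load("/path/to/pre_trained/mae_pretrain_vit_b.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # MAE checkpoints store weights under "model"
missing, unexpected = model.load_state_dict(state, strict=False)
print("newly initialized keys (e.g. adapters, head):", missing)
```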
### Acknowledgement

The project is based on [MAE](https://github.com/facebookresearch/mae), [VideoMAE](https://github.com/MCG-NJU/VideoMAE), [timm](https://github.com/rwightman/pytorch-image-models), and [MAM](https://github.com/jxhe/unify-parameter-efficient-tuning).
Thanks for their awesome work.

### Citation
```
@article{chen2022adaptformer,
title={AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition},
author={Chen, Shoufa and Ge, Chongjian and Tong, Zhan and Wang, Jiangliu and Song, Yibing and Wang, Jue and Luo, Ping},
journal={arXiv preprint arXiv:2205.13535},
year={2022}
}
```

### License

This project is under the MIT license. See [LICENSE](LICENSE) for details.