Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/VITA-Group/AsViT
[ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou
- Host: GitHub
- URL: https://github.com/VITA-Group/AsViT
- Owner: VITA-Group
- License: mit
- Created: 2022-01-21T00:56:31.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-21T03:07:02.000Z (over 2 years ago)
- Last Synced: 2024-08-02T15:35:57.400Z (3 months ago)
- Topics: network-complexity, neural-architecture-search, vision-transformer
- Language: Python
- Homepage:
- Size: 387 KB
- Stars: 76
- Watchers: 6
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# As-ViT: Auto-scaling Vision Transformers without Training [[PDF](https://openreview.net/pdf?id=H94a1_Pyr-6)]
[![MIT licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg)](LICENSE.md)
Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou
In ICLR 2022.
**Note**: We implemented the topology search (Sec. 3.3) and scaling (Sec. 3.4) in this code base in PyTorch. Our training code is based on TensorFlow and Keras on TPU and will be released soon.
## Overview
We present As-ViT, a framework that unifies automatic architecture design and scaling for ViTs (vision transformers) using a training-free strategy.
Highlights:
* **Training-free ViT Architecture Design**: we design a "seed" ViT topology by leveraging a training-free search process. This extremely fast search is enabled by our comprehensive study of ViT's network complexity (length distortion), which yields a strong Kendall-tau correlation with ground-truth accuracies.
* **Training-free ViT Architecture Scaling**: starting from the "seed" topology, we automate the scaling rule for ViTs by growing widths/depths across different ViT layers. This generates a series of architectures with different numbers of parameters in a single run.
* **Efficient ViT Training via Progressive Tokenization**: we observe that ViTs can tolerate coarse tokenization in early training stages, and we further propose to train ViTs faster and cheaper with a progressive tokenization strategy.
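The Kendall-tau correlation used to validate the training-free proxy can be computed with `scipy.stats.kendalltau`. The sketch below is illustrative only; all numbers are made-up placeholders, not results from the paper.

```python
# Illustrative sketch (not from the As-ViT code base): rank candidate ViT
# topologies by a training-free proxy and measure how well that ranking
# agrees with ground-truth accuracies via Kendall-tau.
from scipy.stats import kendalltau

proxy_scores = [0.12, 0.34, 0.29, 0.51, 0.48]  # hypothetical length-distortion-based scores
accuracies = [70.1, 74.3, 73.0, 78.2, 77.5]    # hypothetical ground-truth accuracies (%)

tau, p_value = kendalltau(proxy_scores, accuracies)
print(f"Kendall-tau = {tau:.2f}")  # the two rankings agree exactly here, so tau = 1.00
```

A tau close to 1 means the proxy orders architectures almost identically to actual trained accuracy, which is what makes a training-free search trustworthy.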
Left: length distortion shows a strong correlation with ViT's accuracy. Middle: auto-scaling rule of As-ViT. Right: progressive re-tokenization for efficient ViT training.

## Prerequisites
- Ubuntu 18.04
- Python 3.6.9
- CUDA 11.0 (lower versions may work but were not tested)
- NVIDIA GPU + CuDNN v7.6

This repository has been tested on a V100 GPU. Configurations may need to be changed for different platforms.
## Installation
* Clone this repo:
```bash
git clone https://github.com/VITA-Group/AsViT.git
cd AsViT
```
* Install dependencies:
```bash
pip install -r requirements.txt
```

## 1. Seed As-ViT Topology Search
```bash
CUDA_VISIBLE_DEVICES=0 python ./search/reinforce.py --save_dir ./output/REINFORCE-imagenet --data_path /path/to/imagenet
```
This job will return a seed topology. For example, our searched seed topology is `8,2,3|4,1,2|4,1,4|4,1,6|32`, which decodes as:

| Stage | Kernel K | Split S | Expansion E |
|-------|----------|---------|-------------|
| 1     | 8        | 2       | 3           |
| 2     | 4        | 1       | 2           |
| 3     | 4        | 1       | 4           |
| 4     | 4        | 1       | 6           |

The final field (32) is the head dimension.
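The seed topology string can be decoded programmatically. The helper below is a hypothetical sketch (not part of this repo), assuming the `K,S,E` per-stage layout with a trailing head dimension:

```python
# Hypothetical helper (not part of the As-ViT code base): decode a seed
# topology string "K1,S1,E1|K2,S2,E2|K3,S3,E3|K4,S4,E4|Head" into
# per-stage kernel/split/expansion settings.
def parse_seed_topology(arch: str):
    *stage_strs, head = arch.split("|")
    stages = [
        dict(zip(("kernel", "split", "expansion"), (int(v) for v in s.split(","))))
        for s in stage_strs
    ]
    return stages, int(head)

stages, head = parse_seed_topology("8,2,3|4,1,2|4,1,4|4,1,6|32")
# stages[0] -> {"kernel": 8, "split": 2, "expansion": 3}; head -> 32
```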
## 2. Scaling
```bash
CUDA_VISIBLE_DEVICES=0 python ./search/grow.py --save_dir ./output/GROW-imagenet \
--arch "[arch]" --data_path /path/to/imagenet
```
Here `[arch]` is the seed topology (output from step 1 above).
This job will return a series of topologies. For example, our largest topology (As-ViT Large) is `8,2,3,5|4,1,2,2|4,1,4,5|4,1,6,2|32,180`, which decodes as:

| Stage | Kernel K | Split S | Expansion E | Layers L |
|-------|----------|---------|-------------|----------|
| 1     | 8        | 2       | 3           | 5        |
| 2     | 4        | 1       | 2           | 2        |
| 3     | 4        | 1       | 4           | 5        |
| 4     | 4        | 1       | 6           | 2        |

The final fields (32, 180) are the head dimension and the initial hidden size.
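A scaled topology string extends each stage with a layer count and appends the initial hidden size. The helper below is a hypothetical sketch (not part of this repo), assuming that layout:

```python
# Hypothetical helper (not part of the As-ViT code base): decode a scaled
# topology string "K1,S1,E1,L1|...|K4,S4,E4,L4|Head,Hidden" into per-stage
# settings plus the head dimension and initial hidden size.
def parse_scaled_topology(arch: str):
    *stage_strs, tail = arch.split("|")
    head, hidden = (int(v) for v in tail.split(","))
    stages = [
        dict(zip(("kernel", "split", "expansion", "layers"),
                 (int(v) for v in s.split(","))))
        for s in stage_strs
    ]
    return stages, head, hidden

stages, head, hidden = parse_scaled_topology("8,2,3,5|4,1,2,2|4,1,4,5|4,1,6,2|32,180")
# stages[3] -> {"kernel": 4, "split": 1, "expansion": 6, "layers": 2}
# head -> 32, hidden -> 180
```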
## 3. Evaluation
TensorFlow and Keras code for training on TPU. To be released soon.

## Citation
```
@inproceedings{chen2021asvit,
title={Auto-scaling Vision Transformers without Training},
author={Chen, Wuyang and Huang, Wei and Du, Xianzhi and Song, Xiaodan and Wang, Zhangyang and Zhou, Denny},
booktitle={International Conference on Learning Representations},
year={2022}
}
```