# Softmax-free Linear Transformers

> [**SOFT: Softmax-free Transformer with Linear Complexity**](https://arxiv.org/abs/2110.11945),
> Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang
> **NeurIPS 2021**

> [**Softmax-free Linear Transformers**](https://arxiv.org/abs/2207.03341),
> Jiachen Lu, Junge Zhang, Xiatian Zhu, Jianfeng Feng, Tao Xiang, Li Zhang
> **IJCV 2024**

## What's new
1. We propose a normalized softmax-free self-attention with stronger generalizability (see the sketch after this list).
2. SOFT is now available for more vision tasks (object detection and semantic segmentation).
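To make the core idea concrete, here is a minimal, self-contained sketch of one softmax-free attention step. This is our illustration rather than the repo's code: the paper obtains landmarks by average pooling and the Moore-Penrose inverse by a Newton iteration, whereas the random landmark sampling and `torch.linalg.pinv` below are simplifications.

```python
# Hedged sketch of SOFT-style softmax-free attention (illustration only).
# The softmax over dot products is replaced by a Gaussian kernel on the
# queries (SOFT ties keys to queries), and a Nystrom-style low-rank
# factorisation through m << n landmarks keeps the cost linear in n.
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """exp(-||a_i - b_j||^2 / 2): an (n, m) kernel matrix, no softmax."""
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2))

def soft_attention(q: torch.Tensor, v: torch.Tensor, m: int = 49) -> torch.Tensor:
    """q, v: (n, d). Returns (n, d) at O(n * m) cost instead of O(n^2)."""
    n = q.size(0)
    q_m = q[torch.randperm(n)[:m]]      # landmark queries (pooling in the paper)
    k_nm = gaussian_kernel(q, q_m)      # (n, m)
    k_mm = gaussian_kernel(q_m, q_m)    # (m, m)
    # S ~= K_nm @ pinv(K_mm) @ K_nm^T; multiplying right-to-left means the
    # full (n, n) attention matrix is never materialised.
    return k_nm @ (torch.linalg.pinv(k_mm) @ (k_nm.t() @ v))

out = soft_attention(torch.randn(196, 64), torch.randn(196, 64))
print(out.shape)  # torch.Size([196, 64])
```
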
## NEWS
- [2024/02/12] Our journal extension [Softmax-free Linear Transformer](https://arxiv.org/abs/2207.03341) is accepted by IJCV.
- [2022/07/05] SOFT is now available for downstream tasks! An efficient normalization is applied to SOFT. Please refer to [SOFT-Norm](https://github.com/fudan-zvg/SOFT/tree/normalization).

## Requirements
* timm==0.3.2
* torch>=1.7.0 and torchvision that matches the PyTorch installation
* cuda>=10.2
Compilation may fail on CUDA < 10.2.
We have compiled it successfully on `cuda 10.2` and `cuda 11.2`.
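For reference, an environment setup might look like the following. The exact `torch`/`torchvision` build must match your local CUDA toolkit, so treat these commands as an example rather than a pinned recipe:

```shell
# Example only: pick the torch/torchvision build that matches your CUDA
# toolkit (e.g. a cu102 or cu112 wheel for CUDA 10.2 / 11.2).
pip install "torch>=1.7.0" torchvision
pip install timm==0.3.2
```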
### Data preparation
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure is the standard layout for the torchvision [`datasets.ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder); the training and validation data are expected to be in the `train/` and `val/` folders respectively:

```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
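The layout above is exactly what torchvision's `ImageFolder` expects, so a quick sanity check with the plain torchvision API (independent of this repo) can confirm the folders are discovered correctly:

```python
# Verify the ImageNet directory layout before launching training.
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=tf)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=tf)
# ImageNet-1K should report 1000 classes, ~1.28M train and 50k val images.
print(len(train_set.classes), len(train_set), len(val_set))
```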
## Installation
```shell script
git clone https://github.com/fudan-zvg/SOFT.git
python -m pip install -e SOFT
```
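The editable install also builds the `cuda` kernel. If that build fails, first confirm that the CUDA toolkit and the PyTorch CUDA build are both visible (a generic check, not repo-specific):

```shell
# Both versions should be >= 10.2 and mutually compatible.
nvcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```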
## Main results
### ImageNet-1K Image Classification

| Model | Resolution | Params | FLOPs | Top-1 % | Config | Pretrained Model |
|-------------|:----------:|:------:|:-----:|:-------:|--------|--------|
| SOFT-Tiny | 224 | 13M | 1.9G | 79.3 |[SOFT_Tiny.yaml](config/SOFT_Tiny.yaml), [SOFT_Tiny_cuda.yaml](config/SOFT_Tiny_cuda.yaml)|[SOFT_Tiny](https://drive.google.com/file/d/1S04DCotIOkP0DaBb8WStQ513z82qT9de/view?usp=sharing), [SOFT_Tiny_cuda](https://drive.google.com/file/d/1inDKh3Wz_2KQgGH_2ywU5H_gLKZpIz_u/view?usp=sharing)
| SOFT-Small | 224 | 24M | 3.3G | 82.2 |[SOFT_Small.yaml](config/SOFT_Small.yaml), [SOFT_Small_cuda.yaml](config/SOFT_Small_cuda.yaml)|
| SOFT-Medium | 224 | 45M | 7.2G | 82.9 |[SOFT_Medium.yaml](config/SOFT_Medium.yaml), [SOFT_Medium_cuda.yaml](config/SOFT_Medium_cuda.yaml)|
| SOFT-Large | 224 | 64M | 11.0G | 83.1 |[SOFT_Large.yaml](config/SOFT_Large.yaml), [SOFT_Large_cuda.yaml](config/SOFT_Large_cuda.yaml)|
| SOFT-Huge | 224 | 87M | 16.3G | 83.3 |[SOFT_Huge.yaml](config/SOFT_Huge.yaml), [SOFT_Huge_cuda.yaml](config/SOFT_Huge_cuda.yaml)|
| SOFT-Tiny-Norm | 224 | 13M | 1.9G | 79.4 |[SOFT_Tiny_norm.yaml](config/SOFT_Tiny_norm.yaml)|[SOFT_Tiny_norm](https://drive.google.com/file/d/1Isy5b9v_4pyIXDqhKPNRq3WKH0etDlfl/view?usp=sharing)|
| SOFT-Small-Norm | 224 | 24M | 3.3G | 82.4 |[SOFT_Small_norm.yaml](config/SOFT_Small_norm.yaml)|[SOFT_Small_norm](https://drive.google.com/file/d/1OBjn7FzVdNP1Urqxq7X0yDykyPhxAAW1/view?usp=sharing)|
| SOFT-Medium-Norm | 224 | 45M | 7.2G | 83.1 |[SOFT_Medium_norm.yaml](config/SOFT_Medium_norm.yaml)|[SOFT_Medium_norm](https://drive.google.com/file/d/1K2C6daaJn3jwurWh38uvV7rexirWjuzh/view?usp=sharing)|
| SOFT-Large-Norm | 224 | 64M | 11.0G | 83.3 |[SOFT_Large_norm.yaml](config/SOFT_Large_norm.yaml)|[SOFT_Large_norm](https://drive.google.com/file/d/1aRYuF_gbBGyiXUDKEcpHJmM04SdvTUdP/view?usp=sharing)|
| SOFT-Huge-Norm | 224 | 87M | 16.3G | 83.4 |[SOFT_Huge_norm.yaml](config/SOFT_Huge_norm.yaml)|

### COCO Object Detection (2017 val)

| Backbone | Method | lr schd | box mAP | mask mAP | Params |
|-------------|:----------:|:------:|:-----:|:-------:|:--------:|
|SOFT-Tiny-Norm | RetinaNet | 1x | 40.0 | - | 23M|
|SOFT-Tiny-Norm | Mask R-CNN | 1x | 41.2 | 38.2 | 33M|
|SOFT-Small-Norm | RetinaNet | 1x | 42.8 | - | 34M|
|SOFT-Small-Norm | Mask R-CNN | 1x | 43.8 | 40.1 | 44M|
|SOFT-Medium-Norm | RetinaNet | 1x | 44.3 | - | 55M|
|SOFT-Medium-Norm | Mask R-CNN | 1x | 46.6 | 42.0 | 65M|
|SOFT-Large-Norm | RetinaNet | 1x | 45.3 | - | 74M|
|SOFT-Large-Norm | Mask R-CNN | 1x | 47.0 | 42.2 | 84M|

### ADE20K Semantic Segmentation (val)

| Backbone | Method | Crop size| lr schd | mIoU | Params |
|-------------|:----------:|:----------:|:------:|:-----:|:-------:|
|SOFT-Small-Norm | UperNet |512x512| 1x | 46.2 | 54M|
|SOFT-Medium-Norm | UperNet |512x512 | 1x | 48.0 | 76M|
## Get Started

### Train
We provide two implementations of the Gaussian kernel: a `PyTorch` version and the exact Gaussian function implemented in `cuda`. Config files whose names contain `cuda` use the CUDA implementation. Both implementations yield the same performance.
Please **install** SOFT before running the `cuda` version.
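The two paths compute the same quantity, exp(-||q_i - k_j||^2 / 2): the CUDA kernel evaluates this exact form, while in pure PyTorch it can equivalently be expanded into matmul-friendly terms. A sketch of the equivalence (our illustration, not the repo's exact code):

```python
# Two equivalent ways to form the Gaussian-kernel matrix (illustration only;
# the repo's PyTorch and CUDA paths may be organised differently).
import torch

q, k = torch.randn(196, 64), torch.randn(196, 64)

# Direct form: exp(-||q_i - k_j||^2 / 2), what the CUDA kernel fuses.
direct = torch.exp(-0.5 * torch.cdist(q, k).pow(2))

# Expansion exp(q k^T) * exp(-||q||^2 / 2) * exp(-||k||^2 / 2), built from
# a single matmul plus broadcast terms.
sq_q = q.pow(2).sum(-1, keepdim=True)       # (n, 1)
sq_k = k.pow(2).sum(-1, keepdim=True).t()   # (1, n)
expanded = torch.exp(q @ k.t() - 0.5 * sq_q - 0.5 * sq_k)

print(torch.allclose(direct, expanded, atol=1e-5))  # True
```

To launch training: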
```shell
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE}
# For example, train SOFT-Tiny on the ImageNet training set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml
```

### Test
```shell
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE} --eval_checkpoint ${CHECKPOINT_FILE} --eval
# For example, test SOFT-Tiny on the ImageNet validation set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml --eval_checkpoint ${CHECKPOINT_FILE} --eval
```
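Evaluation reports the Top-1 % shown in the tables above; for reference, the metric itself is simply (a generic illustration, not the repo's evaluation code):

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """logits: (batch, num_classes); labels: (batch,) of class indices."""
    return (logits.argmax(dim=-1) == labels).float().mean().item()
```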
## Reference

```bibtex
@inproceedings{SOFT,
  title={SOFT: Softmax-free Transformer with Linear Complexity},
  author={Lu, Jiachen and Yao, Jinghan and Zhang, Junge and Zhu, Xiatian and Xu, Hang and Gao, Weiguo and Xu, Chunjing and Xiang, Tao and Zhang, Li},
  booktitle={NeurIPS},
  year={2021}
}
```

```bibtex
@article{Softmax,
  title={Softmax-free Linear Transformers},
  author={Lu, Jiachen and Zhang, Li and Zhang, Junge and Zhu, Xiatian and Feng, Jianfeng and Xiang, Tao},
  journal={International Journal of Computer Vision},
  year={2024}
}
```

## License
[MIT](LICENSE)
## Acknowledgement
Thanks to these previously open-sourced repos:
[Detectron2](https://github.com/facebookresearch/detectron2)
[T2T-ViT](https://github.com/yitu-opensource/T2T-ViT)
[PVT](https://github.com/whai362/PVT)
[Nystromformer](https://github.com/mlpen/Nystromformer)
[pytorch-image-models](https://github.com/rwightman/pytorch-image-models)