# Softmax-free Linear Transformers

> [**SOFT: Softmax-free Transformer with Linear Complexity**](https://arxiv.org/abs/2110.11945),
> Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang
> **NeurIPS 2021**

> [**Softmax-free Linear Transformers**](https://arxiv.org/abs/2207.03341),
> Jiachen Lu, Junge Zhang, Xiatian Zhu, Jianfeng Feng, Tao Xiang, Li Zhang
> **IJCV 2024**

## What's new
1. We propose a normalized softmax-free self-attention with stronger generalizability (see the sketch after this list).
2. SOFT is now available for more vision tasks (object detection and semantic segmentation).
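To make the core idea concrete, here is a minimal, self-contained sketch of one softmax-free attention step. This is our illustration rather than the repo's code: the paper obtains landmarks by average pooling and the Moore-Penrose inverse by a Newton iteration, whereas the random landmark sampling and `torch.linalg.pinv` below are simplifications.

```python
# Hedged sketch of SOFT-style softmax-free attention (illustration only).
# The softmax over dot products is replaced by a Gaussian kernel on the
# queries (SOFT ties keys to queries), and a Nystrom-style low-rank
# factorisation through m << n landmarks keeps the cost linear in n.
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """exp(-||a_i - b_j||^2 / 2): an (n, m) kernel matrix, no softmax."""
    return torch.exp(-0.5 * torch.cdist(a, b).pow(2))

def soft_attention(q: torch.Tensor, v: torch.Tensor, m: int = 49) -> torch.Tensor:
    """q, v: (n, d). Returns (n, d) at O(n * m) cost instead of O(n^2)."""
    n = q.size(0)
    q_m = q[torch.randperm(n)[:m]]      # landmark queries (pooling in the paper)
    k_nm = gaussian_kernel(q, q_m)      # (n, m)
    k_mm = gaussian_kernel(q_m, q_m)    # (m, m)
    # S ~= K_nm @ pinv(K_mm) @ K_nm^T; multiplying right-to-left means the
    # full (n, n) attention matrix is never materialised.
    return k_nm @ (torch.linalg.pinv(k_mm) @ (k_nm.t() @ v))

out = soft_attention(torch.randn(196, 64), torch.randn(196, 64))
print(out.shape)  # torch.Size([196, 64])
```
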
## NEWS
- [2024/02/12] Our journal extension [Softmax-free Linear Transformer](https://arxiv.org/abs/2207.03341) is accepted by IJCV.
- [2022/07/05] SOFT is now available for downstream tasks! An efficient normalization is applied to SOFT. Please refer to [SOFT-Norm](https://github.com/fudan-zvg/SOFT/tree/normalization).

## Requirements
* timm==0.3.2
* torch>=1.7.0 and torchvision that matches the PyTorch installation
* cuda>=10.2
Compilation may fail on CUDA < 10.2.
We have compiled it successfully on `cuda 10.2` and `cuda 11.2`.
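For reference, an environment setup might look like the following. The exact `torch`/`torchvision` build must match your local CUDA toolkit, so treat these commands as an example rather than a pinned recipe:

```shell
# Example only: pick the torch/torchvision build that matches your CUDA
# toolkit (e.g. a cu102 or cu112 wheel for CUDA 10.2 / 11.2).
pip install "torch>=1.7.0" torchvision
pip install timm==0.3.2
```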
### Data preparation
Download and extract ImageNet train and val images from http://image-net.org/.
The directory structure is the standard layout for the torchvision [`datasets.ImageFolder`](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder); the training and validation data are expected to be in the `train/` and `val/` folders respectively:

```
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
```
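The layout above is exactly what torchvision's `ImageFolder` expects, so a quick sanity check with the plain torchvision API (independent of this repo) can confirm the folders are discovered correctly:

```python
# Verify the ImageNet directory layout before launching training.
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=tf)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=tf)
# ImageNet-1K should report 1000 classes, ~1.28M train and 50k val images.
print(len(train_set.classes), len(train_set), len(val_set))
```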
## Installation
```shell script
git clone https://github.com/fudan-zvg/SOFT.git
python -m pip install -e SOFT
```
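The editable install also builds the `cuda` kernel. If that build fails, first confirm that the CUDA toolkit and the PyTorch CUDA build are both visible (a generic check, not repo-specific):

```shell
# Both versions should be >= 10.2 and mutually compatible.
nvcc --version
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```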
## Main results
### ImageNet-1K Image Classification

| Model | Resolution | Params | FLOPs | Top-1 % | Config | Pretrained Model |
|-------------|:----------:|:------:|:-----:|:-------:|--------|--------|
| SOFT-Tiny | 224 | 13M | 1.9G | 79.3 |[SOFT_Tiny.yaml](config/SOFT_Tiny.yaml), [SOFT_Tiny_cuda.yaml](config/SOFT_Tiny_cuda.yaml)|[SOFT_Tiny](https://drive.google.com/file/d/1S04DCotIOkP0DaBb8WStQ513z82qT9de/view?usp=sharing), [SOFT_Tiny_cuda](https://drive.google.com/file/d/1inDKh3Wz_2KQgGH_2ywU5H_gLKZpIz_u/view?usp=sharing)
| SOFT-Small | 224 | 24M | 3.3G | 82.2 |[SOFT_Small.yaml](config/SOFT_Small.yaml), [SOFT_Small_cuda.yaml](config/SOFT_Small_cuda.yaml)|
| SOFT-Medium | 224 | 45M | 7.2G | 82.9 |[SOFT_Medium.yaml](config/SOFT_Medium.yaml), [SOFT_Medium_cuda.yaml](config/SOFT_Medium_cuda.yaml)|
| SOFT-Large | 224 | 64M | 11.0G | 83.1 |[SOFT_Large.yaml](config/SOFT_Large.yaml), [SOFT_Large_cuda.yaml](config/SOFT_Large_cuda.yaml)|
| SOFT-Huge | 224 | 87M | 16.3G | 83.3 |[SOFT_Huge.yaml](config/SOFT_Huge.yaml), [SOFT_Huge_cuda.yaml](config/SOFT_Huge_cuda.yaml)|
| SOFT-Tiny-Norm | 224 | 13M | 1.9G | 79.4 |[SOFT_Tiny_norm.yaml](config/SOFT_Tiny_norm.yaml)|[SOFT_Tiny_norm](https://drive.google.com/file/d/1Isy5b9v_4pyIXDqhKPNRq3WKH0etDlfl/view?usp=sharing)|
| SOFT-Small-Norm | 224 | 24M | 3.3G | 82.4 |[SOFT_Small_norm.yaml](config/SOFT_Small_norm.yaml)|[SOFT_Small_norm](https://drive.google.com/file/d/1OBjn7FzVdNP1Urqxq7X0yDykyPhxAAW1/view?usp=sharing)|
| SOFT-Medium-Norm | 224 | 45M | 7.2G | 83.1 |[SOFT_Medium_norm.yaml](config/SOFT_Medium_norm.yaml)|[SOFT_Medium_norm](https://drive.google.com/file/d/1K2C6daaJn3jwurWh38uvV7rexirWjuzh/view?usp=sharing)|
| SOFT-Large-Norm | 224 | 64M | 11.0G | 83.3 |[SOFT_Large_norm.yaml](config/SOFT_Large_norm.yaml)|[SOFT_Large_norm](https://drive.google.com/file/d/1aRYuF_gbBGyiXUDKEcpHJmM04SdvTUdP/view?usp=sharing)|
| SOFT-Huge-Norm | 224 | 87M | 16.3G | 83.4 |[SOFT_Huge_norm.yaml](config/SOFT_Huge_norm.yaml)|

### COCO Object Detection (2017 val)

| Backbone | Method | lr schd | box mAP | mask mAP | Params |
|-------------|:----------:|:------:|:-----:|:-------:|:--------:|
|SOFT-Tiny-Norm | RetinaNet | 1x | 40.0 | - | 23M|
|SOFT-Tiny-Norm | Mask R-CNN | 1x | 41.2 | 38.2 | 33M|
|SOFT-Small-Norm | RetinaNet | 1x | 42.8 | - | 34M|
|SOFT-Small-Norm | Mask R-CNN | 1x | 43.8 | 40.1 | 44M|
|SOFT-Medium-Norm | RetinaNet | 1x | 44.3 | - | 55M|
|SOFT-Medium-Norm | Mask R-CNN | 1x | 46.6 | 42.0 | 65M|
|SOFT-Large-Norm | RetinaNet | 1x | 45.3 | - | 74M|
|SOFT-Large-Norm | Mask R-CNN | 1x | 47.0 | 42.2 | 84M|

### ADE20K Semantic Segmentation (val)

| Backbone | Method | Crop size| lr schd | mIoU | Params |
|-------------|:----------:|:----------:|:------:|:-----:|:-------:|
|SOFT-Small-Norm | UperNet |512x512| 1x | 46.2 | 54M|
|SOFT-Medium-Norm | UperNet |512x512 | 1x | 48.0 | 76M|
## Get Started

### Train
We provide two implementations of the Gaussian kernel: a `PyTorch` version and the exact Gaussian function implemented in `cuda`. Config files whose names contain `cuda` use the CUDA implementation. Both implementations yield the same performance.
Please **install** SOFT before running the `cuda` version.
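The two paths compute the same quantity, exp(-||q_i - k_j||^2 / 2): the CUDA kernel evaluates this exact form, while in pure PyTorch it can equivalently be expanded into matmul-friendly terms. A sketch of the equivalence (our illustration, not the repo's exact code):

```python
# Two equivalent ways to form the Gaussian-kernel matrix (illustration only;
# the repo's PyTorch and CUDA paths may be organised differently).
import torch

q, k = torch.randn(196, 64), torch.randn(196, 64)

# Direct form: exp(-||q_i - k_j||^2 / 2), what the CUDA kernel fuses.
direct = torch.exp(-0.5 * torch.cdist(q, k).pow(2))

# Expansion exp(q k^T) * exp(-||q||^2 / 2) * exp(-||k||^2 / 2), built from
# a single matmul plus broadcast terms.
sq_q = q.pow(2).sum(-1, keepdim=True)       # (n, 1)
sq_k = k.pow(2).sum(-1, keepdim=True).t()   # (1, n)
expanded = torch.exp(q @ k.t() - 0.5 * sq_q - 0.5 * sq_k)

print(torch.allclose(direct, expanded, atol=1e-5))  # True
```

To launch training: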
```shell
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE}
# For example, train SOFT-Tiny on the ImageNet training set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml
```

### Test
```shell
./dist_train.sh ${GPU_NUM} --data ${DATA_PATH} --config ${CONFIG_FILE} --eval_checkpoint ${CHECKPOINT_FILE} --eval
# For example, test SOFT-Tiny on the ImageNet validation set with 8 GPUs
./dist_train.sh 8 --data ${DATA_PATH} --config config/SOFT_Tiny.yaml --eval_checkpoint ${CHECKPOINT_FILE} --eval
```
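Evaluation reports the Top-1 % shown in the tables above; for reference, the metric itself is simply (a generic illustration, not the repo's evaluation code):

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """logits: (batch, num_classes); labels: (batch,) of class indices."""
    return (logits.argmax(dim=-1) == labels).float().mean().item()
```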
## Reference

```bibtex
@inproceedings{SOFT,
  title={SOFT: Softmax-free Transformer with Linear Complexity},
  author={Lu, Jiachen and Yao, Jinghan and Zhang, Junge and Zhu, Xiatian and Xu, Hang and Gao, Weiguo and Xu, Chunjing and Xiang, Tao and Zhang, Li},
  booktitle={NeurIPS},
  year={2021}
}
```

```bibtex
@article{Softmax,
  title={Softmax-free Linear Transformers},
  author={Lu, Jiachen and Zhang, Li and Zhang, Junge and Zhu, Xiatian and Feng, Jianfeng and Xiang, Tao},
  journal={International Journal of Computer Vision},
  year={2024}
}
```

## License
[MIT](LICENSE)
## Acknowledgement
Thanks to these previously open-sourced repos:
[Detectron2](https://github.com/facebookresearch/detectron2)
[T2T-ViT](https://github.com/yitu-opensource/T2T-ViT)
[PVT](https://github.com/whai362/PVT)
[Nystromformer](https://github.com/mlpen/Nystromformer)
[pytorch-image-models](https://github.com/rwightman/pytorch-image-models)