# GRM
The official PyTorch implementation of our **CVPR 2023** paper:
**Generalized Relation Modeling for Transformer Tracking**
[Shenyuan Gao](https://github.com/Little-Podi), [Chunluan Zhou](https://www.sites.google.com/view/chunluanzhou), [Jun Zhang](https://eejzhang.people.ust.hk)
[[CVF Open Access](https://openaccess.thecvf.com/content/CVPR2023/html/Gao_Generalized_Relation_Modeling_for_Transformer_Tracking_CVPR_2023_paper.html)] [[ArXiv Preprint](https://arxiv.org/abs/2303.16580)] [[YouTube Video](https://youtu.be/bQKN3HV-8XI)] [[Trained Models](https://github.com/Little-Podi/GRM/releases/tag/downloads)] [[Raw Results](https://github.com/Little-Podi/GRM/releases/tag/downloads)] [[SOTA Paper List](https://github.com/Little-Podi/Transformer_Tracking)]
## Highlight
![](GRM.png)
### :bookmark:Brief Introduction
Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let the template interact with all parts inside the search region throughout all the encoder layers. This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose **generalized relation modeling** (GRM) based on adaptive token division. The proposed method is a generalized formulation of attention-based relation modeling for Transformer tracking, which inherits the merits of both previous two-stream and one-stream pipelines whilst enabling more flexible relation modeling by selecting appropriate search tokens to interact with template tokens.
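To make the idea above concrete, here is a minimal, hypothetical sketch of attention with adaptive token division: the search tokens are split into two groups, and only the selected group is allowed to interact with the template tokens. All names and shapes are illustrative assumptions; the actual GRM implementation uses a learned, differentiable division and multi-head attention.

```
import torch

def relation_modeling_step(template, search, select_logits):
    """template: (B, Nt, C) template tokens, search: (B, Ns, C) search tokens,
    select_logits: (B, Ns) scores deciding which search tokens may interact
    with the template."""
    B, Nt, C = template.shape
    Ns = search.shape[1]
    tokens = torch.cat([template, search], dim=1)  # (B, Nt+Ns, C)

    # Hard division for clarity (the paper uses a differentiable scheme).
    selected = select_logits > 0  # (B, Ns) bool

    # Attention mask: template <-> selected search tokens may interact,
    # while unselected search tokens only attend within the search region.
    N = Nt + Ns
    mask = torch.ones(B, N, N, dtype=torch.bool)
    is_template = torch.zeros(B, N, dtype=torch.bool)
    is_template[:, :Nt] = True
    blocked = torch.zeros(B, N, dtype=torch.bool)
    blocked[:, Nt:] = ~selected
    mask &= ~(is_template.unsqueeze(2) & blocked.unsqueeze(1))  # template -> blocked search
    mask &= ~(blocked.unsqueeze(2) & is_template.unsqueeze(1))  # blocked search -> template

    # Plain single-head attention with the mask (real models are multi-head
    # and project queries/keys/values).
    attn = (tokens @ tokens.transpose(1, 2)) / C ** 0.5
    attn = attn.masked_fill(~mask, float('-inf'))
    out = attn.softmax(dim=-1) @ tokens
    return out[:, :Nt], out[:, Nt:]

# Toy usage with random tokens.
t_out, s_out = relation_modeling_step(torch.randn(2, 64, 768),
                                      torch.randn(2, 256, 768),
                                      torch.randn(2, 256))
```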
### :bookmark:Strong Performance
| Variant | GRM-GOT | GRM | GRM-L320 |
| :-----------------------------: | :---------------------: | :---------------------: | :---------------------: |
| Model Config | ViT-B, 256^2 resolution | ViT-B, 256^2 resolution | ViT-L, 320^2 resolution |
| Training Setting | only GOT, 100 epochs | 4 datasets, 300 epochs | 4 datasets, 300 epochs |
| GOT-10k (AO / SR 0.5 / SR 0.75) | 73.4 / 82.9 / 70.4 | - | - |
| LaSOT (AUC / Norm P / P) | - | 69.9 / 79.3 / 75.8 | 71.4 / 81.2 / 77.9 |
| TrackingNet (AUC / Norm P / P) | - | 84.0 / 88.7 / 83.3 | 84.4 / 88.9 / 84.0 |
| AVisT (AUC / OP50 / OP75) | - | 54.5 / 63.1 / 45.2 | 55.1 / 63.8 / 46.9 |
| NfS30 (AUC) | - | 65.6 | 66.0 |
| UAV123 (AUC) | - | 70.2 | 72.2 |

### :bookmark:Inference Speed
Our baseline model (backbone: ViT-B, resolution: 256x256) can run at **45 fps** (frames per second) on a single NVIDIA GeForce RTX 3090.
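Throughput numbers like this are typically obtained by timing repeated forward passes on the GPU. The snippet below is only an illustration with a placeholder module, not the tracker's own benchmarking code.

```
import time
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda().eval()  # placeholder, not the GRM tracker
search = torch.randn(1, 3, 256, 256, device='cuda')  # dummy 256x256 search region crop

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(search)
    torch.cuda.synchronize()
    start = time.time()
    n_frames = 100
    for _ in range(n_frames):
        model(search)
    torch.cuda.synchronize()
print(f'{n_frames / (time.time() - start):.1f} fps')
```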
### :bookmark:Training Cost
It takes **less than half a day** to train our baseline model for 300 epochs on 8 NVIDIA GeForce RTX 3090 GPUs (each with 24 GB of memory).
## Release
**Trained Models** (including the baseline model GRM, GRM-GOT and a stronger variant GRM-L320) [[download zip file](https://github.com/Little-Podi/GRM/releases/download/downloads/Trained_Models.zip)]
**Raw Results** (including raw tracking results on six datasets we benchmarked in the paper and listed above) [[download zip file](https://github.com/Little-Podi/GRM/releases/download/downloads/Raw_Results.zip)]
Download and unzip these two zip files into the `output` directory under the GRM project path; both can then be used directly by our code.
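If you prefer scripting the download, a small helper like the following (illustrative, not part of the repo) fetches both zips linked above and unpacks them into `output/`. Check that the extracted layout matches what the code expects.

```
import io
import urllib.request
import zipfile
from pathlib import Path

URLS = [
    'https://github.com/Little-Podi/GRM/releases/download/downloads/Trained_Models.zip',
    'https://github.com/Little-Podi/GRM/releases/download/downloads/Raw_Results.zip',
]
out_dir = Path('output')  # the output directory under the GRM project path
out_dir.mkdir(exist_ok=True)
for url in URLS:
    print('downloading', url)
    with urllib.request.urlopen(url) as resp:
        zipfile.ZipFile(io.BytesIO(resp.read())).extractall(out_dir)
```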
## Let's Get Started
- ### Environment
Our experiments are conducted with Ubuntu 20.04 and CUDA 11.6.
- ### Preparation
- Clone our repository to your local project directory.
- Download the pre-trained weights from [MAE](https://github.com/facebookresearch/mae) or [DeiT](https://github.com/facebookresearch/deit/blob/main/README_deit.md), and place the files in the `pretrained_models` directory under the GRM project path. You may want to try different pre-trained weights, so the links to the pre-trained models integrated in this project are listed below.
| Backbone Type | Model File | Checkpoint Link |
| :-----------: | :--------------------------------------------: | :----------------------------------------------------------: |
| 'vit_base' | 'mae_pretrain_vit_base.pth' | [download](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth) |
| 'vit_large' | 'mae_pretrain_vit_large.pth' | [download](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth) |
| 'vit_base' | 'deit_base_patch16_224-b5f2ef4d.pth' | [download](https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth) |
| 'vit_base' | 'deit_base_distilled_patch16_224-df68dfff.pth' | [download](https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_224-df68dfff.pth) |

- Download the training datasets ([LaSOT](http://vision.cs.stonybrook.edu/~lasot/download.html), [TrackingNet](https://github.com/SilvioGiancola/TrackingNet-devkit), [GOT-10k](http://got-10k.aitestunion.com/downloads), [COCO2017](https://cocodataset.org/#download)) and testing datasets ([NfS](http://ci2cv.net/nfs/index.html), [UAV123](https://cemse.kaust.edu.sa/ivul/uav123), [AVisT](https://sites.google.com/view/avist-benchmark)) to your disk. The organized directory should look like:
```
--LaSOT/
|--airplane
|...
|--zebra
--TrackingNet/
|--TRAIN_0
|...
|--TEST
--GOT10k/
|--test
|--train
|--val
--COCO/
|--annotations
|--images
--NFS30/
|--anno
|--sequences
--UAV123/
|--anno
|--data_seq
--AVisT/
|--anno
|--full_occlusion
|--out_of_view
|--sequences
```
- Edit the paths in `lib/test/evaluation/local.py` and `lib/train/admin/local.py` to the proper ones (a quick sanity check for the layout above is sketched below).
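Before moving on, a simple check like this (illustrative, not part of the repo; adjust the placeholder paths to match your `local.py` settings) can confirm that the pre-trained weights and dataset folders are where the code expects them:

```
from pathlib import Path

project_root = Path('.')  # the GRM project path
data_root = Path('/path/to/your/datasets')  # placeholder; match the paths in local.py

checks = {
    'pre-trained weights': project_root / 'pretrained_models' / 'mae_pretrain_vit_base.pth',
    'LaSOT': data_root / 'LaSOT',
    'TrackingNet': data_root / 'TrackingNet',
    'GOT-10k': data_root / 'GOT10k',
    'COCO': data_root / 'COCO',
    'NfS30': data_root / 'NFS30',
    'UAV123': data_root / 'UAV123',
    'AVisT': data_root / 'AVisT',
}
for name, path in checks.items():
    status = 'found' if path.exists() else 'MISSING'
    print(f'{name:20s} {status:8s} {path}')
```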
- ### Installation
We use conda to manage the environment.
```
conda create --name grm python=3.9
conda activate grm
bash install.sh
```
- ### Training
- Multiple GPU training by DDP (suppose you have 8 GPUs)
```
python tracking/train.py --mode multiple --nproc 8
```
- Single GPU debugging (too slow, not recommended for training)
```
python tracking/train.py
```
- For GOT-10k evaluation, remember to set `--config vitb_256_got_ep100`.
- To pursue higher performance, switch to the stronger variant by setting `--config vitl_320_ep300`.
- ### Evaluation
- Make sure you have prepared the trained model.
- LaSOT
```
python tracking/test.py --dataset lasot
```
Then evaluate the raw results using the [official MATLAB toolkit](https://github.com/HengLan/LaSOT_Evaluation_Toolkit).
- TrackingNet
```
python tracking/test.py --dataset trackingnet
python lib/test/utils/transform_trackingnet.py
```
Then upload `test/tracking_results/grm/vitb_256_ep300/trackingnet_submit.zip` to the [online evaluation server](https://eval.ai/web/challenges/challenge-page/1805/overview).
- GOT-10k
```
python tracking/test.py --param vitb_256_got_ep100 --dataset got10k_test
python lib/test/utils/transform_got10k.py
```
Then upload `test/tracking_results/grm/vitb_256_got_ep100/got10k_submit.zip` to the [online evaluation server](http://got-10k.aitestunion.com/submit_instructions).
- NfS30, UAV123, AVisT (a sketch of the success-AUC metric these benchmarks report appears after this list)
```
python tracking/test.py --dataset nfs
python tracking/test.py --dataset uav
python tracking/test.py --dataset avist
python tracking/analysis_results.py
```
- For multi-threaded inference, add `--threads 40` after `tracking/test.py` (assuming you want to use 40 threads in total).
- To visualize the prediction results during inference, set `settings.show_result = True` in `lib/test/evaluation/local.py` (this may be buggy on a remote server).
- Please refer to the [DynamicViT example](https://github.com/raoyongming/DynamicViT/blob/master/viz_example.ipynb) for the visualization of search token division results.
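The AUC numbers summarized by `tracking/analysis_results.py` are areas under the success plot. A minimal, illustrative sketch of that metric (not the repository's own implementation) is:

```
import numpy as np

def success_auc(ious, thresholds=np.linspace(0, 1, 21)):
    """ious: per-frame overlaps between predicted and ground-truth boxes."""
    ious = np.asarray(ious, dtype=np.float64)
    # Success rate at each threshold: fraction of frames whose IoU exceeds it.
    success = np.array([(ious > t).mean() for t in thresholds])
    return success.mean()  # area under the success curve

print(success_auc([0.9, 0.7, 0.0, 0.8]))  # toy example
```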
## Acknowledgement

:heart::heart::heart:Our idea is implemented based on the following projects. We really appreciate their excellent open-source work!
- [OSTrack](https://github.com/botaoye/OSTrack) [[related paper](https://arxiv.org/abs/2203.11991)]
- [AiATrack](https://github.com/Little-Podi/AiATrack) [[related paper](https://arxiv.org/abs/2207.09603)]
- [DynamicViT](https://github.com/raoyongming/DynamicViT) [[related paper](https://arxiv.org/abs/2106.02034)]
- [PyTracking](https://github.com/visionml/pytracking) [[related paper](https://arxiv.org/abs/2208.06888)]

## Citation
If any part of our paper or code helps your research, please consider citing us and giving our repository a star.
```
@inproceedings{gao2023generalized,
title={Generalized Relation Modeling for Transformer Tracking},
author={Gao, Shenyuan and Zhou, Chunluan and Zhang, Jun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18686--18695},
year={2023}
}
```

## Contact
If you have any questions or concerns, feel free to open an issue or contact me directly through the channels listed on my GitHub [homepage](https://github.com/Little-Podi). Suggestions and collaborations are also highly welcome!