{"id":13521215,"url":"https://github.com/Little-Podi/GRM","last_synced_at":"2025-03-31T20:30:58.979Z","repository":{"id":154449385,"uuid":"617853742","full_name":"Little-Podi/GRM","owner":"Little-Podi","description":"[CVPR'23] The official PyTorch implementation of our CVPR 2023 paper: \"Generalized Relation Modeling for Transformer Tracking\".","archived":false,"fork":false,"pushed_at":"2023-12-30T03:16:06.000Z","size":676,"stargazers_count":69,"open_issues_count":0,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-02T05:32:46.616Z","etag":null,"topics":["attention-mechanism","cvpr2023","object-tracking","pytorch","single-object-tracking","tracking","vision-transformer","visual-tracking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Little-Podi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-23T08:51:45.000Z","updated_at":"2024-10-17T10:49:23.000Z","dependencies_parsed_at":"2023-12-30T04:23:19.923Z","dependency_job_id":"08765f45-82b1-4670-a29f-b28420d6251d","html_url":"https://github.com/Little-Podi/GRM","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Little-Podi%2FGRM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Little-Podi%2FGRM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Little-Podi%2FGRM/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Little-Podi%2FGRM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Little-Podi","download_url":"https://codeload.github.com/Little-Podi/GRM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246535803,"owners_count":20793326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention-mechanism","cvpr2023","object-tracking","pytorch","single-object-tracking","tracking","vision-transformer","visual-tracking"],"created_at":"2024-08-01T06:00:30.890Z","updated_at":"2025-03-31T20:30:58.638Z","avatar_url":"https://github.com/Little-Podi.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["CVPR 2023"],"readme":"# GRM\n\nThe official PyTorch implementation of our **CVPR 2023** paper:\n\n**Generalized Relation Modeling for Transformer Tracking**\n\n[Shenyuan Gao](https://github.com/Little-Podi), [Chunluan Zhou](https://www.sites.google.com/view/chunluanzhou), [Jun Zhang](https://eejzhang.people.ust.hk)\n\n[[CVF Open Access](https://openaccess.thecvf.com/content/CVPR2023/html/Gao_Generalized_Relation_Modeling_for_Transformer_Tracking_CVPR_2023_paper.html)] [[ArXiv Preprint](https://arxiv.org/abs/2303.16580)] [[YouTube Video](https://youtu.be/bQKN3HV-8XI)] [[Trained Models](https://github.com/Little-Podi/GRM/releases/tag/downloads)] [[Raw Results](https://github.com/Little-Podi/GRM/releases/tag/downloads)] [[SOTA Paper List](https://github.com/Little-Podi/Transformer_Tracking)]\n\n## Highlight\n\n![](GRM.png)\n\n### 
:bookmark:Brief Introduction\n\nCompared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let the template interact with all parts inside the search region throughout all the encoder layers. This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose **generalized relation modeling** (GRM) based on adaptive token division. The proposed method is a generalized formulation of attention-based relation modeling for Transformer tracking, which inherits the merits of both previous two-stream and one-stream pipelines whilst enabling more flexible relation modeling by selecting appropriate search tokens to interact with template tokens.\n\n### :bookmark:Strong Performance\n\n|             Variant             |         GRM-GOT         |           GRM           |        GRM-L320         |\n| :-----------------------------: | :---------------------: | :---------------------: | :---------------------: |\n|          Model Config           | ViT-B, 256^2 resolution | ViT-B, 256^2 resolution | ViT-L, 320^2 resolution |\n|        Training Setting         |  only GOT, 100 epochs   | 4 datasets, 300 epochs  | 4 datasets, 300 epochs  |\n| GOT-10k (AO / SR 0.5 / SR 0.75) |   73.4 / 82.9 / 70.4    |            -            |            -            |\n|    LaSOT (AUC / Norm P / P)     |            -            |   69.9 / 79.3 / 75.8    |   71.4 / 81.2 / 77.9    |\n| TrackingNet (AUC / Norm P / P)  |            -            |   84.0 / 88.7 / 83.3    |   84.4 / 88.9 / 84.0    |\n|    AVisT (AUC / OP50 / OP75)    |            -            |   54.5 / 63.1 / 45.2    |   55.1 / 63.8 / 46.9    |\n|           NfS30 (AUC)           |            -            |          65.6           |          
66.0           |\n|          UAV123 (AUC)           |            -            |          70.2           |          72.2           |\n\n### :bookmark:Inference Speed\n\nOur baseline model (backbone: ViT-B, resolution: 256x256) can run at **45 fps** (frames per second) on a single NVIDIA GeForce RTX 3090.\n\n### :bookmark:Training Cost\n\nIt takes **less than half a day** to train our baseline model for 300 epochs on 8 NVIDIA GeForce RTX 3090 GPUs (each with 24GB of memory).\n\n## Release\n\n**Trained Models** (including the baseline model GRM, GRM-GOT and a stronger variant GRM-L320) [[download zip file](https://github.com/Little-Podi/GRM/releases/download/downloads/Trained_Models.zip)]\n\n**Raw Results** (including raw tracking results on the six datasets we benchmarked in the paper and listed above) [[download zip file](https://github.com/Little-Podi/GRM/releases/download/downloads/Raw_Results.zip)]\n\nDownload and unzip these two zip files into the `output` directory under the GRM project path; both can then be used directly by our code.\n\n## Let's Get Started\n\n- ### Environment\n\n  Our experiments are conducted with Ubuntu 20.04 and CUDA 11.6.\n\n- ### Preparation\n\n  - Clone our repository to your local project directory.\n\n  - Download the pre-trained weights from [MAE](https://github.com/facebookresearch/mae) or [DeiT](https://github.com/facebookresearch/deit/blob/main/README_deit.md), and place the files into the `pretrained_models` directory under the GRM project path. 
You may want to try different pre-trained weights, so the links to the pre-trained models integrated in this project are listed below.\n\n    | Backbone Type |                   Model File                   |                       Checkpoint Link                        |\n    | :-----------: | :--------------------------------------------: | :----------------------------------------------------------: |\n    |  'vit_base'   |          'mae_pretrain_vit_base.pth'           | [download](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth) |\n    |  'vit_large'  |          'mae_pretrain_vit_large.pth'          | [download](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth) |\n    |  'vit_base'   |      'deit_base_patch16_224-b5f2ef4d.pth'      | [download](https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth) |\n    |  'vit_base'   | 'deit_base_distilled_patch16_224-df68dfff.pth' | [download](https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_224-df68dfff.pth) |\n\n  - Download the training datasets ([LaSOT](http://vision.cs.stonybrook.edu/~lasot/download.html), [TrackingNet](https://github.com/SilvioGiancola/TrackingNet-devkit), [GOT-10k](http://got-10k.aitestunion.com/downloads), [COCO2017](https://cocodataset.org/#download)) and testing datasets ([NfS](http://ci2cv.net/nfs/index.html), [UAV123](https://cemse.kaust.edu.sa/ivul/uav123), [AVisT](https://sites.google.com/view/avist-benchmark)) to your disk. The organized directory should look like:\n\n    ```\n    --LaSOT/\n    \t|--airplane\n    \t|...\n    \t|--zebra\n    --TrackingNet/\n    \t|--TRAIN_0\n    \t|...\n    \t|--TEST\n    --GOT10k/\n    \t|--test\n    \t|--train\n    \t|--val\n    --COCO/\n    \t|--annotations\n    \t|--images\n    --NFS30/\n    \t|--anno\n    \t|--sequences\n    --UAV123/\n    \t|--anno\n    \t|--data_seq\n    --AVisT/\n    \t|--anno\n    \t|--full_occlusion\n    \t|--out_of_view\n    \t|--sequences\n    ```\n\n  - Edit the paths 
in `lib/test/evaluation/local.py` and `lib/train/admin/local.py` to the proper ones.\n\n- ### Installation\n\n  We use conda to manage the environment.\n\n  ```\n  conda create --name grm python=3.9\n  conda activate grm\n  bash install.sh\n  ```\n  \n- ### Training\n\n  - Multi-GPU training with DDP (assuming you have 8 GPUs)\n\n    ```\n    python tracking/train.py --mode multiple --nproc 8\n    ```\n\n  - Single GPU debugging (too slow, not recommended for training)\n\n    ```\n    python tracking/train.py\n    ```\n\n  - For GOT-10k evaluation, remember to set `--config vitb_256_got_ep100`.\n\n  - To pursue higher performance, switch to a stronger variant by setting `--config vitl_320_ep300`.\n\n- ### Evaluation\n\n  - Make sure you have prepared the trained model.\n\n  - LaSOT\n\n    ```\n    python tracking/test.py --dataset lasot\n    ```\n  \n    Then evaluate the raw results using the [official MATLAB toolkit](https://github.com/HengLan/LaSOT_Evaluation_Toolkit).\n  \n  - TrackingNet\n  \n    ```\n    python tracking/test.py --dataset trackingnet\n    python lib/test/utils/transform_trackingnet.py\n    ```\n  \n    Then upload `test/tracking_results/grm/vitb_256_ep300/trackingnet_submit.zip` to the [online evaluation server](https://eval.ai/web/challenges/challenge-page/1805/overview).\n  \n  - GOT-10k\n  \n    ```\n    python tracking/test.py --param vitb_256_got_ep100 --dataset got10k_test\n    python lib/test/utils/transform_got10k.py\n    ```\n  \n    Then upload `test/tracking_results/grm/vitb_256_got_ep100/got10k_submit.zip` to the [online evaluation server](http://got-10k.aitestunion.com/submit_instructions).\n  \n  - NfS30, UAV123, AVisT\n  \n    ```\n    python tracking/test.py --dataset nfs\n    python tracking/test.py --dataset uav\n    python tracking/test.py --dataset avist\n    python tracking/analysis_results.py\n    ```\n  \n  - For multi-threaded inference, add `--threads 40` after `tracking/test.py` (suppose you want to use 40 threads in 
total).\n  \n  - To show the immediate prediction results during inference, set `settings.show_result = True` in `lib/test/evaluation/local.py` (may have bugs if you try this on a remote server).\n  \n  - Please refer to [DynamicViT Example](https://github.com/raoyongming/DynamicViT/blob/master/viz_example.ipynb) for the visualization of search token division results.\n\n## Acknowledgement\n\n:heart::heart::heart:Our idea is implemented based on the following projects. We really appreciate their excellent open-source work!\n\n- [OSTrack](https://github.com/botaoye/OSTrack) [[related paper](https://arxiv.org/abs/2203.11991)]\n- [AiATrack](https://github.com/Little-Podi/AiATrack) [[related paper](https://arxiv.org/abs/2207.09603)]\n- [DynamicViT](https://github.com/raoyongming/DynamicViT) [[related paper](https://arxiv.org/abs/2106.02034)]\n- [PyTracking](https://github.com/visionml/pytracking) [[related paper](https://arxiv.org/abs/2208.06888)]\n\n## Citation\n\nIf any part of our paper or code helps your research, please consider citing us and giving a star to our repository.\n\n```\n@inproceedings{gao2023generalized,\n  title={Generalized Relation Modeling for Transformer Tracking},\n  author={Gao, Shenyuan and Zhou, Chunluan and Zhang, Jun},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  pages={18686--18695},\n  year={2023}\n}\n```\n\n## Contact\n\nIf you have any questions or concerns, feel free to open issues or contact me directly via the channels listed on my GitHub [homepage](https://github.com/Little-Podi). Suggestions and collaborations are also highly welcome!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLittle-Podi%2FGRM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLittle-Podi%2FGRM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLittle-Podi%2FGRM/lists"}