{"id":13521274,"url":"https://github.com/dawnyc/ROMTrack","last_synced_at":"2025-03-31T20:31:07.993Z","repository":{"id":186515732,"uuid":"675294071","full_name":"dawnyc/ROMTrack","owner":"dawnyc","description":"[ICCV 2023] Robust Object Modeling for Visual Tracking, Official Implementation","archived":false,"fork":false,"pushed_at":"2025-01-05T09:34:14.000Z","size":6250,"stargazers_count":40,"open_issues_count":3,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-05T10:27:54.925Z","etag":null,"topics":["iccv2023","object-modeling","pytorch","robustness","tracking","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2308.05140","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dawnyc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-06T12:53:42.000Z","updated_at":"2025-01-05T09:34:18.000Z","dependencies_parsed_at":"2024-11-02T05:41:33.952Z","dependency_job_id":null,"html_url":"https://github.com/dawnyc/ROMTrack","commit_stats":{"total_commits":12,"total_committers":1,"mean_commits":12.0,"dds":0.0,"last_synced_commit":"1d3b8a88be1f5aa40802c837528feaadd3d25619"},"previous_names":["dawnyc/romtrack"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dawnyc%2FROMTrack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dawnyc%2FROMTrack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dawnyc%2FROMTrack/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dawnyc%2FROMTrack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dawnyc","download_url":"https://codeload.github.com/dawnyc/ROMTrack/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246535826,"owners_count":20793328,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["iccv2023","object-modeling","pytorch","robustness","tracking","transformer"],"created_at":"2024-08-01T06:00:32.059Z","updated_at":"2025-03-31T20:31:07.987Z","avatar_url":"https://github.com/dawnyc.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["ICCV 2023"],"readme":"# ROMTrack\nThe official implementation of the ICCV 2023 paper [*Robust Object Modeling for Visual Tracking*](https://arxiv.org/abs/2308.05140)\n\n[[CVF Open Access]](https://openaccess.thecvf.com/content/ICCV2023/papers/Cai_Robust_Object_Modeling_for_Visual_Tracking_ICCV_2023_paper.pdf\n) [[Poster]](asset/Poster.pdf) [[Video]](https://www.bilibili.com/video/BV1p84y1d7ja/)\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"100%\" src=\"asset/ROMTrack_Framework.png\" alt=\"ROMTrack_Pipeline\"/\u003e\n\u003c/p\u003e\n\n[[Models and Raw Results]](https://drive.google.com/drive/folders/1Q7CpNIhWX05VU7gECnhePu3dKzTV_VoK?usp=drive_link) (Google Drive) [[Models and Raw Results]](https://pan.baidu.com/s/1JsOh_YKPmVAdJwn_XcUg5g) (Baidu Netdisk: romt)\n\n#### Base Models \n\n|                Variant               |           ROMTrack           |         
ROMTrack-384         |\n| :----------------------------------: | :--------------------------: | :--------------------------: |\n|             Model Setting            |           ViT-Base           |           ViT-Base           |\n|           Pretrained Method          |             MAE              |             MAE              |\n|           Pretrained Weight          |[MAE checkpoint](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth)|[MAE checkpoint](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth)|\n|           Template / Search          |      128×128 / 256×256       |      192×192 / 384×384       |\n| GOT-10k \u003cbr\u003e (AO / SR 0.5 / SR 0.75) |      72.9 / 82.9 / 70.2      |      74.2 / 84.3 / 72.4      |\n|    LaSOT \u003cbr\u003e (AUC / Norm P / P)     |      69.3 / 78.8 / 75.6      |      71.4 / 81.4 / 78.2      |\n| TrackingNet \u003cbr\u003e (AUC / Norm P / P)  |      83.6 / 88.4 / 82.7      |      84.1 / 89.0 / 83.7      |\n|  LaSOT_ext \u003cbr\u003e (AUC / Norm P / P)   |      48.9 / 59.3 / 55.0      |      51.3 / 62.4 / 58.6      |\n|    TNL2K \u003cbr\u003e (AUC / Norm P / P)     |      56.9 / 73.7 / 58.1      |      58.0 / 75.0 / 59.6      |\n|      NFS / OTB / UAV \u003cbr\u003e (AUC)      |      68.0 / 71.4 / 69.7      |      68.8 / 70.9 / 70.5      |\n|   VOT2020 BBox \u003cbr\u003e (EAO / A / R)    |    0.326 / 0.480 / 0.816     |    0.329 / 0.483 / 0.822     |\n|     GPU FPS / MACs(G) / Params(M)    |       116 / 34.5 / 92.1      |        67 / 77.7 / 92.1      |\n|                CPU FPS               |              9.9             |              3.0             |\n\n#### Extended Models (Efficiency-Oriented)\n\n|                Variant               |       ROMTrack-Tiny-256      |      ROMTrack-Small-256      |\n| :----------------------------------: | :--------------------------: | :--------------------------: |\n|             Model Setting            |           ViT-Tiny           |         
  ViT-Small          |\n|           Pretrained Method          |  Supervised on ImageNet-22k  |  Supervised on ImageNet-22k  |\n|           Pretrained Weight          |[Timm checkpoint](https://storage.googleapis.com/vit_models/augreg/Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz)|[Timm checkpoint](https://storage.googleapis.com/vit_models/augreg/S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz)|\n|           Template / Search          |      128×128 / 256×256       |      128×128 / 256×256       |\n|    LaSOT \u003cbr\u003e (AUC / Norm P / P)     |      59.3 / 68.8 / 60.4      |      62.3 / 72.3 / 65.3      |\n| TrackingNet \u003cbr\u003e (AUC / Norm P / P)  |      75.8 / 81.7 / 71.5      |      78.5 / 84.3 / 75.3      |\n|  LaSOT_ext \u003cbr\u003e (AUC / Norm P / P)   |      40.4 / 49.7 / 43.1      |      43.2 / 52.9 / 47.1      |\n|    TNL2K \u003cbr\u003e (AUC / Norm P / P)     |      48.6 / 64.4 / 45.5      |      52.0 / 68.7 / 50.5      |\n|      NFS / OTB / UAV \u003cbr\u003e (AUC)      |      62.5 / 68.5 / 62.9      |      65.3 / 68.9 / 66.4      |\n|   VOT2020 BBox \u003cbr\u003e (EAO / A / R)    |    0.265 / 0.459 / 0.704     |    0.297 / 0.477 / 0.764     |\n|     GPU FPS / MACs(G) / Params(M)    |       466 /  2.7 / 8.0       |       236 /  9.3 / 25.4      |\n|                CPU FPS               |             36.6             |             17.2             |\n\n#### Extended Models (Performance-Oriented)\n\n|                Variant               |      ROMTrack-Large-384      |\n| :----------------------------------: | :--------------------------: |\n|             Model Setting            |           ViT-Large          |\n|           Pretrained Method          |             MAE              |\n|           Pretrained Weight          |[MAE checkpoint](https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth)|\n|           Template / 
Search          |      192×192 / 384×384       |\n|    LaSOT \u003cbr\u003e (AUC / Norm P / P)     |      72.0 / 81.7 / 79.1      |\n| TrackingNet \u003cbr\u003e (AUC / Norm P / P)  |      85.2 / 89.8 / 85.4      |\n|  LaSOT_ext \u003cbr\u003e (AUC / Norm P / P)   |      52.9 / 64.3 / 60.9      |\n|    TNL2K \u003cbr\u003e (AUC / Norm P / P)     |      60.4 / 77.7 / 63.9      |\n|      NFS / OTB / UAV \u003cbr\u003e (AUC)      |      69.2 / 71.0 / 71.5      |\n|   VOT2020 BBox \u003cbr\u003e (EAO / A / R)    |    0.338 / 0.492 / 0.820     |\n|     GPU FPS / MACs(G) / Params(M)    |      21 / 266.5 / 311.3      |\n|                CPU FPS               |              1.1             |\n\n## :newspaper: News\n**[May 2, 2024]**\n- We release the extended model ***\u003cu\u003eROMTrack-Large-384\u003c/u\u003e*** for Performance-Oriented Visual Tracking!\n- Models and Raw Results for all versions of ROMTrack are available on Google Drive or Baidu Netdisk.\n- Code and scripts for VOT2020 evaluation are now available.\n\n**[April 18, 2024]**\n- We release the extended models ***\u003cu\u003eROMTrack-Tiny-256\u003c/u\u003e*** and ***\u003cu\u003eROMTrack-Small-256\u003c/u\u003e*** for Efficient Visual Tracking!\n- We provide detailed information for all versions of ROMTrack; see **Base Models** and **Extended Models** above.\n\n**[April 17, 2024]**\n- The repository upgrade is done! Training and evaluation with PyTorch 2.2.0 and Python 3.8 are now more efficient.\n- Training and evaluation devices for the upgraded code: RTX A6000, Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, Ubuntu 20.04.1 LTS.\n\n**[March 25, 2024]**\n- We upgrade the implementation to Python 3.8 and PyTorch 2.2.0!\n- We update the results on TNL2K!\n- We update the FPS metrics, measured on an RTX A6000 GPU, for reference.\n\n**[March 21, 2024]**\n- We add two radar plots for visualization on LaSOT and LaSOT_ext.\n- We post a blog article on [Zhihu](https://zhuanlan.zhihu.com/p/662351482); you are welcome to read it.\n\n**[October 18, 2023]**\n- We update the paper to the CVF Open Access version.\n- We release the poster and video.\n\n**[September 21, 2023]**\n- We release Models and Raw Results of ROMTrack.\n- We refine the README with more details.\n\n**[August 6, 2023]**\n- We release the code of ROMTrack.\n\n**[July 14, 2023]**\n- ROMTrack is accepted to **ICCV2023**!\n\n## :calendar: TODO\n- [x] Extended Models (Efficiency-Oriented \u0026 Performance-Oriented) for ROMTrack\n- [x] Repository Upgrade\n- [x] More Analysis (Radar Plot) and More Results (TNL2K Dataset)\n- [x] Code for ROMTrack\n- [x] Model Zoo and Raw Results\n- [x] Refine README\n\n## :star: Highlights\n### :rocket: New Tracking Framework Pursuing Robustness\n- ROMTrack employs a robust object modeling design that preserves the inherent information of the target template while simultaneously enabling mutual feature matching between the target and the search region.\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"60%\" src=\"asset/Robust_Modeling.png\" alt=\"Robust_Modeling\"/\u003e\n\u003c/p\u003e\n\n- **Robustness Comparison** with SOTA methods (bounding box only) on VOT2020.\n  \u003cp align=\"center\"\u003e\n  \u003cimg width=\"100%\" src=\"asset/VOT2020.png\" alt=\"VOT2020\"/\u003e\n  \u003c/p\u003e\n\n### :rocket: Strong Performance and Comparable Speed\n- Performance on Benchmarks\n  \u003cp align=\"center\"\u003e\n  \u003cimg width=\"90%\" src=\"asset/Performance.png\" alt=\"Performance\"/\u003e\n  \u003c/p\u003e\n- Radar Analysis on LaSOT and LaSOT_ext\n  \u003cp align=\"center\"\u003e\n  \u003cimg width=\"49%\" src=\"asset/lasot_radar_plot_v2.png\" alt=\"LaSOT_Radar\"/\u003e\n  \u003cimg width=\"49%\" src=\"asset/lasot_ext_radar_plot.png\" alt=\"LaSOT_ext_Radar\"/\u003e\n  \u003c/p\u003e\n- Speed, MACs, Params (tested on a 1080Ti)\n  \u003cp align=\"center\"\u003e\n  \u003cimg width=\"70%\" src=\"asset/Speed.png\" alt=\"Speed\"/\u003e\n  \u003c/p\u003e\n\n## :book: Install the environment\nUse Anaconda:\n```\nconda create -n romtrack python=3.8\nconda activate romtrack\nbash install_pytorch.sh\n```\n\n## :book: Data Preparation\nPut the tracking datasets in ./data. It should look like:\n   ```\n   ${ROMTrack_ROOT}\n    -- data\n        -- lasot\n            |-- airplane\n            |-- basketball\n            |-- bear\n            ...\n        -- lasot_ext\n            |-- atv\n            |-- badminton\n            |-- cosplay\n            ...\n        -- got10k\n            |-- test\n            |-- train\n            |-- val\n        -- coco\n            |-- annotations\n            |-- train2017\n        -- trackingnet\n            |-- TRAIN_0\n            |-- TRAIN_1\n            ...\n            |-- TRAIN_11\n            |-- TEST\n   ```\n\n## :book: Set project paths\nRun the following command to set the paths for this project:\n```\npython tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .\n```\nAfter running this command, you can also modify the paths by editing these two files:\n```\nlib/train/admin/local.py  # paths about training\nlib/test/evaluation/local.py  # paths about testing\n```\n\n## :book: Train ROMTrack\nTrain with multiple GPUs using DDP. More details on other training settings can be found in ```tracking/train_romtrack.sh```.\n```\nbash tracking/train_romtrack.sh\n```\n\n## :book: Test and evaluate ROMTrack on benchmarks\n\n- LaSOT/LaSOT_ext/GOT10k-test/TrackingNet/OTB100/UAV123/NFS30.\n  - More details on test settings can be found in ```tracking/test_romtrack.sh```.\n```\nbash tracking/test_romtrack.sh\n```\n\n- VOT2020. The current versions are vot-toolkit (==0.5.3) and vot-trax (==3.0.3).\n  - The commands below take ROMTrack-Large-384 as an example.\n```\n### Evaluate ROMTrack-Large-384 with AlphaRefine\nvot evaluate --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384_AR\nvot analysis --nocache --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384_AR\n\n### Evaluate ROMTrack-Large-384 without AlphaRefine\nvot evaluate --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384\nvot analysis --nocache --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384\n```\n\n## :book: Compute FLOPs/Params and test speed\n```\nbash tracking/profile_romtrack.sh\n```\n\n## :book: Visualization\nWe provide attention maps and feature maps for several sequences on LaSOT. Detailed analysis can be found in our paper.\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"80%\" src=\"asset/Visualization.png\" alt=\"Visualization\"/\u003e\n\u003c/p\u003e\n\n## :bookmark: Acknowledgments\n* Thanks to the [STARK](https://github.com/researchmm/Stark), [PyTracking](https://github.com/visionml/pytracking) and [MixFormer](https://github.com/MCG-NJU/MixFormer) libraries, which helped us quickly implement our ideas and evaluate our performance.\n* Our implementation of the ViT is modified from the [Timm](https://github.com/rwightman/pytorch-image-models) repo.\n\n## :pencil: Citation\nIf our work is useful for your research, please feel free to star :star: and cite our paper:\n```\n@InProceedings{Cai_2023_ICCV,\n    author    = {Cai, Yidong and Liu, Jie and Tang, Jie and Wu, Gangshan},\n    title     = {Robust Object Modeling for Visual Tracking},\n    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},\n    month     = {October},\n    year      = {2023},\n    pages     = {9589-9600}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdawnyc%2FROMTrack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdawnyc%2FROMTrack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdawnyc%2FROMTrack/lists"}