{"id":13521102,"url":"https://github.com/goutamyg/SMAT","last_synced_at":"2025-03-31T20:30:45.364Z","repository":{"id":189009886,"uuid":"679859958","full_name":"goutamyg/SMAT","owner":"goutamyg","description":"[WACV 2024] Separable Self and Mixed Attention Transformers for Efficient Object Tracking","archived":false,"fork":false,"pushed_at":"2024-05-02T21:36:49.000Z","size":1896,"stargazers_count":29,"open_issues_count":9,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-02T05:32:37.361Z","etag":null,"topics":["mixed-attention","self-attention","single-object-tracking","visual-object-tracking","visual-tracking","wacv","wacv2024"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/goutamyg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-17T19:26:32.000Z","updated_at":"2024-10-29T02:23:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"d4e061e4-cc37-4f39-8db7-0077163eb16a","html_url":"https://github.com/goutamyg/SMAT","commit_stats":null,"previous_names":["goutamyg/smat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goutamyg%2FSMAT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goutamyg%2FSMAT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goutamyg%2FSMAT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goutamyg%2FSMAT/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/goutamyg","download_url":"https://codeload.github.com/goutamyg/SMAT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246535765,"owners_count":20793317,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mixed-attention","self-attention","single-object-tracking","visual-object-tracking","visual-tracking","wacv","wacv2024"],"created_at":"2024-08-01T06:00:28.610Z","updated_at":"2025-03-31T20:30:44.090Z","avatar_url":"https://github.com/goutamyg.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["WACV 2024"],"readme":"# [Separable Self and Mixed Attention Transformers for Efficient Object Tracking](https://openaccess.thecvf.com/content/WACV2024/papers/Gopal_Separable_Self_and_Mixed_Attention_Transformers_for_Efficient_Object_Tracking_WACV_2024_paper.pdf) [WACV2024] \n# Official implementation\n![SMAT_block](assets/SMAT_block.png)\n\n## News\n**`09-04-2024`**: C++ implementation of SMAT is available [here](https://github.com/goutamyg/MVT.cpp)\n\n**`07-09-2023`**: The paper is available on [arXiv](https://arxiv.org/abs/2309.03979) now\n\n**`28-08-2023`**: The pretrained tracker model is released\n\n**`17-08-2023`**: The SMAT tracker training and inference code is released\n\n**`14-08-2023`**: The paper is accepted at WACV2024\n\n## Installation\n\nInstall the dependency packages using the environment file `smat_pyenv.yml`.\n\nGenerate the relevant files:\n```\npython tracking/create_default_local_file.py --workspace_dir . 
--data_dir ./data --save_dir ./output\n```\nAfter running this command, modify the dataset paths by editing these files:\n```\nlib/train/admin/local.py  # paths about training\nlib/test/evaluation/local.py  # paths about testing\n```\n\n## Training\n\n* Set the path of training datasets in `lib/train/admin/local.py`\n* Place the pretrained backbone model under the `pretrained_models/` folder\n* For data preparation, please refer to [this](https://github.com/botaoye/OSTrack/tree/main)\n* Uncomment lines `63, 67, and 71` in the [base_backbone.py](https://github.com/goutamyg/SMAT/blob/main/lib/models/mobilevit_track/base_backbone.py) file. \nLong story short: The code is optimized for high inference speed, hence some intermediate feature maps are pre-computed during testing. However, these pre-computations are not feasible during training. \n* Run\n```\npython tracking/train.py --script mobilevitv2_track --config mobilevitv2_256_128x1_ep300 --save_dir ./output --mode single\n```\n* The training logs will be saved under the `output/logs/` folder\n\n## Pretrained tracker model\nThe pretrained tracker model can be found [here](https://drive.google.com/drive/folders/1TindIEwu82IvtozwL4XQFrSnFE2Z6W4y)\n\n## Tracker Evaluation\n\n* Update the test dataset paths in `lib/test/evaluation/local.py`\n* Place the [pretrained tracker model](https://drive.google.com/drive/folders/1TindIEwu82IvtozwL4XQFrSnFE2Z6W4y) under the `output/checkpoints/` folder \n* Run\n```\npython tracking/test.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 --dataset got10k_test  # or: trackingnet, lasot\n```\n* Set the `DEVICE` variable to `cuda` or `cpu` in the `--tracker_param` file for GPU- or CPU-based inference, respectively  \n* The raw results will be stored under the `output/test/` folder\n\n## Tracker demo\nTo evaluate the tracker on a sample video, run:\n```\npython tracking/video_demo.py --tracker_name mobilevitv2_track --tracker_param mobilevitv2_256_128x1_ep300 
--videofile *path-to-video-file* --optional_box *bounding-box-annotation*\n```\n\n## Visualization of tracker output and the attention maps\n![attn_maps](assets/attn_visualization.png)\n\n## Acknowledgements\n* We use the Separable Self-Attention Transformer implementation and the pretrained `MobileViTv2` backbone from [ml-cvnets](https://github.com/apple/ml-cvnets). Thank you!\n* Our training code is built upon [OSTrack](https://github.com/botaoye/OSTrack) and [PyTracking](https://github.com/visionml/pytracking)\n* To generate the evaluation metrics for different datasets (except the server-based GOT-10k and TrackingNet), we use the [pysot-toolkit](https://github.com/StrangerZhang/pysot-toolkit)\n\n## Citation\nIf our work is useful for your research, please consider citing:\n\n```Bibtex\n@inproceedings{gopal2024separable,\n  title={Separable self and mixed attention transformers for efficient object tracking},\n  author={Gopal, Goutam Yelluru and Amer, Maria A},\n  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},\n  pages={6708--6717},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoutamyg%2FSMAT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoutamyg%2FSMAT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoutamyg%2FSMAT/lists"}