<div align="center">

# A Distractor-Aware Memory (DAM) for <br> Visual Object Tracking with SAM2 [CVPR, 2025]

[Jovana Videnović](https://www.linkedin.com/in/jovana-videnovi%C4%87-5a5b08169/), [Alan Lukežič](https://www.vicos.si/people/alan_lukezic/), and [Matej Kristan](https://www.vicos.si/people/matej_kristan/)

Faculty of Computer and Information Science, University of Ljubljana

[[`Preprint`](https://arxiv.org/abs/2411.17576)] [[`Project page`](https://jovanavidenovic.github.io/dam-4-sam/)] [[`DiDi dataset`](#didi-a-distractor-distilled-dataset)]

https://github.com/user-attachments/assets/ecfc1e20-0463-4841-876d-2202acc93f77

</div>

## Abstract
Memory-based trackers such as SAM2 demonstrate remarkable performance; however, they still struggle with distractors. We propose a new plug-in distractor-aware memory (DAM) and management strategy that substantially improves tracking robustness. The new model is demonstrated on SAM2.1, leading to DAM4SAM, which sets a new state-of-the-art on six benchmarks, including the most challenging VOT/S benchmarks, without additional training. We also propose a new distractor-distilled (DiDi) dataset to better study the distractor problem. See the [preprint](https://arxiv.org/abs/2411.17576) for more details.

## Installation

To set up the repository locally, follow these steps:

1. Clone the repository and navigate to the project directory:
   ```bash
   git clone https://github.com/jovanavidenovic/DAM4SAM.git
   cd DAM4SAM
   ```
2. Create a new conda environment and activate it:
   ```bash
   conda create -n dam4sam_env python=3.10.15
   conda activate dam4sam_env
   ```
3. Install torch and other dependencies:
   ```bash
   pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
   pip install -r requirements.txt
   ```

If you experience installation problems, such as `ImportError: cannot import name '_C' from 'sam2'`, run the following command in the repository root:
```bash
python setup.py build_ext --inplace
```
Note that you can still use the repository even if this warning persists, but some SAM2 post-processing steps may be skipped. For more information, consult the [SAM2 installation instructions](https://github.com/facebookresearch/sam2/blob/main/INSTALL.md).
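
To quickly verify the environment before downloading checkpoints, a short optional sanity check such as the following can be used (this is a convenience, not part of the official setup):
```bash
# Print the installed PyTorch version and whether the CUDA build is usable.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```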

## Getting started

Model checkpoints can be downloaded by running:
```bash
cd checkpoints && \
./download_ckpts.sh
```

Our model configs are available in the `sam2/` folder.

## Running and evaluation

This repository supports evaluation on the following datasets: DiDi, VOT2020, VOT2022, LaSOT, LaSOT_ext and GOT-10k. Support for running on VOTS2024 will be added soon.

### A quick demo

A demo script, `run_bbox_example.py`, is provided to quickly run the tracker on a directory containing a sequence of frames. The script first asks the user to draw an initialization bounding box, which is then used to automatically estimate a segmentation mask on the initialization frame. The script is run using the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python run_bbox_example.py --dir <frames-dir> --ext <frame-ext> --output_dir <output-dir>
```
`<frames-dir>` is the path to the directory containing a sequence of frames, `<frame-ext>` is the frame extension, e.g., jpg, png, etc. (optional, default: jpg), and `<output-dir>` is the path to the output directory where the predicted segmentation masks for all frames will be saved. The `--output_dir` argument is optional; if it is not given, the script only visualizes the results.
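
For example, assuming a sequence of JPEG frames in a hypothetical directory `~/sequences/example_seq`, the demo could be launched as follows (all paths are illustrative only):
```bash
# Run the demo on an example sequence and save the predicted masks.
CUDA_VISIBLE_DEVICES=0 python run_bbox_example.py \
    --dir ~/sequences/example_seq \
    --ext jpg \
    --output_dir ~/results/example_seq_masks
```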

### DiDi dataset

Run on a single sequence and visualize the results:
```bash
CUDA_VISIBLE_DEVICES=0 python run_on_didi.py --dataset_path <path-to-didi> --sequence <sequence-name>
```

Run on the whole dataset and save the results to disk:
```bash
CUDA_VISIBLE_DEVICES=0 python run_on_didi.py --dataset_path <path-to-didi> --output_dir <output-dir-path>
```

After obtaining the raw results on DiDi using the previous command, you can compute the performance measures. This is done using the VOT toolkit; we therefore provide an empty VOT workspace in the `didi-workspace` directory. The sequences from the DiDi dataset should be placed in the `didi-workspace/sequences` directory. Alternatively, you can create a symbolic link named `sequences` in `didi-workspace`, pointing to the DiDi dataset on your disk. The raw results must be placed in the `results` subfolder, e.g., `didi-workspace/results/DAM4SAM`. If the results were obtained using `run_on_didi.py`, you should move them to the workspace using the following command:

```bash
python move_didi_results.py --dataset_path <path-to-didi> --src <source-results-directory> --dst ./didi-workspace/results/DAM4SAM
```

`<source-results-directory>` is the path to the directory given as the `output_dir` argument of the `run_on_didi.py` script. The `move_didi_results.py` script not only moves the results, but also converts them into bounding boxes, since DiDi is a bounding box dataset. Finally, the performance measures are computed using the following commands:

```bash
vot analysis --workspace <path-to-didi-workspace> --format=json DAM4SAM
vot report --workspace <path-to-didi-workspace> --format=html DAM4SAM
```

Performance measures are available in the generated report under `didi-workspace/reports`. Note: if you run the analysis multiple times, remember to clear the `cache` directory.
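
Putting the steps above together, a minimal end-to-end evaluation sketch could look like the following, assuming the DiDi dataset lives at the illustrative path `/data/didi`:
```bash
# 1. Run the tracker on the whole dataset and store the raw results.
CUDA_VISIBLE_DEVICES=0 python run_on_didi.py --dataset_path /data/didi --output_dir /tmp/dam4sam_didi_raw

# 2. Expose the sequences to the workspace and move/convert the raw results.
ln -s /data/didi didi-workspace/sequences
python move_didi_results.py --dataset_path /data/didi --src /tmp/dam4sam_didi_raw --dst ./didi-workspace/results/DAM4SAM

# 3. Clear stale caches (when re-running) and compute the performance measures.
rm -rf didi-workspace/cache
vot analysis --workspace ./didi-workspace --format=json DAM4SAM
vot report --workspace ./didi-workspace --format=html DAM4SAM
```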

### VOT2020 and VOT2022 Challenges

Create a VOT workspace (for more information, see the instructions [here](https://www.votchallenge.net/howto/)). For VOT2020 use:
```bash
vot initialize vot2020/shortterm --workspace <workspace-dir-path>
```
and for VOT2022 use:
```bash
vot initialize vot2022/shortterm --workspace <workspace-dir-path>
```

You can use the integration files from the `vot_integration/vot2022_st` folder to run only the selected experiment. We provide two stack files: one for the baseline experiment and one for the real-time experiment. After workspace creation and tracker integration, you can evaluate the tracker on VOT using the following commands:

```bash
vot evaluate --workspace <path-to-vot-workspace> DAM4SAM
vot analysis --workspace <path-to-vot-workspace> --format=json DAM4SAM
vot report --workspace <path-to-vot-workspace> --format=html DAM4SAM
```

### Bounding box datasets
Running our tracker is supported on the LaSOT, LaSOT_ext and GOT-10k datasets. The tracker is initialized with masks, which are obtained from the ground-truth initialization bounding boxes using the SAM2 image predictor. You can download the initialization masks for all datasets at [this link](https://data.vicos.si/alanl/sam2_init_masks.zip). Before running the tracker, set the corresponding paths to the datasets and to the directory with the ground-truth masks in `dam4sam_config.yaml` (in the repository root).

Run on the whole dataset and save the results to disk (`<dataset-name>` can be: `got | lasot | lasot_ext`):
```bash
CUDA_VISIBLE_DEVICES=0 python run_on_box_dataset.py --dataset_name=<dataset-name> --output_dir=<output-dir-path>
```

Run on a single sequence and visualize the results:
```bash
CUDA_VISIBLE_DEVICES=0 python run_on_box_dataset.py --dataset_name=<dataset-name> --sequence=<sequence-name>
```

## Video object removal by Remove Anything

<p align="center"> <img src="imgs/object-removal.png" width="95%"> </p>

We provide a demo for object removal in a video; examples are shown on our [project page](https://jovanavidenovic.github.io/dam-4-sam/). Object removal is performed by a simple pipeline: first, DAM4SAM segments the selected object, and second, the [proPainter tool](https://github.com/sczhou/ProPainter) inpaints it. Object removal can be performed using the following command:
```bash
./inpaint_object.sh <frames_dir> <output_dir>
```
where `<frames_dir>` is the path to the directory with a sequence of video frames and `<output_dir>` is the path to the directory where the output (intermediate masks and the inpainted video) will be stored. Note that the script removes any existing content from `<output_dir>`.
The output video quality is controlled by the output size via the `--resize_ratio 0.5` argument; you can increase this ratio to 1 if you have enough GPU memory.
The pipeline is as follows: (i) the user draws a bounding box around the object that should be removed, (ii) DAM4SAM performs binary segmentation of the selected object through the whole video and stores the segmentation masks on disk (in `<output_dir>`), and (iii) proPainter performs object removal using the `inference_propainter.py` script.
To ensure the correct setup, the following project directory structure should be provided:
```bash
├── root_dir
│   ├── dam4sam
│   ├── proPainter
└── inpaint_object.sh
```
Here, `dam4sam` is the directory with the DAM4SAM code (this repository) and `proPainter` is the directory where [proPainter](https://github.com/sczhou/ProPainter) is checked out. The script `inpaint_object.sh` is provided in this repository.
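
One possible way to prepare this layout is sketched below; the directory names follow the tree above, while all paths are illustrative and proPainter is assumed to be installed according to its own instructions:
```bash
# Illustrative setup only; adjust paths to your environment.
mkdir root_dir && cd root_dir
git clone https://github.com/jovanavidenovic/DAM4SAM.git dam4sam
git clone https://github.com/sczhou/ProPainter.git proPainter
# inpaint_object.sh ships with this repository; place a copy next to the two
# checkouts as shown in the tree above.
cp dam4sam/inpaint_object.sh .
./inpaint_object.sh /path/to/frames /path/to/output
```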

https://github.com/user-attachments/assets/ddb4a87b-92cf-4f78-be3b-8218d75b8599

## DiDi: A distractor-distilled dataset
DiDi is a distractor-distilled tracking dataset created to address the low presence of distractors in current visual object tracking benchmarks. To enhance the evaluation and analysis of tracking performance amidst distractors, we have semi-automatically distilled several existing benchmarks into the DiDi dataset. The dataset is available for download at [this link](https://go.vicos.si/didi).

<p align="center"> <img src="imgs/didi-examples.jpg" width="80%"> </p>
<div align="center">
  <i>Example frames from the DiDi dataset showing challenging distractors. Targets are denoted by green bounding boxes.</i>
</div>

### Experimental results on DiDi
See [the project page](https://jovanavidenovic.github.io/dam-4-sam/) for a qualitative comparison.

| Model              | Quality | Accuracy | Robustness |
|--------------------|---------|----------|------------|
| TransT             | 0.465   | 0.669    | 0.678      |
| KeepTrack          | 0.502   | 0.646    | 0.748      |
| SeqTrack           | 0.529   | 0.714    | 0.718      |
| AQATrack           | 0.535   | 0.693    | 0.753      |
| AOT                | 0.541   | 0.622    | 0.852      |
| Cutie              | 0.575   | 0.704    | 0.776      |
| ODTrack            | 0.608   | 0.740 :1st_place_medal: | 0.809 |
| SAM2.1Long         | 0.646   | 0.719    | 0.883      |
| SAM2.1             | 0.649 :3rd_place_medal: | 0.720 | 0.887 :3rd_place_medal: |
| SAMURAI            | 0.680 :2nd_place_medal: | 0.722 :3rd_place_medal: | 0.930 :2nd_place_medal: |
| **DAM4SAM** (ours) | 0.694 :1st_place_medal: | 0.727 :2nd_place_medal: | 0.944 :1st_place_medal: |

## Acknowledgments

Our work is built on top of [SAM 2](https://github.com/facebookresearch/sam2?tab=readme-ov-file) by Meta FAIR.

### Citation
Please consider citing our paper if you find our work useful.

```bibtex
@InProceedings{dam4sam,
  author = {Videnovic, Jovana and Lukezic, Alan and Kristan, Matej},
  title = {A Distractor-Aware Memory for Visual Object Tracking with {SAM2}},
  booktitle = {Comp. Vis. Patt. Recognition},
  year = {2025}
}
```