{"id":16907969,"url":"https://github.com/xvjiarui/vfs","last_synced_at":"2025-06-23T19:38:43.145Z","repository":{"id":86501892,"uuid":"351519713","full_name":"xvjiarui/VFS","owner":"xvjiarui","description":"Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective, in ICCV 2021 (Oral)","archived":false,"fork":false,"pushed_at":"2021-12-16T08:49:11.000Z","size":142575,"stargazers_count":145,"open_issues_count":4,"forks_count":11,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-02-27T19:34:17.468Z","etag":null,"topics":["correspondence","pytorch","representation-learning","self-su","video"],"latest_commit_sha":null,"homepage":"https://jerryxu.net/VFS/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xvjiarui.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-03-25T17:24:36.000Z","updated_at":"2025-02-26T15:45:21.000Z","dependencies_parsed_at":"2023-03-12T12:00:50.921Z","dependency_job_id":null,"html_url":"https://github.com/xvjiarui/VFS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xvjiarui%2FVFS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xvjiarui%2FVFS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xvjiarui%2FVFS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xvjiarui%2FVFS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xvjiarui","download_url":"https://codeload.
github.com/xvjiarui/VFS/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243848028,"owners_count":20357483,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["correspondence","pytorch","representation-learning","self-su","video"],"created_at":"2024-10-13T18:49:33.984Z","updated_at":"2025-03-17T07:30:42.480Z","avatar_url":"https://github.com/xvjiarui.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective\n\nThis repository is the official implementation for VFS introduced in the paper:\n\n[**Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective**](https://arxiv.org/abs/2103.17263)\n\u003cbr\u003e\n[*Jiarui Xu*](https://jerryxu.net), [*Xiaolong Wang*](https://xiaolonw.github.io/)\n\u003cbr\u003e\nICCV 2021 (**Oral**)\n\nThe project page with video is at [https://jerryxu.net/VFS/](https://jerryxu.net/VFS/).\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"figs/vfs.gif\" width=\"75%\"\u003e\n\u003c/div\u003e\n\n## Citation\n\nIf you find our work useful in your research, please cite:\n\n```latex\n@article{xu2021rethinking,\n  title={Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective},\n  author={Xu, Jiarui and Wang, Xiaolong},\n  journal={arXiv preprint arXiv:2103.17263},\n  year={2021}\n}\n```\n\n## Environmental Setup\n\n* Python 3.7\n* PyTorch 1.6-1.8\n* mmaction2\n* davis2017-evaluation\n* 
got10k\n\nThe codebase is implemented based on the awesome [MMAction2](https://github.com/open-mmlab/mmaction2), please follow the [install instruction](https://mmaction2.readthedocs.io/en/latest/install.html) of MMAction2 to setup the environment.\n\nQuick start full script:\n\n```shell\nconda create -n vfs python=3.7 -y\nconda activate vfs\nconda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge\npip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html\n# install customized evaluation API for DAVIS\npip install git+https://github.com/xvjiarui/davis2017-evaluation\n# install evaluation API for OTB\npip install got10k\n\n# install VFS\ngit clone https://github.com/xvjiarui/VFS/\ncd VFS\npip install -e .\n```\n\nWe also provide the Dockerfile under `docker/` folder.\n\n**The code is developed and tested based on PyTorch 1.6-1.8.  It also runs smoothly with PyTorch 1.9 but the accuracy is slightly worse for OTB evaluation. 
Please feel free to open a PR if you find the reason.**\n\n## Model Zoo\n\n### Fine-grained correspondence\n\n\u003cp float=\"left\"\u003e\n\u003cimg src=\"figs/paragliding.gif\" width=\"49%\"\u003e\n\u003cimg src=\"figs/soapbox.gif\" width=\"49%\"\u003e\n\u003c/p\u003e\n\n| Backbone  | Config                                              | J\u0026F-Mean | J-Mean | F-Mean | Download                                                                                                                   | Inference cmd                                                                                                                                                                                                                                                          |\n| --------- | --------------------------------------------------- | -------- | ------ | ------ | -------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ResNet-18 | [cfg](configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py) | 66.7     | 64.0   | 69.5   | [pretrain ckpt](https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth) | \u003cdetails\u003e\u003csummary\u003ecmd\u003c/summary\u003e`./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1  --eval davis --options test_cfg.save_np=True`\u003c/details\u003e |\n| ResNet-50 | [cfg](configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py) | 69.5     | 67.0   | 72.0   | [pretrain 
ckpt](https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth) | \u003cdetails\u003e\u003csummary\u003ecmd\u003c/summary\u003e`./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1  --eval davis --options test_cfg.save_np=True`\u003c/details\u003e |\n\nNote: We report the accuracy of the last block in res4, to evaluate all blocks, please pass `--options test_cfg.all_blocks=True`.\nThe reproduced performance in this repo is slightly higher than reported in the paper.\n\n### Object-level correspondence\n\n\u003cp float=\"left\"\u003e\n\u003cimg src=\"figs/mountainbike.gif\" width=\"33%\"\u003e\n\u003cimg src=\"figs/deer.gif\" width=\"33%\"\u003e\n\u003cimg src=\"figs/jogging.gif\" width=\"33%\"\u003e\n\u003c/p\u003e\n\n| Backbone  | Config                                           | Precision | Success | Download                                                                                                                | Inference cmd                                                                                                                                                                                                                                                                                            |\n| --------- | ------------------------------------------------ | --------- | ------- | ----------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| ResNet-18 | [cfg](configs/r18_sgd_cos_100e_r2_1xNx8_k400.py) | 70.0      | 52.3    
| [tracking ckpt](https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-99e2f7cd.pth) | \u003cdetails\u003e\u003csummary\u003ecmd\u003c/summary\u003e`python projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth`\u003c/details\u003e                                               |\n| ResNet-50 | [cfg](configs/r50_sgd_cos_100e_r5_1xNx2_k400.py) | 73.9      | 52.5    | [tracking ckpt](https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r5_1xNx2_k400-b7fb2a38.pth) | \u003cdetails\u003e\u003csummary\u003ecmd\u003c/summary\u003e`python projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r2_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048`\u003c/details\u003e |\n\nNote: We fine-tune an extra linear layer.\nThe reproduced performance in this repo is slightly higher than reported in the paper.\n\n## Data Preparation\n\nWe use [Kinetics-400](https://github.com/cvdfoundation/kinetics-dataset) for self-supervised correspondence pretraining.\n\nThe fine-grained correspondence is evaluated on [DAVIS2017](https://davischallenge.org/davis2017/code.html) w/o any fine-tuning.\n\nThe object-level correspondence is evaluated on [OTB-100](http://cvlab.hanyang.ac.kr/tracker_benchmark/index.html) under linear probing setting (fine-tuning an extra linear layer).\n\nThe overall file structure is as followed:\n\n```shell\nvfs\n├── mmaction\n├── tools\n├── configs\n├── data\n│   ├── kinetics400\n│   │   ├── videos_train\n│   │   │   ├── kinetics400_train_list_videos.txt\n│   │   │   ├── train\n│   │   │   │   ├── abseiling/\n│   │   │   │   ├── air_drumming/\n│   │   │   │   ├── ...\n│   │   │   │   ├── yoga/\n│   │   │   │   ├── zumba/\n│   ├── davis\n│  
 │   ├── DAVIS\n│   │   │   ├── Annotations\n│   │   │   │   ├── 480p\n│   │   │   │   │   ├── bike-packing/\n│   │   │   │   │   ├── ...\n│   │   │   │   │   ├── soapbox/\n│   │   │   ├── ImageSets\n│   │   │   │   ├── 2017/\n│   │   │   │   ├── davis2017_val_list_rawframes.txt\n│   │   │   ├── JPEGImages\n│   │   │   │   ├── 480p\n│   │   │   │   │   ├── bike-packing/\n│   │   │   │   │   ├── ...\n│   │   │   │   │   ├── soapbox/\n│   ├── otb\n│   │   ├── Basketball/\n│   │   ├── ...\n│   │   ├── Woman/\n│   ├── GOT-10k\n│   │   ├── train\n│   │   │   ├── GOT-10k_Train_000001/\n│   │   │   ├── ...\n│   │   │   ├── GOT-10k_Train_009335/\n```\n\nThe instructions for preparing each dataset are as followed.\n\n### Kinetics-400\n\nPlease follow the documentation [here](https://mmaction2.readthedocs.io/en/latest/supported_datasets.html#kinetics-400-600-700) to prepare the Kinetics-400. The dataset could be downloaded from [kinetics-dataset](https://github.com/cvdfoundation/kinetics-dataset).\n\n### DAVIS2017\n\nDAVIS2017 dataset could be downloaded from the [official website](https://davischallenge.org/davis2017/code.html). 
We use the 480p validation set for evaluation.\n\n```shell\n# download data\nwget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip\n# download filelist\nwget https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/davis2017_val_list_rawframes.txt\n```\n\nThen please unzip and place them according to the file structure above.\n\n### OTB-100\n\nThe OTB-100 frames and annotations will be downloaded automatically.\n\n### GOT-10k\n\nGOT-10k dataset could be downloaded from the [official website](http://got-10k.aitestunion.com/downloads).\n\nThen please unzip and place them according to the file structure above.\n\n## Run Experiments\n\n### Pretrain\n\n```shell\n./tools/dist_train.sh ${CONFIG} ${GPUS}\n```\n\nWe use 2 and 8 GPUs for ResNet-18 and ResNet-50 models respectively.\n\n### Inference\n\nTo run the following inference and evaluation, we need to convert the pretrained checkpoint into the same format as torchvision [ResNet](https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py).\n\n```shell\npython tools/convert_weights/convert_to_pretrained.py ${PRETRAIN_CHECKPOINT} ${BACKBONE_WEIGHT}\n```\n\n#### Evaluate fine-grained correspondence on DAVIS2017\n\n```shell\n./tools/dist_test.sh ${CONFIG} ${BACKBONE_WEIGHT} ${GPUS}  --eval davis\n```\n\nYou may pass `--options test_cfg.save_np=True` to save memory.\n\nInference cmd examples:\n\n```shell\n# testing r18 model\n./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1  --eval davis --options test_cfg.save_np=True\n# testing r50 model\n./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1  --eval davis --options test_cfg.save_np=True\n```\n\n#### Evaluate object-level correspondence\n\nResNet-18:\n\n```shell\n python 
projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT}\n```\n\nResNet-50:\n\n```shell\n python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT} --options out_scale=0.00001 out_channels=2048\n```\n\nThe results will be saved in `work_dirs/${CONFIG}/siamfc`.\n\nTo run inference with the provided tracking checkpoints:\n\n```shell\n python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --checkpoint ${TRACKING_CHECKPOINT}\n```\n\nInference cmd examples:\n\n```shell\n# testing r18 model\npython projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth\n# testing r50 model\npython projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r5_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048\n```\n\n## Acknowledgements\n\nThe codebase is based on [MMAction2](https://github.com/open-mmlab/mmaction2).\nThe fine-grained correspondence inference and evaluation follow [TimeCycle](https://github.com/xiaolonw/TimeCycle), [UVC](https://github.com/Liusifei/UVC) and [videowalk](https://github.com/ajabri/videowalk).\nThe object-level correspondence inference and evaluation are based on [SiamFC-PyTorch](https://github.com/huanglianghua/siamfc-pytorch) and [vince](https://github.com/danielgordon10/vince).\n\nThank you all for the great open-source repositories!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxvjiarui%2Fvfs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxvjiarui%2Fvfs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxvjiarui%2Fvfs/lists"}