{"id":19932173,"url":"https://github.com/amazon-science/video-contrastive-learning","last_synced_at":"2025-05-03T11:31:33.285Z","repository":{"id":39622049,"uuid":"390864670","full_name":"amazon-science/video-contrastive-learning","owner":"amazon-science","description":"Video Contrastive Learning with Global Context, ICCVW 2021","archived":false,"fork":false,"pushed_at":"2022-05-30T16:01:15.000Z","size":203,"stargazers_count":147,"open_issues_count":3,"forks_count":16,"subscribers_count":9,"default_branch":"main","last_synced_at":"2023-11-07T19:21:42.039Z","etag":null,"topics":["computer-vision","contrastive-learning","iccv-2021","self-supervised-learning","video-understanding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-29T22:25:58.000Z","updated_at":"2023-09-07T19:38:38.000Z","dependencies_parsed_at":"2022-08-23T13:20:09.014Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/video-contrastive-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fvideo-contrastive-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fvideo-contrastive-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Fvideo-contrastive-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/amazon-science%2Fvideo-contrastive-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/video-contrastive-learning/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224360230,"owners_count":17298319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","contrastive-learning","iccv-2021","self-supervised-learning","video-understanding"],"created_at":"2024-11-12T23:09:18.092Z","updated_at":"2024-11-12T23:09:18.647Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Video Contrastive Learning with Global Context (VCLR)\n\nThis is the official PyTorch implementation of our [VCLR paper](https://arxiv.org/abs/2108.02722).\n\n```\n@article{kuang2021vclr,\n  title={Video Contrastive Learning with Global Context},\n  author={Haofei Kuang, Yi Zhu, Zhi Zhang, Xinyu Li, Joseph Tighe, Sören Schwertfeger, Cyrill Stachniss, Mu Li},\n  journal={arXiv preprint arXiv:2108.02722},\n  year={2021}\n}\n```\n\n\n## Install dependencies\n- environments\n  ```shell\n  conda create --name vclr python=3.7\n  conda activate vclr\n  conda install numpy scipy scikit-learn matplotlib scikit-image\n  pip install torch==1.7.1 torchvision==0.8.2\n  pip install opencv-python tqdm termcolor gcc7 ffmpeg tensorflow==1.15.2\n  pip install mmcv-full==1.2.7\n  ```\n\n\n## Prepare datasets\nPlease refer to [PREPARE_DATA](PREPARE_DATA.md) to 
prepare the datasets.\n\n\n## Prepare pretrained MoCo weights\nIn this work, we follow [SeCo](https://arxiv.org/abs/2008.00975) and use the pretrained weights of [MoCov2](https://github.com/facebookresearch/moco) as initialization.\n\n```shell\ncd ~\ngit clone https://github.com/amazon-research/video-contrastive-learning.git\ncd video-contrastive-learning\nmkdir pretrain \u0026\u0026 cd pretrain\nwget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_200ep/moco_v2_200ep_pretrain.pth.tar\ncd ..\n```\n\n\n## Self-supervised pretraining\n\n```shell\nbash shell/main_train.sh\n```\nCheckpoints will be saved to `./results`.\n\n\n## Downstream tasks\n\n### Linear evaluation\nTo evaluate the effectiveness of self-supervised learning, we conduct a linear evaluation (probing) on the Kinetics400 dataset. We first extract features with the pretrained weights and then train an SVM classifier to see how the learned features perform.\n\n```shell\nbash shell/eval_svm.sh\n```\n\n- Results\n\n  | Arch | Pretrained dataset | Epoch | Pretrained model | Acc. on K400 |\n  | :------: | :-----: | :-----: | :-----: | :-----: |\n  | ResNet50 | Kinetics400 | 400 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_k400.pth) | 64.1 |\n\n\n### Video retrieval\n\n```shell\nbash shell/eval_retrieval.sh\n```\n\n- Results\n\n  | Arch | Pretrained dataset | Epoch | Pretrained model | R@1 on UCF101 | R@1 on HMDB51 |\n  | :------: | :-----: | :-----: | :-----: | :-----: | :-----: |\n  | ResNet50 | Kinetics400 | 400 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_k400.pth) | 70.6 | 35.2 |\n  | ResNet50 | UCF101 | 400 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_ucf.pth) | 46.8 | 17.6 |\n\n\n### Action recognition \u0026 action localization\n\nHere, we use mmaction2 for both tasks. 
If you are not familiar with mmaction2, you can read the [official documentation](https://mmaction2.readthedocs.io/en/latest/index.html).\n\n#### Installation\n- Step 1: Install mmaction2\n\n  To make sure the results can be reproduced, please use our forked version of mmaction2 (version: 0.11.0):\n  ```shell\n  conda activate vclr\n  cd ~\n  git clone https://github.com/KuangHaofei/mmaction2\n\n  cd mmaction2\n  pip install -v -e .\n  ```\n- Step 2: Prepare the pretrained weights\n\n  Our pretrained backbone uses a different format from the mmaction2 backbone, so it must be converted to the mmaction2 format. We provide converted versions of our K400 pretrained weights for [TSN](https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth) and [TSM](https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth). We also provide the conversion script, which you can find [here](./tools/weights/README.md).\n\n  Download the pretrained weights into the `checkpoints` directory:\n  ```shell\n  cd ~/mmaction2\n  mkdir -p checkpoints\n  wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm.pth\n  wget -P checkpoints https://haofeik-data.s3.amazonaws.com/VCLR/pretrained/vclr_mm_tsm.pth\n  ```\n\n#### Action recognition\nMake sure you have prepared the datasets and environment following the previous steps. 
Assuming you are in the root directory of `mmaction2`, follow the steps below to fine-tune the TSN or TSM models for action recognition.\n\nFor each dataset, the training and test settings can be found in the configuration files.\n\n- UCF101\n  - config file: [tsn_ucf101.py](https://github.com/KuangHaofei/mmaction2/blob/master/configs/recognition/tsn/vclr/tsn_ucf101.py)\n  - train command:\n    ```shell\n    ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_ucf101.py 8 \\\n      --validate --seed 0 --deterministic\n    ```\n  - test command:\n    ```shell\n    python tools/test.py configs/recognition/tsn/vclr/tsn_ucf101.py \\\n      work_dirs/vclr/ucf101/latest.pth \\\n      --eval top_k_accuracy mean_class_accuracy --out result.json\n    ```\n\n- HMDB51\n  - config file: [tsn_hmdb51.py](https://github.com/KuangHaofei/mmaction2/blob/master/configs/recognition/tsn/vclr/tsn_hmdb51.py)\n  - train command:\n    ```shell\n    ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_hmdb51.py 8 \\\n      --validate --seed 0 --deterministic\n    ```\n  - test command:\n    ```shell\n    python tools/test.py configs/recognition/tsn/vclr/tsn_hmdb51.py \\\n      work_dirs/vclr/hmdb51/latest.pth \\\n      --eval top_k_accuracy mean_class_accuracy --out result.json\n    ```\n\n- SomethingSomethingV2: TSN\n  - config file: [tsn_sthv2.py](https://github.com/KuangHaofei/mmaction2/blob/master/configs/recognition/tsn/vclr/tsn_sthv2.py)\n  - train command:\n    ```shell\n    ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_sthv2.py 8 \\\n      --validate --seed 0 --deterministic\n    ```\n  - test command:\n    ```shell\n    python tools/test.py configs/recognition/tsn/vclr/tsn_sthv2.py \\\n      work_dirs/vclr/tsn_sthv2/latest.pth \\\n      --eval top_k_accuracy mean_class_accuracy --out result.json\n    ```\n- SomethingSomethingV2: TSM\n  - config file: [tsm_sthv2.py](https://github.com/KuangHaofei/mmaction2/blob/master/configs/recognition/tsm/vclr/tsm_sthv2.py)\n  - 
train command:\n    ```shell\n    ./tools/dist_train.sh configs/recognition/tsm/vclr/tsm_sthv2.py 8 \\\n      --validate --seed 0 --deterministic\n    ```\n  - test command:\n    ```shell\n    python tools/test.py configs/recognition/tsm/vclr/tsm_sthv2.py \\\n      work_dirs/vclr/tsm_sthv2/latest.pth \\\n      --eval top_k_accuracy mean_class_accuracy --out result.json\n    ```\n\n- ActivityNet\n  - config file: [tsn_activitynet.py](https://github.com/KuangHaofei/mmaction2/blob/master/configs/recognition/tsn/vclr/tsn_activitynet.py)\n  - train command:\n    ```shell\n    ./tools/dist_train.sh configs/recognition/tsn/vclr/tsn_activitynet.py 8 \\\n      --validate --seed 0 --deterministic\n    ```\n  - test command:\n    ```shell\n    python tools/test.py configs/recognition/tsn/vclr/tsn_activitynet.py \\\n      work_dirs/vclr/tsn_activitynet/latest.pth \\\n      --eval top_k_accuracy mean_class_accuracy --out result.json\n    ```\n\n- Results\n\n  | Arch | Dataset | Finetuned model | Acc. 
|\n  | :------: | :-----: | :-----: | :-----: |\n  | TSN | UCF101 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_recognition/mm_ucf_tsn.pth) | 85.6 |\n  | TSN | HMDB51 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_recognition/mm_hmdb_tsn.pth) | 54.1 |\n  | TSN | SomethingSomethingV2 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_recognition/mm_sthv2_tsn.pth) | 33.3 |\n  | TSM | SomethingSomethingV2 | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_recognition/mm_sthv2_tsm.pth) | 52.0 |\n  | TSN | ActivityNet | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_recognition/mm_anet_tsn.pth) | 71.9 |\n\n\n#### Action localization\n- Step 1: Fine-tune TSN on ActivityNet as described in the previous section. We assume the finetuned model is saved at `work_dirs/vclr/tsn_activitynet/latest.pth`.\n\n- Step 2: Extract ActivityNet features\n  ```shell\n  cd ~/mmaction2/tools/data/activitynet/\n\n  python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \\\n    --data-list /home/ubuntu/data/ActivityNet/anet_train_video.txt \\\n    --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \\\n    --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth\n\n  python tsn_feature_extraction.py --data-prefix /home/ubuntu/data/ActivityNet/rawframes \\\n    --data-list /home/ubuntu/data/ActivityNet/anet_val_video.txt \\\n    --output-prefix /home/ubuntu/data/ActivityNet/rgb_feat \\\n    --modality RGB --ckpt /home/ubuntu/mmaction2/work_dirs/vclr/tsn_activitynet/latest.pth\n\n  python activitynet_feature_postprocessing.py \\\n    --rgb /home/ubuntu/data/ActivityNet/rgb_feat \\\n    --dest /home/ubuntu/data/ActivityNet/mmaction_feat\n  ```\n  Note that the root directory of ActivityNet is `/home/ubuntu/data/ActivityNet/` in our case. 
Please replace it with your own directory.\n\n- Step 3: Train and test the BMN model\n  - train\n    ```shell\n    cd ~/mmaction2\n    ./tools/dist_train.sh configs/localization/bmn/bmn_acitivitynet_feature_vclr.py 2 \\\n      --work-dir work_dirs/vclr/bmn_activitynet --validate --seed 0 --deterministic --bmn\n    ```\n  - test\n    ```shell\n    python tools/test.py configs/localization/bmn/bmn_acitivitynet_feature_vclr.py \\\n      work_dirs/vclr/bmn_activitynet/latest.pth \\\n      --bmn --eval AR@AN --out result.json\n    ```\n\n- Results\n\n  | Arch | Dataset | Finetuned model | AUC | AR@100 |\n  | :------: | :-----: | :-----: | :-----: | :-----: |\n  | BMN | ActivityNet | [Download link](https://haofeik-data.s3.amazonaws.com/VCLR/action_localization/mm_anet_bmn.pth) | 65.5 | 73.8 |\n\n\n## Feature visualization\n\nWe provide our feature visualization code [here](./tools/feature_visualization/README.md).\n\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n\n## License\n\nThis project is licensed under the Apache-2.0 License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fvideo-contrastive-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Fvideo-contrastive-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Fvideo-contrastive-learning/lists"}