https://github.com/gkordo/s2vs
Authors official PyTorch implementation of the "Self-Supervised Video Similarity Learning" [CVPRW 2023]
https://github.com/gkordo/s2vs
duplicate-videos fivr ndvr self-supervised-learning self-supervision video-detection video-retrieval video-search video-similarity video-similarity-learning video-similarity-search
Last synced: about 1 month ago
JSON representation
Authors official PyTorch implementation of the "Self-Supervised Video Similarity Learning" [CVPRW 2023]
- Host: GitHub
- URL: https://github.com/gkordo/s2vs
- Owner: gkordo
- License: mit
- Created: 2023-04-05T12:12:00.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-25T10:36:53.000Z (over 1 year ago)
- Last Synced: 2025-03-23T17:51:28.751Z (about 2 months ago)
- Topics: duplicate-videos, fivr, ndvr, self-supervised-learning, self-supervision, video-detection, video-retrieval, video-search, video-similarity, video-similarity-learning, video-similarity-search
- Language: Python
- Homepage:
- Size: 17.2 MB
- Stars: 41
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Self-Supervised Video Similarity Learning
This repository contains the PyTorch implementation of the paper
[Self-Supervised Video Similarity Learning](https://arxiv.org/abs/2304.03378).
It contains code for the training of video similarity learning network with self-supervision.
Also, to facilitate the reproduction of the paper's results, the evaluation code, the extracted features for the
employed video datasets, and pre-trained models are provided.
![]()
## Prerequisites
* Python 3
* PyTorch
* Torchvision
* FFMpeg## Preparation
### Installation
* Clone this repo
```bash
$ git clone [email protected]:https://github.com/gkordo/s2vs.git
$ cd s2vs
```* Install the required packages
```bash
$ pip install -r requirements.txt
```## Training
* Extract the frames from the videos in the dataset used for training.
```bash
$ ffmpeg -nostdin -y -vf fps=1 -start_number 0 -q 0 ${video_id}/%05d.jpg -i
```* Edit [`scripts/train_ssl.sh`](scripts/train_ssl.sh) to configure the training session.
* Choose the augmentation types you want to include during training by providing the appropriate values to the
`--augmentations` argument. Provide a string that contains `GT` for Global Transformations, `FT` for Frame Transformations
`TT` for Temporal Transformations and `ViV` for Video-in-Video.* Run the script as follows
```bash
$ bash scripts/train_ssl.sh
```* Once the training is over, a `model.pth` file will have been created in a path based on the provided `experiment_path` argument.
## Evaluation
* Download the datasets from the original sources:
* [FIVR-5K, FIVR-200K](https://ndd.iti.gr/fivr/) - Fine-grained Incident Video Retrieval ([features](https://mever.iti.gr/s2vs/features/fivr_5k.hdf5), [features](https://mever.iti.gr/s2vs/features/fivr_200k.hdf5))
* [VCDB](https://fvl.fudan.edu.cn/dataset/vcdb/list.htm) - Video Copy Detection ([features](https://mever.iti.gr/s2vs/features/vcdb.hdf5))
* [EVVE](http://pascal.inrialpes.fr/data/evve/) - Event-based Video Retrieval ([features](https://mever.iti.gr/s2vs/features/evve.hdf5))* Determine the pattern based on the video ids that video files are stored, e.g. `{id}/video.*` if it follows the pattern:
```
Dataset_dir
├── video_id1
│ └── video.mp4
├── video_id2
│ └── video.flv
│ ⋮
└── video_idN
└── video.webm
```* Run the [`evaluation.py`](evaluation.py) script to evaluate a trained model.
```bash
$ python evaluation.py --dataset FIVR-200K --dataset_path --pattern '{id}/video.*' --model_path
```
or run the script with the provided features
```bash
$ python evaluation.py --dataset FIVR-200K --dataset_hdf5 --model_path
```* If no value is given to the `--model_path` argument, then the pretrained `s2vs_dns` model is used.
## Use our pretrained models
* Usage of the model is similar to [DnS](https://github.com/mever-team/distill-and-select#use-our-pretrained-models) and [ViSiL](https://github.com/MKLab-ITI/visil/tree/pytorch#use-visil-in-your-python-code)* Load our pretrained models as follows:
```python
import torchfeat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC')
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns')
s2vs_vcdb = torch.hub.load('gkordo/s2vs:main', 's2vs_vcdb')
```## Citation
If you use this code for your research, please consider citing our papers:
```bibtex
@inproceedings{kordopatis2023s2vs,
title={Self-Supervised Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Tolias, Giorgos and Tzelepis, Christos and Kompatsiaris, Ioannis and Patras, Ioannis and Papadopoulos, Symeon},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year={2023}
}@inproceedings{kordopatis2019visil,
title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2019}
}
```## Visualization
For visualization examples of augmentation and similarity matrices, as well as model usage in code,
have a look at [this Colab notebook](https://colab.research.google.com/drive/18vFs15sJZQ_MxePYRdwVeHUqcmF8Tlp0).## Related Projects
**[DnS](https://github.com/mever-team/distill-and-select)** - computational efficiency w/ selector network
**[ViSiL](https://github.com/MKLab-ITI/visil)** - original ViSiL approach
**[FIVR-200K](https://github.com/MKLab-ITI/FIVR-200K)** - download our FIVR-200K dataset
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details## Contact for further details
Giorgos Kordopatis-Zilos ([email protected])