https://github.com/sayakpaul/single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
https://github.com/sayakpaul/single-video-curation-svd
stable-video-diffusion video-curation video-filtering
Last synced: 6 months ago
JSON representation
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
- Host: GitHub
- URL: https://github.com/sayakpaul/single-video-curation-svd
- Owner: sayakpaul
- License: apache-2.0
- Created: 2023-12-26T10:44:59.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-30T07:02:50.000Z (almost 2 years ago)
- Last Synced: 2025-01-08T12:52:42.908Z (10 months ago)
- Topics: stable-video-diffusion, video-curation, video-filtering
- Language: Jupyter Notebook
- Homepage:
- Size: 669 KB
- Stars: 81
- Watchers: 3
- Forks: 6
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# single-video-curation-svd
**Educational** repository for applying the main video data curation techniques presented in the Stable Video Diffusion [paper](https://arxiv.org/abs/2311.15127). The goal is to provide an interactive resource for the community dig deeper into the curation techniques presented in the paper. As such, it doesn't guarantee precise correctness.
The curation techniques used in this repository are discussed in _Appendix C_ of the paper, in detail. These include:
* Cascaded Cut Detection
* Keyframe-Aware Clipping
* Optical Flow
* Synthetic Captioning
* Caption similarities and Aesthetics
* Text DetectionThe notebooks present these techniques on a single video file sourced from the [UCF-101 dataset](https://www.crcv.ucf.edu/data/UCF101.php).
| **Video** | **CoCa Caption** | **V-BLIP Caption** | **Zephyr-7B Caption** |
|:------------:|:-----:|:------------:|:-----:|
|  | a group of men playing basketball on a court . | the basketball player dribbles to the basket | A basketball player in a group of men dribbles towards the basket on a court. |Readers are advised to first go through Appendix C of the Stable Video Diffusion paper before referring to the notebooks.
## Setup
Below are the primary dependencies:
* PyTorch (follow the installation instructions from the [official site](https://pytorch.org/))
* `transformers`
* `opencv-python`
* `numpy`
* `ffmpeg`Other dependencies will be detailed below.
## Notebooks (should be run in order)
Clip extraction
Refer to the `video_preprocessing_clip_extraction.ipynb` notebook for this. You'd need to install the `scenedetect` library from here: https://github.com/Breakthrough/PySceneDetect. This shows both cascaded cut detection and keyframe-aware clipping. At the end of the notebook, you should expect to see different clips extracted from the provided video.
Captioning
`video_preprocessing_captioning.ipynb` presents synthetic captioning from a single video clip.
This uses three models:
* CoCa (relies on `open_clip`)
* V-BLIP (relies on [EILEV](https://github.com/yukw777/EILEV))
* Zephyr-7B (relies on `transformers`)We had to apply some corrections to `eilev` to make it work. The correction patch can be found [here](./correction_patches/corrections_eilev.patch).
Optical Flow
`video_preprocessing_optical_flow_score.ipynb` notebook shows the optical flow score computation only using the Farneback algorithm. It doesn't, however, show [RAFT](https://arxiv.org/abs/2003.12039).
Caption similarities and Aesthetics
This is straightforward and is implemented in the `video_preprocessing_similarity_aesthetics.ipynb` notebook.
Text Detection
Refer to the `video_preprocessing_text_detection.ipynb` notebook for this. We use a wrapper library called `craft_text_detector` ([repository](https://github.com/fcakyon/craft-text-detector)) for this as it provides a handy package around the CRAFT text detection model. However, to make it work, we had to do some changes. The patch can be found [here](./correction_patches/corrections_craft.patch).
## Acknowledgements
Thanks to ChatGPT for all the help.
Thanks to [Dhruv Nair](https://github.com/DN6) for his reviews.