Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://ldynx.github.io/SAVE/
Last synced: about 1 month ago
JSON representation
- Host: GitHub
- URL: https://ldynx.github.io/SAVE/
- Owner: ldynx
- License: apache-2.0
- Created: 2023-12-14T06:05:00.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-22T06:46:56.000Z (about 2 months ago)
- Last Synced: 2024-11-22T07:28:05.927Z (about 2 months ago)
- Language: Python
- Size: 15.6 MB
- Stars: 25
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-Segment-Anything
README
# SAVE: Protagonist Diversification with Structure Agnostic Video Editing (ECCV 2024)
This repository contains the official implementation of
[SAVE: Protagonist Diversification with Structure Agnostic Video Editing](https://arxiv.org/abs/2312.02503).

[![Project Website](https://img.shields.io/badge/Project-Website-orange)](https://ldynx.github.io/SAVE/)
[![arXiv 2312.02503](https://img.shields.io/badge/arXiv-2312.02503-red)](https://arxiv.org/abs/2312.02503)

## Teaser
🐱 A cat is roaring ➜ 🐶 A dog is < Smot > / 🐯 A tiger is < Smot >
😎 A man is skiing ➜ 🐻 A bear is < Smot > / 🐭 Mickey-Mouse is < Smot >
SAVE reframes the video editing task as a motion inversion problem: it seeks a motion word < Smot > in the textual embedding space that faithfully represents the motion in a source video. Editing is then achieved by isolating the motion from the single source video into < Smot > and modifying the protagonist accordingly.
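The inversion itself is implemented in `run_train.py`. Purely to illustrate the general idea, the textual-inversion-style sketch below (not the repository's actual training code; the token spelling `<Smot>` is an assumption, and the initializer `roaring` is taken from the example configuration values further down) registers a motion pseudo word with the CLIP tokenizer and makes only its embedding trainable:

```python
# Illustrative textual-inversion-style setup for a motion pseudo word.
# SAVE's real training loop (run_train.py) differs, e.g. it also handles
# appearance words and fine-tunes the video model itself.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

base = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")

# Register the motion pseudo word and grow the embedding table by one row.
motion_token = "<Smot>"  # hypothetical spelling of the motion placeholder
tokenizer.add_tokens([motion_token])
text_encoder.resize_token_embeddings(len(tokenizer))

# Initialize the new embedding from a rough description of the motion.
embeddings = text_encoder.get_input_embeddings().weight
src_id = tokenizer.encode("roaring", add_special_tokens=False)[0]  # first sub-token of the initializer
new_id = tokenizer.convert_tokens_to_ids(motion_token)
with torch.no_grad():
    embeddings[new_id] = embeddings[src_id].clone()

# Only the embedding table receives gradients; in practice all rows other
# than `new_id` are kept frozen (e.g. restored after each optimizer step),
# so the source motion ends up encoded purely in the <Smot> embedding.
text_encoder.requires_grad_(False)
text_encoder.get_input_embeddings().requires_grad_(True)
optimizer = torch.optim.AdamW(text_encoder.get_input_embeddings().parameters(), lr=5e-4)
```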
## Setup

### Requirements
```
pip install -r requirements.txt
```
### Weights

We use [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) as our base text-to-image model and fine-tune it on a reference video for text-to-video generation. Example video weights are available at [GoogleDrive](https://drive.google.com/drive/folders/1ytqzQ7aKBiiSQxDSbDPn2i-6zwdbUFsw).
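As a quick, repo-independent sanity check that the base weights download correctly, the stock diffusers pipeline can load them (SAVE's own code builds a video pipeline on top of these components rather than using this class directly):

```python
# Sanity check: load the base text-to-image weights with stock diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe("a cat is roaring").images[0]
image.save("sd14_sanity_check.png")
```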
### Training

To fine-tune the text-to-image diffusion model on a custom video, run this command:
```
python run_train.py --config configs/-train.yaml
```
Configuration file `-train.yaml` contains the following arguments (the sketch after this list shows how they fit together):
* `output_dir` - Directory to save the weights.
* `placeholder_tokens` - Pseudo words separated by `|` e.g., `|`.
* `initializer_tokens` - Initialization words separated by `|` e.g., `cat|roaring`.
* `sentence_component` - Use `` for appearance words and `` for motion words e.g., `|`.
* `num_s1_train_epochs` - Number of epochs for appearance pre-registration.
* `exp_localization_weight` - Weight for the cross-attention loss (recommended range is 1e-4 to 5e-4).
* `train_data: video_path` - Path to the source video.
* `train_data: prompt` - Source prompt that includes the pseudo words in `placeholder_tokens` e.g., `a cat is `.
* `n_sample_frames` - Number of frames.
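Repositories built on Tune-A-Video typically read such YAML files with OmegaConf; the sketch below simply mirrors the argument list above with made-up values (the pseudo-word spellings, paths, and numbers are purely hypothetical, and the exact nesting in the real config may differ):

```python
# Hypothetical illustration of how the training arguments above fit together.
# Key names follow the list; all concrete values are invented for the example.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "output_dir": "outputs/cat-roaring",
    "placeholder_tokens": "<Scat>|<Smot>",  # appearance | motion pseudo words (example spelling)
    "initializer_tokens": "cat|roaring",
    "sentence_component": "...|...",        # appearance/motion flags; exact values omitted here
    "num_s1_train_epochs": 100,             # appearance pre-registration stage (illustrative value)
    "exp_localization_weight": 2e-4,        # within the recommended 1e-4 .. 5e-4 range
    "train_data": {
        "video_path": "data/cat-roaring.mp4",
        "prompt": "a <Scat> is <Smot>",     # must contain the placeholder tokens
    },
    "n_sample_frames": 8,                   # nesting may differ in the real config
})
print(OmegaConf.to_yaml(cfg))  # roughly what a *-train.yaml would contain
```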
## Video Editing

Once the updated weights are prepared, run this command:
```
python run_inference.py --config configs/-inference.yaml
```
Configuration file `-inference.yaml` contains the following arguments (a small example of how `prompts` and `blend_word` pair up follows the list):
* `pretrained_model_path` - Directory to the saved weights.
* `image_path` - Path to the source video.
* `placeholder_tokens` - Pseudo words separated by `|` e.g., `|`.
* `sentence_component` - Use `` for appearance words and `` for motion words e.g., `|`.
* `prompt` - Source prompt that includes the pseudo words in `placeholder_tokens` e.g., `a cat is `.
* `prompts` - List of source and editing prompts e.g., [`a cat is `, `a dog is `].
* `blend_word` - List of protagonists in the source and edited videos e.g., [`cat`, `dog`].
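To make the pairing concrete, here is a small hypothetical snippet (the `<Smot>` spelling is an assumption) that builds several edits of the same source video by swapping the protagonist:

```python
# Hypothetical illustration of how `prompts` and `blend_word` pair up:
# prompts = [source prompt, edited prompt], blend_word = [source, edited] protagonist.
source_protagonist = "cat"
source_prompt = "a cat is <Smot>"  # <Smot> spelling is an assumption; must match training
targets = ["dog", "tiger", "Mickey-Mouse"]

edits = [
    {
        "prompts": [source_prompt, source_prompt.replace(source_protagonist, target)],
        "blend_word": [source_protagonist, target],
    }
    for target in targets
]

for edit in edits:
    print(edit["prompts"][1], "| blend_word:", edit["blend_word"])
```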
## Citation

```
@inproceedings{song2025save,
title={Save: Protagonist diversification with structure agnostic video editing},
author={Song, Yeji and Shin, Wonsik and Lee, Junsoo and Kim, Jeesoo and Kwak, Nojun},
booktitle={European Conference on Computer Vision},
pages={41--57},
year={2025},
organization={Springer}
}
```

## Acknowledgements
This code builds upon [diffusers](https://github.com/huggingface/diffusers), [Tune-A-Video](https://github.com/showlab/Tune-A-Video) and [Video-P2P](https://github.com/dvlab-research/Video-P2P). Thank you for open-sourcing!