{"id":22068823,"url":"https://ldynx.github.io/SAVE/","last_synced_at":"2025-07-24T07:31:06.241Z","repository":{"id":212423513,"uuid":"731458675","full_name":"ldynx/SAVE","owner":"ldynx","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-22T06:46:56.000Z","size":16380,"stargazers_count":25,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-22T07:28:05.927Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ldynx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-14T06:05:00.000Z","updated_at":"2024-11-22T06:47:01.000Z","dependencies_parsed_at":"2024-11-22T07:34:23.115Z","dependency_job_id":null,"html_url":"https://github.com/ldynx/SAVE","commit_stats":null,"previous_names":["ldynx/save"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldynx%2FSAVE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldynx%2FSAVE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldynx%2FSAVE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ldynx%2FSAVE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ldynx","download_url":"https://codeload.github.com/ldynx/SAVE/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227421331,"owners_count":17775010,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-30T20:04:23.334Z","updated_at":"2024-11-30T20:07:06.160Z","avatar_url":"https://github.com/ldynx.png","language":"Python","funding_links":[],"categories":["Paper List"],"sub_categories":["Follow-up Papers"],"readme":"# SAVE: Protagonist Diversification with \u003cU\u003eS\u003c/U\u003etructure \u003cU\u003eA\u003c/U\u003egnostic \u003cU\u003eV\u003c/U\u003eideo \u003cU\u003eE\u003c/U\u003editing (ECCV 2024)\n\nThis repository contains the official implementation of \n[\u003cU\u003eSAVE: Protagonist Diversification with Structure Agnostic Video Editing\u003c/U\u003e](https://arxiv.org/abs/2312.02503).\n\n[![Project Website](https://img.shields.io/badge/Project-Website-orange)](https://ldynx.github.io/SAVE/)\n[![arXiv 2312.02503](https://img.shields.io/badge/arXiv-2312.02503-red)](https://arxiv.org/abs/2312.02503)\n\n\n## Teaser\n\u003ch4 align=\"center\"\u003e 🐱 A cat is roaring ➜ 🐶 A dog is \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e / 🐯 A tiger is \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e \u003c/h4\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/cat_flower/cat.gif\" width=\"200\" height=\"200\"\u003e\u003cimg src=\"assets/cat_flower/Ours_dog.gif\" width=\"200\" height=\"200\"\u003e\u003cimg src=\"assets/cat_flower/Ours_tiger.gif\" width=\"200\" height=\"200\"\u003e\n\u003c/p\u003e\n\n\u003ch4 align=\"center\"\u003e 😎 A man is skiing ➜ 🐻 A bear is \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e / 🐭 Mickey-Mouse is \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e \u003c/h4\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/man-skiing/man-skiing.gif\" width=\"200\" height=\"200\"\u003e\u003cimg src=\"assets/man-skiing/Ours_bear.gif\" width=\"200\" height=\"200\"\u003e\u003cimg src=\"assets/man-skiing/Ours_Mickey-Mouse.gif\" width=\"200\" height=\"200\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cem\u003eSAVE reframes the video editing task as a motion inversion problem, seeking to find the motion word \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e in textual embedding space to well represent the motion in a source video. The video editing task can be achieved by isolating the motion from a single source video with \u003c S\u003csub\u003emot\u003c/sub\u003e \u003e and then modifying the protagonist accordingly.\u003c/em\u003e\n\u003c/p\u003e\n\n## Setup\n### Requirements\n```\npip install -r requirements.txt\n```\n\n### Weights\nWe use [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) as our base text-to-image model and fine-tune it on a reference video for text-to-video generation. Example video weights are available at [GoogleDrive](https://drive.google.com/drive/folders/1ytqzQ7aKBiiSQxDSbDPn2i-6zwdbUFsw).\n\n### Training\nTo fine-tune the text-to-image diffusion models on a custom video, run this command:\n```\npython run_train.py --config configs/\u003cvideo-name\u003e-train.yaml\n```\nConfiguration file `\u003cvideo-name\u003e-train.yaml` contains the following arguments:\n* `output_dir` - Directory to save the weights.\n* `placeholder_tokens` - Pseudo words separated by `|` e.g., `\u003cs1\u003e|\u003cs2\u003e`.\n* `initializer_tokens` - Initialization words separated by `|` e.g., `cat|roaring`.\n* `sentence_component` - Use `\u003co\u003e` for appearance words and `\u003cv\u003e` for motion words e.g., `\u003co\u003e|\u003cv\u003e`.\n* `num_s1_train_epochs` - Number of epochs for appearance pre-registration.\n* `exp_localization_weight` - Weight for the cross-attention loss (recommended range is 1e-4 to 5e-4).\n* `train_data: video_path` - Path to the source video.\n* `train_data: prompt` - Source prompt that includes the pseudo words in `placeholder_tokens` e.g., `a \u003cs1\u003e cat is \u003cs2\u003e`.\n* `n_sample_frames` - Number of frames.\n\n\n## Video Editing\nOnce the updated weights are prepared, run this command: \n```\npython run_inference.py --config configs/\u003cvideo-name\u003e-inference.yaml\n```\nConfiguration file `\u003cvideo-name\u003e-inference.yaml` contains the following arguments:\n* `pretrained_model_path` - Directory to the saved weights.\n* `image_path` - Path to the source video.\n* `placeholder_tokens` - Pseudo words separated by `|` e.g., `\u003cs1\u003e|\u003cs2\u003e`.\n* `sentence_component` - Use `\u003co\u003e` for appearance words and `\u003cv\u003e` for motion words e.g., `\u003co\u003e|\u003cv\u003e`.\n* `prompt` - Source prompt that includes the pseudo words in `placeholder_tokens` e.g., `a \u003cs1\u003e cat is \u003cs2\u003e`.\n* `prompts` - List of source and editing prompts e.g., [`a \u003cs1\u003e cat is \u003cs2\u003e`, `a dog is \u003cs2\u003e`].\n* `blend_word` - List of protagonists in the source and edited videos e.g., [`cat`, `dog`].\n\n\n## Citation\n\n```\n@inproceedings{song2025save,\n  title={Save: Protagonist diversification with structure agnostic video editing},\n  author={Song, Yeji and Shin, Wonsik and Lee, Junsoo and Kim, Jeesoo and Kwak, Nojun},\n  booktitle={European Conference on Computer Vision},\n  pages={41--57},\n  year={2025},\n  organization={Springer}\n}\n```\n\n## Acknowledgements\nThis code builds upon [diffusers](https://github.com/huggingface/diffusers), [Tune-A-Video](https://github.com/showlab/Tune-A-Video) and [Video-P2P](https://github.com/dvlab-research/Video-P2P). Thank you for open-sourcing!\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/ldynx.github.io%2FSAVE%2F","html_url":"https://awesome.ecosyste.ms/projects/ldynx.github.io%2FSAVE%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/ldynx.github.io%2FSAVE%2F/lists"}