https://github.com/amazon-science/instruct-video-to-video


Host: GitHub
URL: https://github.com/amazon-science/instruct-video-to-video
Owner: amazon-science
License: mit-0
Created: 2023-11-25T06:00:08.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-13T12:04:31.000Z (over 2 years ago)
Last Synced: 2024-02-13T14:23:45.477Z (over 2 years ago)
Language: Python
Size: 167 MB
Stars: 43
Watchers: 3
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md


## This is the code release for the ICLR 2024 paper [Consistent Video-to-Video Transfer Using Synthetic Dataset](https://arxiv.org/abs/2311.00213).

![teaser](figures/teaser.png)

## Quick Links

* [Installation](#installation)

* [Video Editing](#video-editing) 🔥

* [Synthetic Video Prompt-to-Prompt Dataset](#synthetic-video-prompt-to-prompt-dataset)

* [Training](#training)

* [Create Synthetic Video Dataset](#create-synthetic-video-dataset)

## Updates

* 2024/02/13: The official synthetic data and model will not be released due to Amazon policy, but we provide a third-party reproduction of the synthetic data and model weights. Please refer to this [GitHub repo](https://github.com/cplusx/INSV2V-3rd-pty-reprod).

* 2023/11/29: We have updated the paper with more comparisons to recent baseline methods and updated the [comparison video](#visual-comparison-to-other-methods). The Gradio demo code has been uploaded.

## Installation

```bash
git clone https://github.com/amazon-science/instruct-video-to-video.git
cd instruct-video-to-video
pip install -r requirements.txt
```

NOTE: The code is tested on PyTorch 2.1.0 (CUDA 11.8) with the corresponding xformers version. Any PyTorch version above 2.0 should work, but make sure to install the xformers version that matches it.
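
To see which versions are actually installed before running the code, a small check like the one below may help. It is only a sketch: the helper name `installed_version` is illustrative and not part of this repository, and it reads package metadata through the standard library instead of importing the packages.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages mentioned in the note above.
    for name in ("torch", "xformers"):
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'not installed'}")
```

If either package is reported as missing or the versions do not match, reinstall them together so that xformers is built for the PyTorch version in use.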

## Video Editing

The official model weights will not be released (see [Updates](#updates)). Download the third-party [InsV2V model weights](https://github.com/cplusx/INSV2V-3rd-pty-reprod) and change the ckpt path in the following notebook.

✨🚀 This [notebook](video_edit.ipynb) provides sample code for text-based video editing.

### Download LOVEU Dataset for Testing

Follow the instructions on the [LOVEU dataset page](https://sites.google.com/view/loveucvpr23/track4) to download the dataset, then use the following [script](insv2v_run_loveu_tgve.py) to run editing on the LOVEU dataset:

```bash
# --with_optical_flow enables motion compensation
python insv2v_run_loveu_tgve.py \
    --config configs/instruct_v2v.yaml \
    --ckpt-path [PATH TO THE CHECKPOINT] \
    --data-dir [PATH TO THE LOVEU DATASET] \
    --with_optical_flow \
    --text-cfg 7.5 10 \
    --video-cfg 1.2 1.5 \
    --image-size 256 384
```

Note: you may need to try different combinations of image resolution and video/text classifier-free guidance scales to find the best editing results.
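
As a rough illustration of what the two guidance scales control, the sketch below assumes the two-condition classifier-free guidance formulation from Instruct Pix2Pix, with the input video taking the role of the image condition. Whether this repository combines the model predictions in exactly this way is an assumption; the function name and the toy inputs are hypothetical.

```python
from typing import List, Sequence


def combine_guidance(
    eps_uncond: Sequence[float],
    eps_video: Sequence[float],
    eps_full: Sequence[float],
    video_scale: float,
    text_scale: float,
) -> List[float]:
    """Combine three noise predictions with two guidance scales.

    eps_uncond: prediction with neither video nor text condition
    eps_video:  prediction conditioned on the input video only
    eps_full:   prediction conditioned on both the input video and the text
    """
    return [
        u + video_scale * (v - u) + text_scale * (f - v)
        for u, v, f in zip(eps_uncond, eps_video, eps_full)
    ]


if __name__ == "__main__":
    # Toy two-element "predictions"; real predictions are large tensors.
    uncond = [0.0, 0.0]
    video_only = [1.0, 2.0]
    video_and_text = [2.0, 2.0]
    print(combine_guidance(uncond, video_only, video_and_text,
                           video_scale=1.5, text_scale=7.5))
```

Under this formulation, setting both scales to 1 returns the fully conditioned prediction unchanged, while larger values push the result further toward the corresponding condition.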

Example results of editing the LOVEU-TGVE dataset: *(result grid not included in this copy)*

## Synthetic Video Prompt-to-Prompt Dataset

Generation pipeline of the synthetic video dataset:

![generation pipeline](figures/data_pipe.png)

Examples of the synthetic video dataset: *(example grid not included in this copy)*

## Training

### Download Foundational Models

[Download](https://drive.google.com/file/d/1R9sWsnGZUa5P8IB5DDfD9eU-T9SQLsFw/view?usp=sharing) the foundational models and place them in the `pretrained_models` folder.

### Download Synthetic Video Dataset

See the download link in the [third-party reproduction](https://github.com/cplusx/INSV2V-3rd-pty-reprod).

### Train the Model

Put the synthetic video dataset in the `video_ptp` folder.

Run the following command to train the model:

```bash
# -r resumes training if it was interrupted
python main.py --config configs/instruct_v2v.yaml -r
```
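
Before launching a long training run, it can be worth checking that the folders named above exist. The following is a minimal sketch and not part of the repository: `missing_dirs` is a hypothetical helper, and it assumes that `pretrained_models` and `video_ptp` sit in the repository root next to `main.py`.

```python
from pathlib import Path
from typing import Iterable, List


def missing_dirs(root: Path, names: Iterable[str]) -> List[str]:
    """Return the entries of `names` that are not existing directories under `root`."""
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed layout: both folders live in the current working directory.
    missing = missing_dirs(Path("."), ["pretrained_models", "video_ptp"])
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```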

## Create Synthetic Video Dataset

If you want to create your own synthetic video dataset, follow these steps:

* Download the ModelScope VAE, UNet and text encoder weights from [here](https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/tree/main)

* Replace the model path in the [`video_prompt_to_prompt.py`](video_prompt_to_prompt.py) file

```
vae_ckpt = 'VAE_PATH'
unet_ckpt = 'UNet_PATH'
text_model_ckpt = 'Text_MODEL_PATH'
```

* Download the edit prompt file from [Instruct Pix2Pix](https://github.com/timothybrooks/instruct-pix2pix). The prompt file should be `gpt-generated-prompts.jsonl`; change the file path in `video_prompt_to_prompt.py` accordingly. Alternatively, use the WebVid prompt edit file proposed in our paper (to be released).

* Run the command to generate the synthetic video dataset:

```bash
python video_prompt_to_prompt.py \
    --start [START INDEX] \
    --end [END INDEX] \
    --prompt_source [ip2p or webvid] \
    --num_sample_each_prompt [NUM SAMPLES FOR EACH PROMPT]
```
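
The `--start` and `--end` arguments suggest that the prompt file is processed in index ranges, which would make it straightforward to split generation across several jobs. The sketch below shows one way to read a JSON Lines prompt file and take such a slice. It assumes a half-open range (`start` inclusive, `end` exclusive) and one JSON object per line, neither of which is confirmed by this README; the function name and the sample records are hypothetical.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, List


def load_prompt_slice(lines: Iterable[str], start: int, end: int) -> List[Dict[str, Any]]:
    """Parse JSON Lines input and return records with index in [start, end).

    Blank lines are ignored and do not count toward the index.
    """
    non_empty = (line for line in lines if line.strip())
    return [json.loads(line) for line in islice(non_empty, start, end)]


if __name__ == "__main__":
    # Illustrative records only; the key name "edit" is an assumption.
    sample = [
        '{"edit": "make it snowy"}',
        '{"edit": "turn the car red"}',
        '{"edit": "add fireworks"}',
    ]
    for record in load_prompt_slice(sample, start=1, end=3):
        print(record["edit"])
```

An open file object can be passed in place of the list, for example `load_prompt_slice(open("gpt-generated-prompts.jsonl"), 0, 100)`, since the function only iterates over lines.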

## Visual Comparison to Other Methods

https://github.com/amazon-science/instruct-video-to-video/assets/20940184/d3619652-dd75-41a0-92b4-345bbf57de40

Links to the baselines used in the video:

* [Tune-A-Video](https://github.com/showlab/Tune-A-Video)

* [ControlVideo](https://github.com/thu-ml/controlvideo)

* [vid2vid-zero](https://github.com/baaivision/vid2vid-zero)

* [Video-P2P](https://github.com/ShaoTengLiu/Video-P2P)

* [TokenFlow](https://github.com/omerbt/TokenFlow)

* [Rerender A Video](https://github.com/williamyang1991/Rerender_A_Video)

* [Pix2Video](https://github.com/duyguceylan/pix2video)

## Credit

The code was implemented by [Jiaxin Cheng](https://github.com/cplusx) during his internship at the AWS Shanghai Lablet.

## References

Part of the code and the foundational models are adapted from the following works:

* [Instruct Pix2Pix](https://github.com/timothybrooks/instruct-pix2pix)

* [AnimateDiff](https://github.com/guoyww/animatediff/)
