# VEnhancer: Generative Space-Time Enhancement for Video Generation

Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

The Chinese University of Hong Kong · Shanghai Artificial Intelligence Laboratory · S-Lab, Nanyang Technological University

VEnhancer is an all-in-one generative video enhancement model that achieves spatial super-resolution, temporal super-resolution, and video refinement for AI-generated videos.


*Demo comparison: AIGC video vs. AIGC video + VEnhancer.*

:open_book: For more visual results, check out our project page.

---

## 🔥 Update
- [2024.09.12] 😸 Release of our version 2 checkpoint: **[venhancer_v2.pt](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt)**. It is less creative, but it generates more texture details and has better identity preservation, making it more suitable for enhancing videos with human faces.
- [2024.09.10] 😸 Support for **multiple-GPU inference** and **tiled VAE** for temporal VAE decoding, along with more stable performance for long video enhancement.
- [2024.08.18] 😸 Support for enhancing **arbitrarily long videos** (by splitting the videos into multiple chunks with overlaps); **faster sampling** with only 15 steps and no obvious quality loss (by setting `--solver_mode 'fast'` in the script command); use of a **temporal VAE** to reduce video flickering.
- [2024.07.28] 🔥 Inference code and pretrained video enhancement model are released.
- [2024.07.10] 🤗 This repo is created.

## :astonished: Gallery

| Inputs & Results | Model Version |
| :---------- | :-: |
| Prompt: "A close-up shot of a woman standing in a dimly lit room. she is wearing a traditional chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece." (from [Open-Sora](https://github.com/hpcaitech/Open-Sora)) | v2 |
| Prompt: "Einstein plays guitar." (from [Kling](https://kling.kuaishou.com/en)) | v2 |
| Prompt: "A girl eating noodles." (from [Kling](https://kling.kuaishou.com/en)) | v2 |
| Prompt: "A little brick man visiting an art gallery." (from [Kling](https://kling.kuaishou.com/en)) | v1 |

## 🎬 Overview
VEnhancer achieves spatial super-resolution, temporal super-resolution (i.e., frame interpolation), and video refinement in **one model**.
It flexibly adapts to different upsampling factors (e.g., 1x~8x) for either spatial or temporal super-resolution, and it provides flexible control over the refinement strength for handling diverse video artifacts.

Following ControlNet, it copies the architecture and weights of the multi-frame encoder and middle block of a pretrained video diffusion model to build a trainable condition network.
This **video ControlNet** accepts both low-resolution key frames and full frames of noisy latents as inputs.
In addition to the timestep $t$ and prompt $c_{text}$, the noise level $\sigma$ used for noise augmentation and the downscaling factor $s$ serve as additional network conditions through our proposed **video-aware conditioning**.
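
Schematically, this means the denoiser is conditioned on all of these signals at once. The formula below is our shorthand summary of the conditioning described above, not the paper's exact formulation:

$$\hat{z} = \mathcal{D}_\theta\left(z_t \mid t,\ c_{text},\ \sigma,\ s,\ z_{lr}\right)$$

where $z_t$ denotes the noisy full-frame latents and $z_{lr}$ the low-resolution key frames fed to the video ControlNet.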

## :gear: Installation
```shell
# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
```
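
Before moving on, it can help to sanity-check that the environment sees a GPU (plain PyTorch, nothing project-specific):
```shell
# Print the installed torch version and whether CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```
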
Note that the `ffmpeg` command must be available. If you have sudo access, you can install it with:
```shell
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```
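
If you do not have sudo access, `ffmpeg` can instead be installed into the conda environment:
```shell
# Installs ffmpeg from conda-forge into the active environment (no sudo required).
conda install -c conda-forge ffmpeg
```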

## :dna: Pretrained Models
| Model Name | Description | HuggingFace | BaiduNetdisk |
| :---------: | :----------: | :----------: | :----------: |
| venhancer_paper.pt | Very creative with strong refinement, but sometimes over-smooths edges and texture details. | [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_paper.pt?download=true) | [download](https://pan.baidu.com/s/15t20RGvEHqJOMmhA_zRLiA?pwd=cpsd) |
| venhancer_v2.pt | Less creative, but generates more texture details and has better identity preservation. | [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt?download=true) | [download](https://pan.baidu.com/s/1mc4s5xqcVqKyL-GwkE0loA?pwd=bbqn) |
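
For example, to fetch the v2 checkpoint from Hugging Face into the `ckpts` directory used by the inference scripts below:
```shell
# Download venhancer_v2.pt into VEnhancer/ckpts (URL taken from the table above).
mkdir -p ckpts
wget -O ckpts/venhancer_v2.pt \
  "https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt?download=true"
```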

## 💫 Inference
1) Download the VEnhancer checkpoint and put it in the `VEnhancer/ckpts` directory (optional, as this can also be done automatically).
2) Run one of the following commands.

For single-GPU inference (at least an A100 80G is required):
```bash
bash run_VEnhancer.sh
```
For multi-GPU inference:
```bash
bash run_VEnhancer_MultiGPU.sh
```

In `run_VEnhancer.sh` or `run_VEnhancer_MultiGPU.sh` (a sample invocation is sketched after this list):
- `version`: two choices are provided, `v1` and `v2` (venhancer_paper.pt and venhancer_v2.pt, respectively).
- `up_scale` is the upsampling factor ($1\sim8$) for spatial super-resolution; $\times3$ or $\times4$ is recommended. Note that the target resolution is capped at 2K.
- `target_fps` is your expected target fps; the default is 24.
- `noise_aug` is the noise level ($0\sim300$) for noise augmentation. Higher noise corresponds to stronger refinement; $200\sim300$ is recommended.
- Regarding the prompt, you can pass `--filename_as_prompt` to automatically use the filename as the prompt; write the prompt to a txt file and point to it with `--prompt_path [your_prompt_path]`; or provide it directly with `--prompt [your_prompt]`.
- Regarding sampling, `--solver_mode fast` uses a fixed 15 sampling steps. With `--solver_mode normal`, you can modify `steps` to trade off efficiency against video quality.
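
Putting the options together, a command along these lines would go inside the script. This is an illustrative sketch only: the entry-point name `enhance_a_video.py` and the `--input_path`/`--save_dir` flags are assumptions, so check `run_VEnhancer.sh` for the actual invocation; the remaining options are the ones documented above.
```bash
# Hypothetical invocation -- the script name, --input_path, and --save_dir are
# assumptions for illustration; version/up_scale/target_fps/noise_aug/solver_mode
# are the documented options.
python enhance_a_video.py \
  --version v2 \
  --up_scale 4 \
  --target_fps 24 \
  --noise_aug 250 \
  --solver_mode fast \
  --prompt "A girl eating noodles." \
  --input_path inputs/input.mp4 \
  --save_dir results/
```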

### Gradio
The same functionality is also available as a Gradio demo. Follow the setup above and specify the model version (v1 or v2):
```shell
python gradio_app.py --version v1
```

## BibTeX
If you use our work in your research, please cite our publication:
```
@article{he2024venhancer,
  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
  journal={arXiv preprint arXiv:2407.07667},
  year={2024}
}
```

## 🤗 Acknowledgements
Our codebase is built on [modelscope](https://github.com/modelscope/modelscope).
Thanks to the authors for sharing their awesome codebase!

## 📧 Contact
If you have any questions, please feel free to reach us at `[email protected]`.