Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Vchitect/VEnhancer
Official codes of VEnhancer: Generative Space-Time Enhancement for Video Generation
- Host: GitHub
- URL: https://github.com/Vchitect/VEnhancer
- Owner: Vchitect
- Created: 2024-07-10T13:52:26.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-16T13:48:02.000Z (about 2 months ago)
- Last Synced: 2024-09-17T09:26:05.529Z (about 2 months ago)
- Topics: aigc-enhancement, diffusion-models, frame-interpolation, text-to-video, video-enhancement, video-generation, video-super-resolution, video-to-video
- Language: Python
- Homepage: https://vchitect.github.io/VEnhancer-project/
- Size: 31 MB
- Stars: 375
- Watchers: 17
- Forks: 22
- Open Issues: 11
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-diffusion-categorized
README
# VEnhancer: Generative Space-Time Enhancement for Video Generation

Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

The Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory, S-Lab, Nanyang Technological University

VEnhancer is an all-in-one generative video enhancement model that achieves spatial super-resolution, temporal super-resolution, and video refinement for AI-generated videos.
*(Demo video: AIGC video vs. +VEnhancer)*
:open_book: For more visual results, check out our [project page](https://vchitect.github.io/VEnhancer-project/).
---
## 🔥 Update
- [2024.09.12] 😸 Release our version 2 checkpoint: **[venhancer_v2.pt](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt)**. It is less creative, but generates more texture details and has better identity preservation, which makes it more suitable for enhancing videos with portraits.
- [2024.09.10] 😸 Support **multiple GPU inference** and **tiled VAE** for temporal VAE decoding, with more stable performance for long video enhancement.
- [2024.08.18] 😸 Support enhancement for **arbitrarily long videos** (by splitting the videos into multiple chunks with overlaps); **faster sampling** with only 15 steps without obvious quality loss (by setting `--solver_mode 'fast'` in the script command); use **temporal VAE** to reduce video flickering.
- [2024.07.28] 🔥 Inference code and pretrained video enhancement model are released.
- [2024.07.10] 🤗 This repo was created.

## :astonished: Gallery
| Inputs & Results | Model Version |
| :---------- | :-: |
| Prompt: A close-up shot of a woman standing in a dimly lit room. She is wearing a traditional Chinese outfit, which includes a red and gold dress with intricate designs and a matching headpiece. (from [Open-Sora](https://github.com/hpcaitech/Open-Sora)) | v2 |
| Prompt: Einstein plays guitar. (from [Kling](https://kling.kuaishou.com/en)) | v2 |
| Prompt: A girl eating noodles. (from [Kling](https://kling.kuaishou.com/en)) | v2 |
| Prompt: A little brick man visiting an art gallery. (from [Kling](https://kling.kuaishou.com/en)) | v1 |

## 🎬 Overview
VEnhancer achieves spatial super-resolution, temporal super-resolution (i.e., frame interpolation), and video refinement in **one model**.
It flexibly adapts to different upsampling factors (e.g., 1x~8x) for either spatial or temporal super-resolution, and it provides flexible control over the refinement strength for handling diverse video artifacts.

Following ControlNet, it copies the architectures and weights of the multi-frame encoder and middle block of a pretrained video diffusion model to build a trainable condition network.
This **video ControlNet** accepts both low-resolution key frames and full frames of noisy latents as inputs.
In addition, the noise level $\sigma$ used for noise augmentation and the downscaling factor $s$ serve as additional network conditioning through our proposed **video-aware conditioning**, apart from the timestep $t$ and prompt $c_{text}$.
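In schematic form (our paraphrase for intuition, not notation taken from the paper), the denoiser sees the video-aware signals alongside the usual diffusion conditioning:

$$
\hat{z} = D_\theta\left(z_t;\; t,\; c_{text},\; \sigma,\; s\right)
$$

where $z_t$ denotes the noisy full-frame latents (with the low-resolution key frames entering through the video ControlNet), $\sigma$ the noise-augmentation level, and $s$ the downscaling factor.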
## :gear: Installation

```shell
# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
```
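After installation, a quick sanity check that PyTorch is importable and can see your GPU (a standard one-liner, not specific to VEnhancer):

```shell
# Verify the environment: prints the torch version and CUDA availability.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```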
Note that the `ffmpeg` command must be available. If you have sudo access, you can install it with the following command:
```shell
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```

## :dna: Pretrained Models
| Model Name | Description | HuggingFace | BaiduNetdisk |
| :---------: | :----------: | :----------: | :----------: |
| venhancer_paper.pth | Very creative with strong refinement, but sometimes over-smooths edges and texture details. | [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_paper.pt?download=true) | [download](https://pan.baidu.com/s/15t20RGvEHqJOMmhA_zRLiA?pwd=cpsd) |
| venhancer_v2.pth | Less creative, but generates better texture details and has better identity preservation. | [download](https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt?download=true) | [download](https://pan.baidu.com/s/1mc4s5xqcVqKyL-GwkE0loA?pwd=bbqn) |
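If you prefer to fetch a checkpoint manually instead of relying on the automatic download, a minimal sketch using the HuggingFace link from the table above (the `ckpts` directory is where the inference step below expects checkpoints):

```shell
# Download the v2 checkpoint into VEnhancer/ckpts (run from the repo root).
mkdir -p ckpts
wget -O ckpts/venhancer_v2.pt \
  "https://huggingface.co/jwhejwhe/VEnhancer/resolve/main/venhancer_v2.pt?download=true"
```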
## 💫 Inference

1) Download the VEnhancer checkpoint and put it in the `VEnhancer/ckpts` directory (optional, as this can be done automatically).
2) Run one of the following commands.
```bash
bash run_VEnhancer.sh
```
for single-GPU inference (at least an A100 80G is required), or
```bash
bash run_VEnhancer_MultiGPU.sh
```
for multiple GPU inference.

In `run_VEnhancer.sh` or `run_VEnhancer_MultiGPU.sh`:
- `version`: we now provide two choices, `v1` and `v2` (venhancer_paper.pth and venhancer_v2.pth, respectively).
- `up_scale`: the upsampling factor ($1\sim8$) for spatial super-resolution; $\times 3$ or $\times 4$ is recommended. Note that the target resolution is kept no higher than 2K.
- `target_fps`: your expected target frame rate; the default is 24.
- `noise_aug`: the noise level ($0\sim300$) for noise augmentation. Higher noise corresponds to stronger refinement; $200\sim300$ is recommended.
- Regarding the prompt, you can pass `--filename_as_prompt` to automatically use the filename as the prompt; write the prompt to a txt file and specify it with `--prompt_path [your_prompt_path]`; or provide it directly with `--prompt [your_prompt]`.
- Regarding sampling, `--solver_mode fast` uses a fixed 15 sampling steps. With `--solver_mode normal`, you can adjust `steps` to trade off efficiency against video quality. A complete example is sketched below.
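For concreteness, here is a hypothetical command in the spirit of what `run_VEnhancer.sh` invokes. The entry-point name `enhance_a_video.py` and the `--input_path`, `--version`, `--up_scale`, `--target_fps`, and `--noise_aug` flag spellings are assumptions based on the option names above, so verify against the actual script before copying:

```shell
# Hypothetical invocation -- the entry-point script name and some flag
# spellings are assumptions; check run_VEnhancer.sh in the repo for the
# real command.
#   --solver_mode fast : fixed 15 sampling steps
#   --noise_aug 250    : noise level (0-300); higher = stronger refinement
python enhance_a_video.py \
    --version v2 \
    --up_scale 4 \
    --target_fps 24 \
    --noise_aug 250 \
    --solver_mode fast \
    --filename_as_prompt \
    --input_path inputs/your_video.mp4
```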
### Gradio

The same functionality is also available as a Gradio demo. Follow the installation guidelines above, then specify the model version (`v1` or `v2`):
```shell
python gradio_app.py --version v1
```

## BibTeX
If you use our work in your research, please cite our publication:
```
@article{he2024venhancer,
title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
journal={arXiv preprint arXiv:2407.07667},
year={2024}
}
```

## 🤗 Acknowledgements
Our codebase builds on [modelscope](https://github.com/modelscope/modelscope).
Thanks to the authors for sharing their awesome codebases!

## 📧 Contact
If you have any questions, please feel free to reach us at `[email protected]`.