[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
https://github.com/sczhou/upscale-a-video
- Host: GitHub
- URL: https://github.com/sczhou/upscale-a-video
- Owner: sczhou
- License: other
- Created: 2023-11-30T18:38:24.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-09-27T10:50:59.000Z (about 1 year ago)
- Last Synced: 2025-04-06T17:11:21.721Z (7 months ago)
- Topics: aigc-enhancement, deflicker, video-diffusion-model, video-super-resolution
- Language: Python
- Homepage:
- Size: 10.8 MB
- Stars: 1,190
- Watchers: 76
- Forks: 64
- Open Issues: 33
Metadata Files:
- Readme: README.md
- License: LICENSE
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

S-Lab, Nanyang Technological University

CVPR 2024 (Highlight)

Upscale-A-Video is a diffusion-based model that upscales videos by taking a low-resolution video and text prompts as inputs.

:open_book: For more visual results, check out our project page.
---
## 🔥 Update
- [2024.09] Inference code is released.
- [2024.02] YouHQ dataset is made publicly available.
- [2023.12] This repo is created.

## 🎬 Overview
## 🔧 Dependencies and Installation
1. Clone Repo
```bash
git clone https://github.com/sczhou/Upscale-A-Video.git
cd Upscale-A-Video
```

2. Create Conda Environment and Install Dependencies
```bash
# create new conda env
conda create -n UAV python=3.9 -y
conda activate UAV

# install python dependencies
pip install -r requirements.txt
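
# optional sanity check (assumption: requirements.txt installs a CUDA build of PyTorch);
# confirms the GPU is visible before running inference
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"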
```

3. Download Models
(a) Download pretrained models and configs from [Google Drive](https://drive.google.com/drive/folders/1O8pbeR1hsRlFUU8O4EULe-lOKNGEWZl1?usp=sharing) and put them under the `pretrained_models/upscale_a_video` folder.
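If you prefer the command line, the third-party `gdown` tool can fetch the whole folder (the folder ID is taken from the Google Drive link above; a convenience sketch, not an official step):

```bash
pip install gdown
# download the pretrained model folder by its Drive ID into the expected location
gdown --folder 1O8pbeR1hsRlFUU8O4EULe-lOKNGEWZl1 -O pretrained_models/upscale_a_video
```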
The [`pretrained_models`](./pretrained_models) directory structure should be arranged as:
```
├── pretrained_models
│ ├── upscale_a_video
│ │ ├── low_res_scheduler
│ │ ├── ...
│ │ ├── propagator
│ │ ├── ...
│ │ ├── scheduler
│ │ ├── ...
│ │ ├── text_encoder
│ │ ├── ...
│ │ ├── tokenizer
│ │ ├── ...
│ │ ├── unet
│ │ ├── ...
│ │ ├── vae
│ │ ├── ...
```
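After downloading, a quick way to confirm the layout (an optional check, not part of the official steps):

```bash
ls pretrained_models/upscale_a_video
# expected subfolders: low_res_scheduler  propagator  scheduler  text_encoder  tokenizer  unet  vae
```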
(b) (Optional) LLaVA can be downloaded automatically when `--use_llava` is set to `True`, for users with access to Hugging Face.
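For example, to let LLaVA generate text prompts automatically during inference (a sketch based on the option above; whether the flag takes an explicit `True` value depends on the script's argument parser):

```shell
# hypothetical invocation; check the inference script's arguments for the exact form
python inference_upscale_a_video.py \
    -i ./inputs/aigc_1.mp4 -o ./results -n 150 -g 6 -s 30 --use_llava True
```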
## ☕️ Quick Inference

The `--input_path` can be either the path to a single video or a folder containing multiple videos.
We provide several examples in the [`inputs`](./inputs) folder.
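Since `-i` accepts a directory, you can also process every bundled example in one call (a sketch reusing the flags from the commands below; per-video tuning as shown there generally works better):

```shell
# upscale every video in the inputs folder with one set of flags
python inference_upscale_a_video.py -i ./inputs -o ./results -n 150 -g 6 -s 30
```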
Run the following commands to try it out:

```shell
## AIGC videos
python inference_upscale_a_video.py \
    -i ./inputs/aigc_1.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

python inference_upscale_a_video.py \
    -i ./inputs/aigc_2.mp4 -o ./results -n 150 -g 6 -s 30 -p 24,26,28

python inference_upscale_a_video.py \
    -i ./inputs/aigc_3.mp4 -o ./results -n 150 -g 6 -s 30 -p 20,22,24
```

```shell
## old videos/movies/animations
python inference_upscale_a_video.py \
    -i ./inputs/old_video_1.mp4 -o ./results -n 150 -g 9 -s 30

python inference_upscale_a_video.py \
    -i ./inputs/old_movie_1.mp4 -o ./results -n 100 -g 5 -s 20 -p 17,18,19

python inference_upscale_a_video.py \
    -i ./inputs/old_movie_2.mp4 -o ./results -n 120 -g 6 -s 30 -p 8,10,12

python inference_upscale_a_video.py \
    -i ./inputs/old_animation_1.mp4 -o ./results -n 120 -g 6 -s 20 --use_video_vae
```

If you notice any color discrepancies between the output and the input, you can set `--color_fix` to `"AdaIn"` or `"Wavelet"`. By default, it is set to `"None"`.
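For example, to re-run the old movie command above with wavelet-based color correction (same flags as before, only `--color_fix` added):

```shell
python inference_upscale_a_video.py \
    -i ./inputs/old_movie_1.mp4 -o ./results -n 100 -g 5 -s 20 -p 17,18,19 --color_fix "Wavelet"
```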
## 🎞️ YouHQ Dataset
The datasets are hosted on Google Drive.

| Dataset | Link | Description |
| :----- | :--: | :---- |
| YouHQ-Train | [Google Drive](https://drive.google.com/file/d/1f8g8gTHzQq-cKt4s94YQXDwJcdjL59lK/view?usp=sharing)| 38,576 videos for training, each of which has around 32 frames.|
| YouHQ40-Test | [Google Drive](https://drive.google.com/file/d/1rkeBQJMqnRTRDtyLyse4k6Vg2TilvTKC/view?usp=sharing) | 40 video clips for evaluation, each of which has around 32 frames. |
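If you prefer the command line, the same `gdown` approach used for the pretrained models works for the dataset archives (file IDs taken from the links above; the archive filenames are assumptions):

```shell
gdown 1f8g8gTHzQq-cKt4s94YQXDwJcdjL59lK -O YouHQ-Train.zip   # training set
gdown 1rkeBQJMqnRTRDtyLyse4k6Vg2TilvTKC -O YouHQ40-Test.zip  # YouHQ40 test set
```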
## 📑 Citation

If you find our repo useful for your research, please consider citing our paper:
```bibtex
@inproceedings{zhou2024upscaleavideo,
title={{Upscale-A-Video}: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution},
author={Zhou, Shangchen and Yang, Peiqing and Wang, Jianyi and Luo, Yihang and Loy, Chen Change},
booktitle={CVPR},
year={2024}
}
```

## 📝 License
This project is licensed under the NTU S-Lab License 1.0. Redistribution and use should follow this license.
## 📧 Contact
If you have any questions, please feel free to reach us at `shangchenzhou@gmail.com` or `peiqingyang99@outlook.com`.