https://github.com/ailab-cvc/make-your-video

[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance

## ___***Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance***___

     
     

_**[Jinbo Xing](https://doubiiu.github.io/), [Menghan Xia*](https://menghanxia.github.io), Yuxin Liu, [Yuechen Zhang](https://julianjuaner.github.io/), [Yong Zhang](https://yzhang2016.github.io), [Yingqing He](https://github.com/YingqingHe), [Hanyuan Liu](https://github.com/hyliu),
Haoxin Chen, [Xiaodong Cun](https://vinthony.github.io/academic/), [Xintao Wang](https://xinntao.github.io/), [Ying Shan](https://scholar.google.com/citations?hl=en&user=4oXBp9UAAAAJ&view_op=list_works&sortby=pubdate), [Tien-Tsin Wong](https://www.cse.cuhk.edu.hk/~ttwong/myself.html)**_



(* corresponding author)

From CUHK and Tencent AI Lab.

IEEE TVCG 2024

## 🔆 Introduction
Make-Your-Video is a customized video generation model controlled by both text and motion structure (depth). It inherits rich visual concepts from an image latent diffusion model (LDM) and supports inference of longer videos.
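The introduction states that the model supports longer video inference but does not say how. One generic technique for running temporal attention over more frames than a model saw in training is a causal, optionally windowed, attention mask: each frame attends only to itself and to a limited number of preceding frames. The sketch below builds such a mask in plain Python. `causal_temporal_mask` and its `window` parameter are hypothetical; this illustrates the general idea under that assumption and is not the repository's implementation.

```python
from typing import List, Optional


def causal_temporal_mask(num_frames: int, window: Optional[int] = None) -> List[List[bool]]:
    """Hypothetical helper: build a num_frames x num_frames boolean mask.

    mask[i][j] is True when frame i may attend to frame j, i.e. j <= i
    (no attending to future frames) and, if `window` is given, frame j is
    one of the `window` most recent frames up to and including frame i.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if window is not None and window <= 0:
        raise ValueError("window must be positive")

    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            allowed = j <= i  # causal constraint
            if window is not None:
                allowed = allowed and (i - j) < window  # sliding-window constraint
            row.append(allowed)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Visualize the mask: "x" = may attend, "." = masked out.
    for row in causal_temporal_mask(4, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

With `window=2` the script prints the rows `x...`, `xx..`, `.xx.`, `..xx`: every frame sees itself and its immediate predecessor only, so the attention span stays fixed no matter how many frames are generated.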

## 🤗 **Applications**
### Real-life scene to video


Columns (left to right): Real-life scene, Ours, Text2Video-zero+CtrlNet, LVDMExt+Adapter. Prompts:

- "A dam discharging water"
- "A futuristic rocket ship on a launchpad, with sleek design, glowing lights"

### 3D scene modeling to video


Columns (left to right): Real-life scene, Ours, Text2Video-zero+CtrlNet, LVDMExt+Adapter. Prompts:

- "A train on the rail, 2D cartoon style"
- "A Van Gogh style painting on drawing board in park, some books on the picnic blanket, photorealistic"
- "A Chinese ink wash landscape painting"

### Video re-rendering


Columns (left to right): Original video, Ours, SD-Depth, Text2Video-zero+CtrlNet, LVDMExt+Adapter, Tune-A-Video. Prompts:

- "A tiger walks in the forest, photorealistic"
- "An origami boat moving on the sea"
- "A camel walking on the snow field, Miyazaki Hayao anime style"

## 🌟 **Method Overview**

![](./assets/overview.jpg#gh-light-mode-only)
![](./assets/overview_black.png#gh-dark-mode-only)

## 📝 Changelog
- __[2023.11.30]__: 🔥🔥 Release the main model.
- __[2023.06.01]__: 🔥🔥 Create this repo and launch the project webpage.

## 🧰 Models

|Model|Resolution|Checkpoint|
|:---------|:---------|:--------|
|MakeYourVideo256|256x256|[Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt)|

Generating a video takes approximately 13 seconds and requires a peak GPU memory of 20 GB on a single NVIDIA A100 (40 GB) GPU.
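The figures above can be turned into a quick planning check. This hypothetical sketch multiplies the quoted ~13 seconds per generation by the number of videos (assuming sequential runs and ignoring model-loading time) and, if PyTorch is installed, compares the total memory of a CUDA device against the quoted 20 GB peak. The function names are illustrative only, and real numbers may differ on other hardware or settings.

```python
from typing import Optional

# Figures quoted in the README for one generation on an A100 (40 GB).
SECONDS_PER_VIDEO = 13.0
PEAK_MEMORY_GIB = 20.0  # README says "20 GB"; treated here as roughly 20 GiB


def estimate_runtime_seconds(num_videos: int, seconds_per_video: float = SECONDS_PER_VIDEO) -> float:
    """Rough wall-clock time for generating `num_videos` clips one after another."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    return num_videos * seconds_per_video


def gpu_has_enough_memory(required_gib: float = PEAK_MEMORY_GIB, device: int = 0) -> Optional[bool]:
    """True/False if a CUDA device can be inspected, None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return False
    total_gib = torch.cuda.get_device_properties(device).total_memory / 1024 ** 3
    return total_gib >= required_gib


if __name__ == "__main__":
    print(f"~{estimate_runtime_seconds(100) / 60:.1f} min for 100 videos")
    print("GPU memory sufficient:", gpu_has_enough_memory())
```

The memory check looks at total device memory, not free memory, so other processes sharing the GPU can still cause out-of-memory errors.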

## ⚙️ Setup

### Install Environment via Anaconda (Recommended)
```bash
conda create -n makeyourvideo python=3.8.5
conda activate makeyourvideo
pip install -r requirements.txt
```

## 💫 Inference
### 1. Command line
1) Download the pre-trained depth estimation model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/dpt_hybrid-midas-501f0c75.pt) and place `dpt_hybrid-midas-501f0c75.pt` at `checkpoints/depth/dpt_hybrid-midas-501f0c75.pt`.
2) Download the pre-trained Make-Your-Video model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt) and place `model.ckpt` at `checkpoints/makeyourvideo_256_v1/model.ckpt`.
3) Run the following command in a terminal:
```bash
sh scripts/run.sh
```
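Steps 1 and 2 above can also be scripted. The sketch below assumes the `huggingface_hub` package is installed (it is not mentioned in this README) and uses its `hf_hub_download` function to fetch both files from the `Doubiiu/Make-Your-Video` repository into the expected folders. The helper names `expected_paths`, `missing_checkpoints`, and `download_checkpoints` are hypothetical, and type hints use `typing.List` so the script also runs on the Python 3.8 environment created above.

```python
from pathlib import Path
from typing import List

# Hugging Face repository hosting both checkpoints (from the download links).
REPO_ID = "Doubiiu/Make-Your-Video"

# Filename in the repository -> local folder the inference script expects.
CHECKPOINTS = {
    "dpt_hybrid-midas-501f0c75.pt": Path("checkpoints/depth"),
    "model.ckpt": Path("checkpoints/makeyourvideo_256_v1"),
}


def expected_paths(root: Path = Path(".")) -> List[Path]:
    """Full local paths where each checkpoint should end up."""
    return [root / folder / name for name, folder in CHECKPOINTS.items()]


def missing_checkpoints(root: Path = Path(".")) -> List[Path]:
    """Checkpoint paths that do not exist yet under `root`."""
    return [path for path in expected_paths(root) if not path.is_file()]


def download_checkpoints(root: Path = Path(".")) -> None:
    """Download every checkpoint that is not already present."""
    # Imported lazily so the path helpers work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    for name, folder in CHECKPOINTS.items():
        if not (root / folder / name).is_file():
            hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=str(root / folder))


if __name__ == "__main__":
    download_checkpoints()
    missing = missing_checkpoints()
    if missing:
        raise SystemExit(f"Still missing: {', '.join(map(str, missing))}")
    print("All checkpoints in place; now run: sh scripts/run.sh")
```

Run it from the repository root so the relative `checkpoints/...` paths match what `scripts/run.sh` expects.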

## 👨‍👩‍👧‍👦 Other Interesting Open-source Projects
[VideoCrafter1](https://github.com/AILab-CVC/VideoCrafter): Framework for high-quality video generation.

[DynamiCrafter](https://doubiiu.github.io/projects/DynamiCrafter/): Open-domain image animation methods using video diffusion priors.

Play with these projects in the same conda environment!

## 😉 Citation
```bib
@article{xing2023make,
title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},
author={Xing, Jinbo and Xia, Menghan and Liu, Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},
journal={arXiv preprint arXiv:2306.00943},
year={2023}
}
```

## 📢 Disclaimer
We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or non-commercial purposes.
****

## 🌞 **Acknowledgement**
We gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the [WebVid-10M](https://m-bain.github.io/webvid-dataset/) dataset, and we follow its terms of access.