https://github.com/ailab-cvc/make-your-video

[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance

Host: GitHub
URL: https://github.com/ailab-cvc/make-your-video
Owner: AILab-CVC
License: other
Created: 2023-05-31T07:24:26.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-02-24T07:56:41.000Z (over 2 years ago)
Last Synced: 2025-06-04T02:16:47.524Z (about 1 year ago)
Language: Python
Homepage: https://doubiiu.github.io/projects/Make-Your-Video/
Size: 61.1 MB
Stars: 193
Watchers: 15
Forks: 9
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE


## ___***Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance***___



       

       

_**[Jinbo Xing](https://doubiiu.github.io/), [Menghan Xia*](https://menghanxia.github.io), [Yuxin Liu](), [Yuechen Zhang](https://julianjuaner.github.io/), [Yong Zhang](https://yzhang2016.github.io), [Yingqing He](https://github.com/YingqingHe), [Hanyuan Liu](https://github.com/hyliu), 
[Haoxin Chen](), [Xiaodong Cun](https://vinthony.github.io/academic/), [Xintao Wang](https://xinntao.github.io/), [Ying Shan](https://scholar.google.com/citations?hl=en&user=4oXBp9UAAAAJ&view_op=list_works&sortby=pubdate), [Tien-Tsin Wong](https://www.cse.cuhk.edu.hk/~ttwong/myself.html)**_





(* corresponding author)

From CUHK and Tencent AI Lab.

IEEE TVCG 2024



## 🔆 Introduction

Make-Your-Video is a customized video generation model controlled by both text and motion structure (depth). It inherits rich visual concepts from an image latent diffusion model (LDM) and supports longer video inference.

## 🤗 **Applications**

### Real-life scene to video

			   		

Video comparison (columns): Real-life scene, Ours, Text2Video-zero+CtrlNet, LVDM_Ext+Adapter.

Prompts:

- "A dam discharging water"
- "A futuristic rocket ship on a launchpad, with sleek design, glowing lights"

### 3D scene modeling to video

			   		

Video comparison (columns): Real-life scene, Ours, Text2Video-zero+CtrlNet, LVDM_Ext+Adapter.

Prompts:

- "A train on the rail, 2D cartoon style"
- "A Van Gogh style painting on drawing board in park, some books on the picnic blanket, photorealistic"
- "A Chinese ink wash landscape painting"

### Video re-rendering

			   		

Video comparison (columns): Original video, Ours, SD-Depth, Text2Video-zero+CtrlNet, LVDM_Ext+Adapter, Tune-A-Video.

Prompts:

- "A tiger walks in the forest, photorealistic"
- "An origami boat moving on the sea"
- "A camel walking on the snow field, Miyazaki Hayao anime style"

## 🌟 **Method Overview**

![](./assets/overview.jpg#gh-light-mode-only)

![](./assets/overview_black.png#gh-dark-mode-only)

## 📝 Changelog

- __[2023.11.30]__: 🔥🔥 Release the main model.

- __[2023.06.01]__: 🔥🔥 Create this repo and launch the project webpage.




## 🧰 Models

|Model|Resolution|Checkpoint|
|:---------|:---------|:--------|
|MakeYourVideo256|256x256|[Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt)|

It takes approximately 13 seconds and requires a peak GPU memory of 20 GB to animate an image using a single NVIDIA A100 (40G) GPU.

## ⚙️ Setup

### Install Environment via Anaconda (Recommended)

```bash
conda create -n makeyourvideo python=3.8.5
conda activate makeyourvideo
pip install -r requirements.txt
```

## 💫 Inference 

### 1. Command line

1) Download the pre-trained depth estimation model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/dpt_hybrid-midas-501f0c75.pt) and save it as `checkpoints/depth/dpt_hybrid-midas-501f0c75.pt`.

2) Download the pre-trained Make-Your-Video model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt) and save it as `checkpoints/makeyourvideo_256_v1/model.ckpt`.

3) Run the following command in a terminal.

```bash
sh scripts/run.sh
```
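Before launching the script, it can help to confirm that both checkpoints sit at the paths given in steps 1 and 2. The following is a minimal sketch, not part of the repository: the helper names `direct_download_url` and `missing_checkpoints` are hypothetical, and the only assumption beyond the README is that a Hugging Face `/blob/` file page has a matching `/resolve/` URL that serves the raw file.

```python
from pathlib import Path

# Expected checkpoint locations (relative to the repository root) and the
# Hugging Face file pages linked in steps 1 and 2 above.
CHECKPOINTS = {
    "checkpoints/depth/dpt_hybrid-midas-501f0c75.pt":
        "https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/dpt_hybrid-midas-501f0c75.pt",
    "checkpoints/makeyourvideo_256_v1/model.ckpt":
        "https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt",
}


def direct_download_url(blob_url: str) -> str:
    """Turn a Hugging Face file-page URL (/blob/) into its raw-file URL (/resolve/)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def missing_checkpoints(repo_root="."):
    """Return (relative_path, download_url) pairs for checkpoints not yet on disk."""
    root = Path(repo_root)
    return [
        (rel_path, direct_download_url(url))
        for rel_path, url in CHECKPOINTS.items()
        if not (root / rel_path).is_file()
    ]


if __name__ == "__main__":
    missing = missing_checkpoints()
    for rel_path, url in missing:
        print(f"missing: {rel_path}\n  download from: {url}")
    if not missing:
        print("All checkpoints found; ready to run scripts/run.sh")
```

Run it from the repository root. It only reports what is missing and where to fetch it; it does not download anything or validate file contents.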

## 👨‍👩‍👧‍👦 Other Interesting Open-source Projects

[VideoCrafter1](https://github.com/AILab-CVC/VideoCrafter): Framework for high-quality video generation.

[DynamiCrafter](https://doubiiu.github.io/projects/DynamiCrafter/): Open-domain image animation methods using video diffusion priors.

Play with these projects in the same conda environment!

## 😉 Citation

```bib
@article{xing2023make,
  title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},
  author={Xing, Jinbo and Xia, Menghan and Liu, Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},
  journal={arXiv preprint arXiv:2306.00943},
  year={2023}
}
```

## 📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.

****

## 🌞 **Acknowledgement**

We gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the [WebVid-10M](https://m-bain.github.io/webvid-dataset/) dataset, and we follow the corresponding terms of access.
