{"id":20161934,"url":"https://github.com/ailab-cvc/make-your-video","last_synced_at":"2025-07-04T00:06:22.805Z","repository":{"id":171743453,"uuid":"647626341","full_name":"AILab-CVC/Make-Your-Video","owner":"AILab-CVC","description":"[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance","archived":false,"fork":false,"pushed_at":"2024-02-24T07:56:41.000Z","size":64105,"stargazers_count":193,"open_issues_count":4,"forks_count":9,"subscribers_count":15,"default_branch":"main","last_synced_at":"2025-06-04T02:16:47.524Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://doubiiu.github.io/projects/Make-Your-Video/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-31T07:24:26.000Z","updated_at":"2025-05-31T15:02:10.000Z","dependencies_parsed_at":"2024-02-24T08:40:24.155Z","dependency_job_id":null,"html_url":"https://github.com/AILab-CVC/Make-Your-Video","commit_stats":null,"previous_names":["videocrafter/make-your-video","ailab-cvc/make-your-video"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AILab-CVC/Make-Your-Video","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FMake-Your-Video","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FMake-Your-Video/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FMake-Your-Video/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FMake-Your-Video/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/Make-Your-Video/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FMake-Your-Video/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263421918,"owners_count":23464048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T00:21:57.394Z","updated_at":"2025-07-04T00:06:17.700Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## ___***Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance***___\n\n\u003cdiv align=\"center\"\u003e\n\n \u003ca href='https://arxiv.org/abs/2306.00943'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-2306.00943-b31b1b.svg'\u003e\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n \u003ca href='https://doubiiu.github.io/projects/Make-Your-Video/'\u003e\u003cimg src='https://img.shields.io/badge/Project-Video-Green'\u003e\u003c/a\u003e \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n\n\n\n\n_**[Jinbo Xing](https://doubiiu.github.io/), [Menghan Xia*](https://menghanxia.github.io), [Yuxin Liu](), [Yuechen Zhang](https://julianjuaner.github.io/), [Yong Zhang](https://yzhang2016.github.io), [Yingqing He](https://github.com/YingqingHe), [Hanyuan Liu](https://github.com/hyliu), 
\u003cbr\u003e[Haoxin Chen](), [Xiaodong Cun](https://vinthony.github.io/academic/), [Xintao Wang](https://xinntao.github.io/), [Ying Shan](https://scholar.google.com/citations?hl=en\u0026user=4oXBp9UAAAAJ\u0026view_op=list_works\u0026sortby=pubdate), [Tien-Tsin Wong](https://www.cse.cuhk.edu.hk/~ttwong/myself.html)**_\n\u003cbr\u003e\u003cbr\u003e\n(* corresponding author)\n\nFrom CUHK and Tencent AI Lab.\n\nIEEE TVCG 2024\n\u003c/div\u003e\n\n## 🔆 Introduction\n Make-Your-Video is a customized video generation model with both text and motion structure (depth) control. It inherits rich visual concepts from image LDM and supports longer video inference.\n\n\n## 🤗 **Applications**\n### Real-life scene to video\n\u003ctable class=\"center\"\u003e\n\t\t\t   \t\t\u003ctr style=\"font-weight: bolder;text-align:center;\"\u003e\n\t\t\t\t\t\t\u003ctd\u003eReal-life scene\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eOurs\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eText2Video-zero+CtrlNet\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eLVDM\u003csub\u003eExt\u003c/sub\u003e+Adapter\u003c/td\u003e\n\t\t\t   \t\t\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/dam_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/dam_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/dam_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/dam_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\"A dam discharging water\"\u003c/td\u003e\u003c/tr\u003e\n  \n\n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/rocket_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/rocket_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg 
src=assets/real-life_GIF/rocket_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/real-life_GIF/rocket_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\"A futuristic rocket ship on a launchpad, with sleek design, glowing lights\"\u003c/td\u003e\u003c/tr\u003e\n\u003c/table \u003e\n\n### 3D scene modeling to video\n\u003ctable class=\"center\"\u003e\n\t\t\t   \t\t\u003ctr style=\"font-weight: bolder;text-align:center;\"\u003e\n\t\t\t\t\t\t\u003ctd\u003eReal-life scene\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eOurs\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eText2Video-zero+CtrlNet\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eLVDM\u003csub\u003eExt\u003c/sub\u003e+Adapter\u003c/td\u003e\n\t\t\t   \t\t\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/train_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/train_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/train_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/train_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\"A train on the rail, 2D cartoon style\"\u003c/td\u003e\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/book_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/book_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/book_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/book_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\"A Van Gogh style painting 
on drawing board in park, some books on the picnic blanket, photorealistic\"\u003c/td\u003e\u003c/tr\u003e\n\n\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/mountain_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/mountain_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/mountain_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/3dmodeling_GIF/mountain_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\u003ctd colspan=\"4\"\u003e\"A Chinese ink wash landscape painting\"\u003c/td\u003e\u003c/tr\u003e\n\u003c/table \u003e\n\n### Video re-rendering\n\u003ctable class=\"center\"\u003e\n\t\t\t   \t\t\u003ctr style=\"font-weight: bolder; text-align:center;\"\u003e\n\t\t\t\t\t\t\u003ctd\u003eOriginal video\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eOurs\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eSD-Depth\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eText2Video-zero+CtrlNet\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eLVDM\u003csub\u003eExt\u003c/sub\u003e+Adapter\u003c/td\u003e\n\t\t\t\t\t\t\u003ctd\u003eTune-A-Video\u003c/td\u003e\n\t\t\t   \t\t\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/bear_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/bear_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/bear_sddepth.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/bear_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/bear_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg 
src=assets/video-rerendering_GIF/bear_tav.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n  \u003ctr\u003e\u003ctd colspan=\"6\"\u003e\"A tiger walks in the forest, photorealistic\"\u003c/td\u003e\u003c/tr\u003e\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_sddepth.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/boat_tav.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\u003ctd colspan=\"6\"\u003e\"An origami boat moving on the sea\"\u003c/td\u003e\u003c/tr\u003e\n\n  \n  \u003ctr\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_input.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_ours.gif width=\"170\"\u003e\n  \u003c/td\u003e\n\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_sddepth.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_t2vzero.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_lvdm.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e\n    \u003cimg src=assets/video-rerendering_GIF/camel_tav.gif width=\"170\"\u003e\n  \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\u003ctd colspan=\"6\"\u003e\"A camel walking on the snow field, Miyazaki Hayao anime 
style\"\u003c/td\u003e\u003c/tr\u003e\n\u003c/table \u003e\n\n## 🌟 **Method Overview**\n\n![](./assets/overview.jpg#gh-light-mode-only)\n![](./assets/overview_black.png#gh-dark-mode-only)\n\n\n## 📝 Changelog\n- __[2023.11.30]__: 🔥🔥 Release the main model.\n- __[2023.06.01]__: 🔥🔥 Create this repo and launch the project webpage.\n\u003cbr\u003e\n\n\n## 🧰 Models\n\n|Model|Resolution|Checkpoint|\n|:---------|:---------|:--------|\n|MakeYourVideo256|256x256|[Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt)|\n\nIt takes approximately 13 seconds and requires a peak GPU memory of 20 GB to animate an image using a single NVIDIA A100 (40G) GPU.\n\n## ⚙️ Setup\n\n### Install Environment via Anaconda (Recommended)\n```bash\nconda create -n makeyourvideo python=3.8.5\nconda activate makeyourvideo\npip install -r requirements.txt\n```\n\n\n## 💫 Inference \n### 1. Command line\n1) Download the pre-trained depth estimation model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/dpt_hybrid-midas-501f0c75.pt), and place it at `checkpoints/depth/dpt_hybrid-midas-501f0c75.pt`.\n2) Download the pre-trained model from [Hugging Face](https://huggingface.co/Doubiiu/Make-Your-Video/blob/main/model.ckpt), and place it at `checkpoints/makeyourvideo_256_v1/model.ckpt`.\n3) Run the following command in a terminal.\n```bash\n  sh scripts/run.sh\n```\n\n\n\n\n\n## 👨‍👩‍👧‍👦 Other Interesting Open-source Projects\n[VideoCrafter1](https://github.com/AILab-CVC/VideoCrafter): Framework for high-quality video generation.\n\n[DynamiCrafter](https://doubiiu.github.io/projects/DynamiCrafter/): Open-domain image animation methods using video diffusion priors.\n\nPlay with these projects in the same conda environment!\n\n## 😉 Citation\n```bib\n@article{xing2023make,\n  title={Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance},\n  author={Xing, Jinbo and Xia, Menghan and Liu, 
Yuxin and Zhang, Yuechen and Zhang, Yong and He, Yingqing and Liu, Hanyuan and Chen, Haoxin and Cun, Xiaodong and Wang, Xintao and others},\n  journal={arXiv preprint arXiv:2306.00943},\n  year={2023}\n}\n```\n\n\n## 📢 Disclaimer\nWe develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.\n****\n\n\n## 🌞 **Acknowledgement**\nWe gratefully acknowledge the Visual Geometry Group of the University of Oxford for collecting the [WebVid-10M](https://m-bain.github.io/webvid-dataset/) dataset, and we follow the corresponding terms of access.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fmake-your-video","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Fmake-your-video","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fmake-your-video/lists"}