{"id":13488957,"url":"https://github.com/ali-vilab/videocomposer","last_synced_at":"2025-12-29T23:24:26.348Z","repository":{"id":171602958,"uuid":"648133004","full_name":"ali-vilab/videocomposer","owner":"ali-vilab","description":"Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability","archived":false,"fork":false,"pushed_at":"2023-11-11T02:45:57.000Z","size":34930,"stargazers_count":843,"open_issues_count":31,"forks_count":77,"subscribers_count":25,"default_branch":"main","last_synced_at":"2024-05-16T00:00:23.124Z","etag":null,"topics":["diffusion-models","video-generation","video-synthesiswith"],"latest_commit_sha":null,"homepage":"https://videocomposer.github.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ali-vilab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-06-01T09:27:30.000Z","updated_at":"2024-05-10T10:50:29.000Z","dependencies_parsed_at":"2023-12-26T03:44:47.456Z","dependency_job_id":null,"html_url":"https://github.com/ali-vilab/videocomposer","commit_stats":null,"previous_names":["damo-vilab/videocomposer","ali-vilab/videocomposer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ali-vilab%2Fvideocomposer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ali-vilab%2Fvideocomposer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ali-vilab%2Fvideocomposer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ali-vilab%2Fvideocomp
oser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ali-vilab","download_url":"https://codeload.github.com/ali-vilab/videocomposer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245957739,"owners_count":20700323,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion-models","video-generation","video-synthesiswith"],"created_at":"2024-07-31T18:01:24.901Z","updated_at":"2025-12-29T23:24:26.343Z","avatar_url":"https://github.com/ali-vilab.png","language":"Python","funding_links":[],"categories":["Video Generation","Python","HarmonyOS"],"sub_categories":["Windows Manager"],"readme":"# VideoComposer\n\nOfficial repo for [VideoComposer: Compositional Video Synthesis with Motion Controllability](https://arxiv.org/pdf/2306.02018.pdf)\n\nPlease see [Project Page](https://videocomposer.github.io/) for more examples.\n\nWe are searching for talented, motivated, and imaginative researchers to join our team. 
If you are interested, please don't hesitate to send us your resume via email at yingya.zyy@alibaba-inc.com\n\n![figure1](source/fig01.jpg \"figure1\")\n\n\nVideoComposer is a controllable video diffusion model that lets users flexibly control the spatial and temporal patterns of a synthesized video simultaneously, using conditions in various forms such as text descriptions, sketch sequences, reference videos, or even simple handcrafted motions and hand drawings.\n\n\n## 🔥News!!!\n\n- __[2023.10]__ We release a high-quality I2VGen-XL model; please refer to the [Webpage](https://i2vgen-xl.github.io)\n- __[2023.08]__ We release the Gradio UI on [ModelScope](https://modelscope.cn/studios/damo/VideoComposer-Demo/summary)\n- __[2023.07]__ We release the pretrained model without watermark; please refer to the [ModelCard](https://modelscope.cn/models/damo/VideoComposer/files)\n\n\n\n## TODO\n- [x] Release our technical papers and webpage.\n- [x] Release code and pretrained model.\n- [x] Release Gradio UI on [ModelScope](https://modelscope.cn/studios/damo/VideoComposer-Demo/summary) and Hugging Face.\n- [x] Release pretrained model that can generate 8s videos without watermark on [ModelScope](https://modelscope.cn/models/damo/VideoComposer/files)\n\n\n## Method\n\n![method](source/fig02_framwork.jpg \"method\")\n\n\n## Running by Yourself\n\n### 1. Installation \n\nRequirements:\n- Python==3.8\n- ffmpeg (for motion vector extraction)\n- torch==1.12.0+cu113\n- torchvision==0.13.0+cu113\n- open-clip-torch==2.0.2\n- transformers==4.18.0\n- flash-attn==0.2 \n- xformers==0.0.13\n- motion-vector-extractor==1.0.6 (for motion vector extraction)\n\nYou can also create the same environment as ours with the following command:\n```\nconda env create -f environment.yaml\n```\n\n### 2. 
Download model weights\n\nDownload all the [model weights](https://www.modelscope.cn/models/damo/VideoComposer/summary) with the following Python snippet:\n\n```\n# Requires the modelscope package: pip install modelscope\nfrom modelscope.hub.snapshot_download import snapshot_download\nmodel_dir = snapshot_download('damo/VideoComposer', cache_dir='model_weights/', revision='v1.0.0')\n```\n\nNext, place these models in the `model_weights` folder following the file structure shown below.\n\n\n```\n|--model_weights/\n|    |--non_ema_228000.pth\n|    |--midas_v3_dpt_large.pth \n|    |--open_clip_pytorch_model.bin\n|    |--sketch_simplification_gan.pth\n|    |--table5_pidinet.pth\n|    |--v2-1_512-ema-pruned.ckpt\n```\n\nYou can also download some of them from their original projects: \n- \"midas_v3_dpt_large.pth\" in [MiDaS](https://github.com/isl-org/MiDaS)\n- \"open_clip_pytorch_model.bin\" in [Open Clip](https://github.com/mlfoundations/open_clip) \n- \"sketch_simplification_gan.pth\" and \"table5_pidinet.pth\" in [Pidinet](https://github.com/zhuoinoulu/pidinet)\n- \"v2-1_512-ema-pruned.ckpt\" in [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-2-1-base/blob/main/v2-1_512-ema-pruned.ckpt).\n\nFor convenience, we provide a download link in this repo.\n\n\n### 3. 
Running\n\nIn this project, we provide two implementations that can help you better understand our method.\n\n\n#### 3.1 Inference with Customized Inputs\n\nYou can run the code with the following command:\n\n```\npython run_net.py\\\n    --cfg configs/exp02_motion_transfer.yaml\\\n    --seed 9999\\\n    --input_video \"demo_video/motion_transfer.mp4\"\\\n    --image_path \"demo_video/moon_on_water.jpg\"\\\n    --input_text_desc \"A beautiful big moon on the water at night\"\n```\nThe results are saved in the `outputs/exp02_motion_transfer-S09999` folder:\n\n![case1](source/results/exp02_motion_transfer-S00009.gif \"case2\")\n![case2](source/results/exp02_motion_transfer-S09999.gif \"case2\")\n\n\nIf you notice a significant color shift in the results, you can use the style condition to adjust the color distribution with the following command.\n\n\n```\npython run_net.py\\\n    --cfg configs/exp02_motion_transfer_vs_style.yaml\\\n    --seed 9999\\\n    --input_video \"demo_video/motion_transfer.mp4\"\\\n    --image_path \"demo_video/moon_on_water.jpg\"\\\n    --style_image \"demo_video/moon_on_water.jpg\"\\\n    --input_text_desc \"A beautiful big moon on the water at night\"\n```\n\n\n```\npython run_net.py\\\n    --cfg configs/exp03_sketch2video_style.yaml\\\n    --seed 8888\\\n    --sketch_path \"demo_video/src_single_sketch.png\"\\\n    --style_image \"demo_video/style/qibaishi_01.png\"\\\n    --input_text_desc \"Red-backed Shrike lanius collurio\"\n```\n![case2](source/results/exp03_sketch2video_style-S09999.gif \"case2\")\n\n\n\n```\npython run_net.py\\\n    --cfg configs/exp04_sketch2video_wo_style.yaml\\\n    --seed 144\\\n    --sketch_path \"demo_video/src_single_sketch.png\"\\\n    --input_text_desc \"A Red-backed Shrike lanius collurio is on the branch\"\n```\n![case2](source/results/exp04_sketch2video_wo_style-S00144.gif \"case2\")\n![case2](source/results/exp04_sketch2video_wo_style-S00144-1.gif 
\"case2\")\n\n\n\n```\npython run_net.py\\\n    --cfg configs/exp05_text_depths_wo_style.yaml\\\n    --seed 9999\\\n    --input_video demo_video/video_8800.mp4\\\n    --input_text_desc \"A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish\"\n```\n![case2](source/results/exp05_text_depths_wo_style-S09999-0.gif \"case2\")\n![case2](source/results/exp05_text_depths_wo_style-S09999-2.gif \"case2\")\n\n```\npython run_net.py\\\n    --cfg configs/exp06_text_depths_vs_style.yaml\\\n    --seed 9999\\\n    --input_video demo_video/video_8800.mp4\\\n    --style_image \"demo_video/style/qibaishi_01.png\"\\\n    --input_text_desc \"A glittering and translucent fish swimming in a small glass bowl with multicolored piece of stone, like a glass fish\"\n```\n\n![case2](source/results/exp06_text_depths_vs_style-S09999-0.gif \"case2\")\n![case2](source/results/exp06_text_depths_vs_style-S09999-1.gif \"case2\")\n\n\n#### 3.2 Inference on a Video\n\nYou can run the code with the following command:\n```\npython run_net.py \\\n    --cfg configs/exp01_vidcomposer_full.yaml \\\n    --input_video \"demo_video/blackswan.mp4\" \\\n    --input_text_desc \"A black swan swam in the water\" \\\n    --seed 9999\n```\n\nThis command extracts the different conditions of the input video (e.g., depth, sketch, and motion vectors) for the subsequent video generation; the results are saved in the `outputs` folder. The task list is predefined in \u003cfont style=\"color: rgb(128,128,255)\"\u003einference_multi.py\u003c/font\u003e. \n\n\n\nIn addition to the above use cases, you can explore further possibilities with this code and model. Because the samples generated by the diffusion model are diverse, trying different seeds may produce better results. \n\nWe hope you enjoy using it! 
\u0026#x1F600; \n\n\n\n## BibTeX\n\nIf this repo is useful to you, please cite our technical paper.\n```bibtex\n@article{2023videocomposer,\n  title={VideoComposer: Compositional Video Synthesis with Motion Controllability},\n  author={Wang, Xiang* and Yuan, Hangjie* and Zhang, Shiwei* and Chen, Dayou* and Wang, Jiuniu and Zhang, Yingya and Shen, Yujun and Zhao, Deli and Zhou, Jingren},\n  journal={arXiv preprint arXiv:2306.02018},\n  year={2023}\n}\n```\n\n\n## Acknowledgement\n\nWe would like to express our gratitude for the contributions of several previous works to the development of VideoComposer. These include, but are not limited to, [Composer](https://arxiv.org/abs/2302.09778), [ModelScopeT2V](https://modelscope.cn/models/damo/text-to-video-synthesis/summary), [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [OpenCLIP](https://github.com/mlfoundations/open_clip), [WebVid-10M](https://m-bain.github.io/webvid-dataset/), [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/), [Pidinet](https://github.com/zhuoinoulu/pidinet), and [MiDaS](https://github.com/isl-org/MiDaS). We are committed to building upon these foundations in a way that respects their original contributions.\n\n\n## Disclaimer\n\nThis open-source model is trained on the [WebVid-10M](https://m-bain.github.io/webvid-dataset/) and [LAION-400M](https://laion.ai/blog/laion-400-open-dataset/) datasets and is intended for \u003cstrong\u003eRESEARCH/NON-COMMERCIAL USE ONLY\u003c/strong\u003e. We have also trained more powerful models using internal video data, which can be used in the future.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fali-vilab%2Fvideocomposer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fali-vilab%2Fvideocomposer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fali-vilab%2Fvideocomposer/lists"}