{"id":13488891,"url":"https://github.com/merlresearch/TI2V-Zero","last_synced_at":"2025-03-28T02:31:18.326Z","repository":{"id":244293968,"uuid":"814802700","full_name":"merlresearch/TI2V-Zero","owner":"merlresearch","description":"Text-conditioned image-to-video generation based on diffusion models.","archived":false,"fork":false,"pushed_at":"2024-06-13T18:29:19.000Z","size":294,"stargazers_count":34,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-31T01:35:00.244Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/merlresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-13T18:29:17.000Z","updated_at":"2024-10-27T02:29:00.000Z","dependencies_parsed_at":"2024-06-13T22:32:16.554Z","dependency_job_id":"57df98c7-9df4-4f7f-818d-c161d6be6a3b","html_url":"https://github.com/merlresearch/TI2V-Zero","commit_stats":null,"previous_names":["merlresearch/ti2v-zero"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/merlresearch%2FTI2V-Zero","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/merlresearch%2FTI2V-Zero/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/merlresearch%2FTI2V-Zero/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/merlresearch%2FTI2V-Zero/manifests","owner_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/owners/merlresearch","download_url":"https://codeload.github.com/merlresearch/TI2V-Zero/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245957673,"owners_count":20700314,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:23.619Z","updated_at":"2025-03-28T02:31:17.918Z","avatar_url":"https://github.com/merlresearch.png","language":"Python","funding_links":[],"categories":["Video Generation","Papers"],"sub_categories":["Text-Video Generation"],"readme":"\u003c!--\nCopyright (C) 2024 Mitsubishi Electric Research Laboratories (MERL)\n\nSPDX-License-Identifier: AGPL-3.0-or-later\n--\u003e\n\n# TI2V-Zero (CVPR 2024)\n\nThis repository contains the implementation of the paper:\n\u003e **TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models**\u003cbr\u003e\n\u003e [Haomiao Ni](https://nithin-gk.github.io/), [Bernhard Egger](https://eggerbernhard.ch/), [Suhas Lohit](https://www.merl.com/people/slohit), [Anoop Cherian](https://www.merl.com/people/cherian), [Ye Wang](https://www.merl.com/people/yewang), [Toshiaki Koike-Akino](https://www.merl.com/people/koike), [Sharon X. 
Huang](https://faculty.ist.psu.edu/suh972/), [Tim K Marks](https://www.merl.com/people/tmarks)\n\nIEEE/CVF Conference on Computer Vision and Pattern Recognition (**CVPR**), 2024\n\n[[Project Page](https://www.merl.com/demos/TI2V-Zero)]\n\n## Summary\n\n\u003cdiv align=center\u003e\u003cimg src=\"framework.png\" width=\"915px\" height=\"283px\"/\u003e\u003c/div\u003e\n\nText-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., \"a woman is drinking water\"). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this work, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a \"repeat-and-slide\" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation.\n\n## Quick Start\n----\n1. Install required dependencies. 
First create a conda environment using `conda create --name ti2v python=3.8`. Activate the conda environment using `conda activate ti2v`. Then use `pip install -r requirements.txt` to install the remaining dependencies.\n2. Run `python initialization.py` to download pretrained [ModelScope](https://modelscope.cn/models/iic/text-to-video-synthesis/summary) models from [HuggingFace](https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis).\n3. Run `python demo_img2vid.py` to generate videos by providing an image and a text input.\n\nYou can set the image path and text input in this file manually. By default, the file uses example images and text inputs. The example images in the `examples/` folder were generated using [Stable Diffusion](https://github.com/CompVis/stable-diffusion).\n\n## Generating Videos using Public Datasets\n----\n**MUG Dataset**\n1. Download the MUG dataset from their [website](https://mug.ee.auth.gr/fed/).\n2. After installing dependencies, run `python gen_video_mug.py` to generate videos. Please set the paths in the code files if needed.\n\n**UCF101 Dataset**\n1. Download the UCF101 dataset from their [website](https://www.crcv.ucf.edu/data/UCF101.php).\n2. Preprocess the dataset to sample frames from the videos. You may use our preprocessing function in `dataset/datasets_ucf.py`.\n3. After installing dependencies, run `python gen_video_ucf.py` to generate videos. 
Please set the paths in the code files if needed.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for our policy on contributions.\n\n## License\n\nReleased under `AGPL-3.0-or-later` license, as found in the [LICENSE.md](LICENSE.md) file.\n\nAll files, except as noted below:\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nSPDX-License-Identifier: AGPL-3.0-or-later\n```\n\nThe following files\n\n* `autoencoder.py`\n* `diffusion.py`\n* `modelscope_t2v.py`\n* `unet_sd.py`\n\nwere adapted from https://github.com/modelscope/modelscope/tree/57791a8cc59ccf9eda8b94a9a9512d9e3029c00b/modelscope/models/multi_modal/video_synthesis (license included in [LICENSES/Apache-2.0.txt](LICENSES/Apache-2.0.txt)):\n\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nCopyright (c) 2021-2022 The Alibaba Fundamental Vision Team Authors\n```\n\nThe following file\n\n* `modelscope_t2v_pipeline.py`\n\nwas adapted from https://github.com/modelscope/modelscope/blob/bedec553c17b7e297da9db466fee61ccbd4295ba/modelscope/pipelines/multi_modal/text_to_video_synthesis_pipeline.py (license included in [LICENSES/Apache-2.0.txt](LICENSES/Apache-2.0.txt))\n\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nCopyright (c) Alibaba, Inc. and its affiliates.\n```\n\nThe following file\n\n* `util.py`\n\nwas adapted from https://github.com/modelscope/modelscope/blob/57791a8cc59ccf9eda8b94a9a9512d9e3029c00b/modelscope/models/cv/anydoor/ldm/util.py (license included in [LICENSES/Apache-2.0.txt](LICENSES/Apache-2.0.txt)):\n\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nCopyright (c) 2021-2022 The Alibaba Fundamental Vision Team Authors. 
All rights reserved.\n```\n\nThe following files:\n\n* `dataset/datasets_mug.py`\n* `dataset/datasets_ucf.py`\n\nwere adapted from [LFDM](https://github.com/nihaomiao/CVPR23_LFDM/tree/main/preprocessing) (license included in [LICENSES/BSD-2-Clause.txt](LICENSES/BSD-2-Clause.txt)):\n\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nCopyright (C) 2023 NEC Laboratories America, Inc. (\"NECLA\"). All rights reserved.\n```\n\nThe following files\n\n* `demo_img2vid.py`\n* `gen_video_mug.py`\n* `gen_video_ucf.py`\n\nwere adapted from [LFDM](https://github.com/nihaomiao/CVPR23_LFDM/tree/main/demo) (license included in [LICENSES/BSD-2-Clause.txt](LICENSES/BSD-2-Clause.txt)):\n\n```\nCopyright (c) 2024 Mitsubishi Electric Research Laboratories (MERL)\nCopyright (C) 2023 NEC Laboratories America, Inc. (\"NECLA\"). All rights reserved.\n```\n\n## Citation\nIf you use our work, please use the following citation:\n\n\n```bibTex\n@inproceedings{ni2024ti2v,\n  title={TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models},\n  author={Ni, Haomiao and Egger, Bernhard and Lohit, Suhas and Cherian, Anoop and Wang, Ye and Koike-Akino, Toshiaki and Huang, Sharon X and Marks, Tim K},\n  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerlresearch%2FTI2V-Zero","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmerlresearch%2FTI2V-Zero","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerlresearch%2FTI2V-Zero/lists"}