!!! Check out our new CVPR 2024 [paper](https://arxiv.org/abs/2404.16306) and [code](https://github.com/merlresearch/TI2V-Zero) for text-conditioned image-to-video generation.

LFDM
=====
The PyTorch implementation of our CVPR 2023 paper [Conditional Image-to-Video Generation with Latent Flow Diffusion Models](https://arxiv.org/abs/2303.13744).

<div align=center><img src="architecture.png" width="915px" height="306px"/></div>

Updates
-----
[Updated on 07/08/2023] Added multi-GPU training code for the MHAD dataset.

[Updated on 05/12/2023] Released a testing demo for the NATOPS dataset.

[Updated on 03/31/2023] Added instructions for training an LFDM on the NATOPS dataset.

[Updated on 03/27/2023] Added instructions for training an LFDM on the MHAD dataset.

[Updated on 03/27/2023] Released a testing demo for the MHAD dataset.

[Updated on 03/26/2023] Added instructions for training an LFDM on the MUG dataset.

[Updated on 03/26/2023] Our paper is now available on [arXiv](https://arxiv.org/abs/2303.13744).

[Updated on 03/20/2023] Released a testing demo for the MUG dataset.

Example Videos
------
All subjects in the following videos are *unseen* during training.

Generated videos on the MUG dataset:

<div align=center>
<img src="examples/mug.gif" width="500" height="276"/>
</div>

Generated videos on the MHAD dataset:

<div align=center>
<img src="examples/mhad1.gif" width="500" height="530"/>
</div>
<div align=center>
<img src="examples/mhad2.gif" width="500" height="416"/>
</div>

Generated videos on the NATOPS dataset:

<div align=center>
<img src="examples/natops.gif" width="500" height="525"/>
</div>

An LFDM trained on MUG and applied to the FaceForensics dataset:

<div align=center>
<img src="examples/new_domain_grid.gif" width="400" height="523"/>
</div>

Pretrained Models
-----

|Dataset|Model|Frame Sampling|Link (Google Drive)|
|-------|-----|--------------|-------------------|
|MUG|LFAE|-|https://drive.google.com/file/d/1dRn1wl5TUaZJiiDpIQADt1JJ0_q36MVG/view?usp=share_link|
|MUG|DM|very_random|https://drive.google.com/file/d/1lPVIT_cXXeOVogKLhD9fAT4k1Brd_HHn/view?usp=share_link|
|MHAD|LFAE|-|https://drive.google.com/file/d/1AVtpKbzqsXdIK-_vHUuQQIGx6Wa5PxS0/view?usp=share_link|
|MHAD|DM|random|https://drive.google.com/file/d/1BoFPQAeOuHE5wt7h-chhYAO-dU0B1p2y/view?usp=share_link|
|NATOPS|LFAE|-|https://drive.google.com/file/d/10iyzoYqSwzQ3fZgb6oh3Uay-P7k2A12s/view?usp=share_link|
|NATOPS|DM|random|https://drive.google.com/file/d/1lSLSzS_KyGvJ7dW3l5hLJLR9k2k8LoU3/view?usp=share_link|

Demo
-----
**MUG Dataset**

1. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
2. Run `python -u demo/demo_mug.py` to generate the example videos. Set the paths in the code files and in the config file `config/mug128.yaml` if needed. The pretrained models for the MUG dataset have been released (see Pretrained Models).

**MHAD Dataset**

1. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
2. Run `python -u demo/demo_mhad.py` to generate the example videos. Set the paths in the code files and in the config file `config/mhad128.yaml` if needed. The pretrained models for the MHAD dataset have been released (see Pretrained Models).

**NATOPS Dataset**

1. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
2. Run `python -u demo/demo_natops.py` to generate the example videos. Set the paths in the code files and in the config file `config/natops128.yaml` if needed. The pretrained models for the NATOPS dataset have been released (see Pretrained Models).

Training LFDM
----
Training an LFDM involves two stages:

1. Train a latent flow autoencoder (LFAE) in an unsupervised fashion. To accelerate training, we initialize the LFAE with the pretrained models provided by MRAA, which can be found in their [GitHub repository](https://github.com/snap-research/articulated-animation/tree/db2c2135273f601a370e2b62754f9bb56cfd25d5/checkpoints).
2. Train a diffusion model (DM) on the latent space of the LFAE.

**MUG Dataset**

1. Download the MUG dataset from its [website](https://mug.ee.auth.gr/fed/).
2. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
3. Split the data into training and test sets. To use the same split as ours, see `preprocessing/preprocess_MUG.py`.
4. Run `python -u LFAE/run_mug.py` to train the LFAE. Set the paths and the config file `config/mug128.yaml` if needed.
5. Once the LFAE is trained, you can measure its self-reconstruction performance by running `python -u LFAE/test_flowautoenc_mug.py`.
6. Run `python -u DM/train_video_flow_diffusion_mug.py` to train the DM. Set the paths and the config file `config/mug128.yaml` if needed.
7. Once the DM is trained, you can evaluate its generation performance by running `python -u DM/test_video_flow_diffusion_mug.py`.

**MHAD Dataset**

1. Download the MHAD dataset from its [website](https://personal.utdallas.edu/~kehtar/UTD-MHAD.html).
2. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
3. Crop the video frames and split the data into training and test sets. To use the same cropping method and split as ours, see `preprocessing/preprocess_MHAD.py`.
4. Run `python -u LFAE/run_mhad.py` to train the LFAE. Set the paths and the config file `config/mhad128.yaml` if needed.
5. Once the LFAE is trained, you can measure its self-reconstruction performance by running `python -u LFAE/test_flowautoenc_mhad.py`.
6. Run `python -u DM/train_video_flow_diffusion_mhad.py` to train the DM. Set the paths and the config file `config/mhad128.yaml` if needed.
7. Once the DM is trained, you can evaluate its generation performance by running `python -u DM/test_video_flow_diffusion_mhad.py`.

**NATOPS Dataset**

1. Download the NATOPS dataset from its [website](https://github.com/yalesong/natops).
2. Install the required dependencies. We used Python 3.7.10 and PyTorch 1.12.1, among others.
3. Segment the videos and split the data into training and test sets. To use the same segmentation method and split as ours, see `preprocessing/preprocess_NATOPS.py`.
4. Run `python -u LFAE/run_natops.py` to train the LFAE. Set the paths and the config file `config/natops128.yaml` if needed.
5. Once the LFAE is trained, you can measure its self-reconstruction performance by running `python -u LFAE/test_flowautoenc_natops.py`.
6. Run `python -u DM/train_video_flow_diffusion_natops.py` to train the DM. Set the paths and the config file `config/natops128.yaml` if needed.
7. Once the DM is trained, you can evaluate its generation performance by running `python -u DM/test_video_flow_diffusion_natops.py`.

Citing LFDM
-------
If you find our approach useful in your research, please consider citing:
```
@inproceedings{ni2023conditional,
  title={Conditional Image-to-Video Generation with Latent Flow Diffusion Models},
  author={Ni, Haomiao and Shi, Changhao and Li, Kai and Huang, Sharon X and Min, Martin Renqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18444--18455},
  year={2023}
}
```

For questions about the code, feel free to open an issue or contact me: homerhm.ni@gmail.com

Acknowledgement
----
Parts of our code were borrowed from [MRAA](https://github.com/snap-research/articulated-animation), [VDM](https://github.com/lucidrains/video-diffusion-pytorch), and [LDM](https://github.com/CompVis/latent-diffusion). We thank the authors of these repositories for their valuable implementations.
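Steps 4 to 7 of each dataset's training instructions follow a fixed naming pattern: `LFAE/run_<dataset>.py`, `LFAE/test_flowautoenc_<dataset>.py`, `DM/train_video_flow_diffusion_<dataset>.py`, and `DM/test_video_flow_diffusion_<dataset>.py`. The minimal sketch below assembles those commands for one dataset so they can be printed or run in order. It is a hypothetical convenience helper, not part of the repository. The names `training_commands`, `run_training`, and the stage labels are made up for illustration. It assumes you run it from the repository root after downloading and preprocessing the data and setting the paths in the scripts and in `config/<dataset>128.yaml`. It avoids 3.8+ features so it also runs under the Python 3.7.10 listed above.

```python
import shlex
import subprocess
import sys
from typing import List, Tuple

# Dataset keys that appear in the script and config names listed in this README.
DATASETS = ("mug", "mhad", "natops")


def training_commands(dataset: str) -> List[Tuple[str, List[str]]]:
    """Return (stage, argv) pairs for steps 4-7 of "Training LFDM" (hypothetical helper).

    Only the documented commands are assembled; data download, preprocessing,
    and path/config edits still have to be done by hand beforehand.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    py = [sys.executable, "-u"]  # current interpreter, unbuffered output like `python -u`
    return [
        ("train_lfae", py + [f"LFAE/run_{dataset}.py"]),
        ("test_lfae", py + [f"LFAE/test_flowautoenc_{dataset}.py"]),
        ("train_dm", py + [f"DM/train_video_flow_diffusion_{dataset}.py"]),
        ("test_dm", py + [f"DM/test_video_flow_diffusion_{dataset}.py"]),
    ]


def run_training(dataset: str, dry_run: bool = True) -> None:
    """Print every command; execute them in order only when dry_run is False."""
    for stage, argv in training_commands(dataset):
        print(f"[{stage}] " + " ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            # check=True raises CalledProcessError, so later stages never start
            # after a failed one (e.g. DM training without a trained LFAE).
            subprocess.run(argv, check=True)
```

For example, `run_training("mhad")` only prints the four MHAD commands, while `run_training("mhad", dry_run=False)` runs them one after another. The stages run sequentially because the DM is trained on the latent space of an already trained LFAE.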
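The training instructions only name the latent flow autoencoder. In the paper, the LFAE predicts a latent flow field (together with an occlusion map) between a reference frame and a driving frame. It then warps the reference frame's latent map with that flow before decoding, and the DM learns to generate sequences of such latent flows. The minimal sketch below illustrates only the warping step. It implements plain backward warping with bilinear sampling on a single-channel grid in pure Python. It is an illustration under assumptions, not the repository's implementation. The function names, the `(dy, dx)` pixel-offset flow convention, and zero padding outside the grid are choices made here. A PyTorch implementation would typically operate on batched multi-channel tensors with `torch.nn.functional.grid_sample` instead.

```python
import math
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]


def bilinear_sample(z: Sequence[Sequence[float]], y: float, x: float) -> float:
    """Bilinearly sample z at fractional (y, x); positions outside z contribute 0."""
    h, w = len(z), len(z[0])
    y0, x0 = math.floor(y), math.floor(x)
    wy, wx = y - y0, x - x0  # fractional offsets in [0, 1)
    total = 0.0
    for dy, fy in ((0, 1.0 - wy), (1, wy)):
        for dx, fx in ((0, 1.0 - wx), (1, wx)):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the grid
                total += fy * fx * z[yy][xx]
    return total


def warp_latent(
    z: Sequence[Sequence[float]],
    flow: Sequence[Sequence[Tuple[float, float]]],
    occlusion: Optional[Sequence[Sequence[float]]] = None,
) -> Grid:
    """Backward-warp a single-channel latent map z with a dense flow field.

    flow[i][j] = (dy, dx) means output pixel (i, j) reads z at (i + dy, j + dx).
    An optional occlusion map with values in [0, 1] scales the warped result.
    """
    h, w = len(z), len(z[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dy, dx = flow[i][j]
            value = bilinear_sample(z, i + dy, j + dx)
            out[i][j] = value * occlusion[i][j] if occlusion is not None else value
    return out


z = [[1.0, 2.0, 3.0]]
print(warp_latent(z, [[(0.0, 1.0)] * 3]))  # content shifts left by one pixel: [[2.0, 3.0, 0.0]]
```

Backward warping (each output pixel looks up where it came from) is used here because it leaves no holes in the output, which is why flow-based animation models generally prefer it over forward splatting.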