{"id":18317355,"url":"https://github.com/compvis/interactive-image2video-synthesis","last_synced_at":"2025-04-05T21:32:21.329Z","repository":{"id":65983448,"uuid":"352608749","full_name":"CompVis/interactive-image2video-synthesis","owner":"CompVis","description":null,"archived":false,"fork":false,"pushed_at":"2022-12-18T15:17:04.000Z","size":65361,"stargazers_count":58,"open_issues_count":1,"forks_count":16,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-21T12:07:27.340Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompVis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-29T10:49:28.000Z","updated_at":"2024-08-12T06:24:01.000Z","dependencies_parsed_at":"2023-02-19T19:31:24.874Z","dependency_job_id":null,"html_url":"https://github.com/CompVis/interactive-image2video-synthesis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Finteractive-image2video-synthesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Finteractive-image2video-synthesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Finteractive-image2video-synthesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Finteractive-image2video-synthesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompVis","download_url":"https://codeload.github.com/CompVis/interactive-ima
ge2video-synthesis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247406080,"owners_count":20933803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T18:05:53.883Z","updated_at":"2025-04-05T21:32:17.297Z","avatar_url":"https://github.com/CompVis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Interactive Image2Video-Synthesis\n\nOfficial Pytorch Implementation of our CVPR21 paper [Understanding Object Dynamics for Interactive Image-to-Video Synthesis](https://arxiv.org/abs/2106.11303), where we enable human users to interact with still images.\n\n![teaser](images/teaser.gif )\n\n\n## [**Arxiv**](https://arxiv.org/abs/2106.11303) | [**Project page**](https://compvis.github.io/interactive-image2video-synthesis/) | [**BibTeX**](#bibtex)\n\n[Andreas Blattmann](https://www.linkedin.com/in/andreas-blattmann-479038186/?originalSubdomain=de),\n[Timo Milbich](https://timomilbich.github.io/),\n[Michael Dorkenwald](https://mdork.github.io/),\n[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer),\n[CVPR 2021](http://cvpr2021.thecvf.com/)\u003cbr/\u003e\n\n\n**TL;DR** We introduce the novel problem of Interactive Image-to-Video Synthesis where we learn to understand the relations between the distinct body parts of articulated objects from unlabeled video data. 
Our proposed model allows for the synthesis of videos showing natural object dynamics as responses to targeted, local interactions and thus enables human users to interact with still images by poking pixels.\n\n\n![overview](images/overview.png \"Overview of our model.\")\n\n\n## Table of contents ##\n1. [Requirements](#Requirements)\n2. [Data preparation](#data_prep)\n3. [Pretrained Models](#pretrained)\n4. [Train your own II2V model](#training)\n5. [BibTeX](#bibtex)\n\n## Requirements \u003ca name=\"Requirements\"\u003e\u003c/a\u003e\nA suitable conda environment named ``ii2v`` can be created with\n\n````shell script\nconda env create -f ii2v.yml \nconda activate ii2v\n````\n\n## Data preparation \u003ca name=\"data_prep\"\u003e\u003c/a\u003e\n\n### Get Flownet2 for optical flow estimation ###\n\nPreparing the data to evaluate our pretrained models or to train new ones requires estimating optical flow maps. First, add [Flownet2](https://github.com/NVIDIA/flownet2-pytorch) as a git submodule and place it in the directory ``models/flownet2`` via\n\n```shell script\ngit submodule add https://github.com/NVIDIA/flownet2-pytorch models/flownet2\n``` \n\nSince Flownet2 requires cuda-10.0 and is therefore not compatible with our main conda environment, we provide a separate conda environment for optical flow estimation, which can be created via\n\n```shell script\nconda env create -f flownet2\n```\nYou can activate the environment and specify the right cuda version by using \n\n```shell script\nsource activate_flownet2\n``` \nfrom the root of this repository. 
IMPORTANT: You have to ensure that lines 3 and 4 in the script add your respective ``cuda-10.0`` installation directories to the ``PATH`` and ``LD_LIBRARY_PATH`` environment variables.\nFinally, you have to build the custom layers of flownet2 with\n\n```shell script\ncd models/flownet2\nbash install.sh -ccbin \u003cPATH TO_GCC7\u003e\n```\nwhere ``\u003cPATH TO_GCC7\u003e`` is the path to your ``gcc-7`` binary, which is usually ``/usr/bin/gcc-7`` on a Linux server. Make sure that your ``flownet2`` environment is activated and that the environment variables contain the ``cuda-10.0`` installation when running the script.\n\n### Poking Plants ###\n\nDownload the Poking Plants dataset from [here](https://heibox.uni-heidelberg.de/d/71de55de923646509bc4/) and extract it to a ``\u003cTARGETDIR\u003e``, which then contains the raw video files. \nTo extract the multi-zip file, use \n\n```shell script\nzip -s 0 poking_plants.zip --out poking_plants_unsplit.zip\nunzip poking_plants_unsplit.zip\n```\n\nTo extract the individual frames and estimate optical flow, set the value of the field \n``raw_dir`` in ``config/data_preparation/plants.yaml`` to be ``\u003cTARGETDIR\u003e``, define the target location for the extracted frames via the field ``processed_dir`` (all frames of each video will be placed in their own directory) and run\n\n````shell script\nsource activate_flownet2\npython -m utils.prepare_dataset --config config/data_preparation/plants.yaml\n````\nYou can significantly speed up the optical flow estimation by setting the ``num_workers`` argument, which defines the number of parallel runs of flownet2. These runs are distributed among the GPUs with the ids specified in ``target_gpus``.  
\n### iPER ###\n\nDownload the zipped videos in ``iPER_1024_video_release.zip`` from [this website](https://onedrive.live.com/?authkey=%21AJL%5FNAQMkdXGPlA\u0026id=3705E349C336415F%2188052\u0026cid=3705E349C336415F) \n(note that you have to create a Microsoft account to get access) and extract the archive to a ``\u003cTARGETDIR\u003e`` similar to the above example. On the same page, you'll also find ``train.txt`` and ``val.txt``. Download these files and save them in the ``\u003cTARGETDIR\u003e``. \nAgain, set the undefined value of the field ``raw_dir`` in ``config/data_preparation/iper.yaml`` to be ``\u003cTARGETDIR\u003e``, define the target location for the extracted frames and the optical flow via ``processed_dir`` and run \n```shell script\npython -m utils.prepare_dataset --config config/data_preparation/iper.yaml\n``` \nwith the ``flownet2`` environment activated. \n\n### Human3.6m ###\n\nFirst, you will need to create an account at [the homepage of the Human3.6m dataset](http://vision.imar.ro/human3.6m/) to gain access to the dataset. After your account is created and approved (takes a couple of hours), log in and inspect your cookies to find your `PHPSESSID`. \nFill in that `PHPSESSID` in `data/config.ini` and also specify the `TARGETDIR` there, where the extracted videos will later be stored. After setting the field `processed_dir` in `config/data_preparation/human36m.yaml`, you can download and extract the videos via\n```shell script\npython -m data.human36m_preprocess\n```\nwith the ``flownet2`` environment activated. 
\nFrame extraction and optical flow estimation are then done as usual with\n```shell script\npython -m data.prepare_dataset --config config/data_preparation/human36m.yaml\n```\n\n### TaiChi-HD ###\n\nTo download and extract the videos, follow the steps listed on the [download page](https://github.com/AliaksandrSiarohin/first-order-model/tree/master/data/taichi-loading) for this dataset and set the `out_folder` argument of the script `load_videos.py` to be our `\u003cTARGETDIR\u003e` from the above examples. Again, set the fields `raw_dir` and `processed_dir` in `config/data_preparation/taichi.yaml` similar to the above examples and run\n```shell script\npython -m data.prepare_dataset --config config/data_preparation/taichi.yaml\n```\nwith the `flownet2` environment activated to extract the individual frames and estimate the optical flow maps.\n\n## Pretrained models \u003ca name=\"pretrained\"\u003e\u003c/a\u003e\n\n### Get the checkpoints ###\n\nHere's a list of all available pretrained models. 
Note that the list will be updated soon to also include the pretrained models for the additional examples in [the supplementary](https://openaccess.thecvf.com/content/CVPR2021/supplemental/Blattmann_Understanding_Object_Dynamics_CVPR_2021_supplemental.zip).\n\n| Dataset | Video resolution | Link | FVD |\n|----------|----------|----------|----------|\n| Poking Plants | 128 x 128 | [plants_128x128](https://heibox.uni-heidelberg.de/d/25d9afb4743446709f73/) | 174.18 |\n| Poking Plants | 64 x 64 | [plants_64x64](https://heibox.uni-heidelberg.de/d/0ae26899aed6443ebdec/) | 89.76 |\n| iPER | 128 x 128 | [iper_128x128](https://heibox.uni-heidelberg.de/d/0695ee70557c4f90bcbe/) | 220.34 |\n| iPER | 64 x 64 | [iper_64x64](https://heibox.uni-heidelberg.de/d/8486eafdfea2405d9ead/) | 144.92 |\n| Human3.6m | 128 x 128 | [h36m_128x128](https://heibox.uni-heidelberg.de/d/1956b6e6afbb4bb681d2/) | 129.62 |\n| Human3.6m | 64 x 64 | [h36m_64x64](https://heibox.uni-heidelberg.de/d/db59ab4cd2624dce99ed/) | 119.89 |\n| TaiChi-HD | 128 x 128 | [taichi_128x128](https://heibox.uni-heidelberg.de/d/98d376baafe64a828093/) | 167.94 |\n| TaiChi-HD | 64 x 64 | [taichi_64x64](https://heibox.uni-heidelberg.de/d/2b7873d9620642d28c21/) | 182.28 |\n\nDownload the data to a `\u003cMODELDIR\u003e` by selecting all items visible under the respective link and clicking the green 'ZIP Selected Items' button. 
**IMPORTANT:** To ensure smooth and automatic evaluation, name the resulting zip file after the respective link in the above table.\n\n### Evaluate pretrained models ###\n\nAll provided pretrained models can be evaluated with the command\n```shell script\nconda activate ii2v\npython -m utils.eval_pretrained --base_dir \u003cMODELDIR\u003e --mode \u003c[metrics,fvd]\u003e --gpu \u003cGPUID\u003e\n``` \nwhere `--mode fvd` will extract samples for calculating the FVD score (for details on its calculation see below) and save them in `\u003cMODELDIR\u003e/\u003cNAME OF LINK IN TABLE\u003e/generated/samples_fvd`, and `--mode metrics` will evaluate the model with respect to the remaining metrics reported in the paper.\n\n### FVD evaluation ###\nAs the FVD implementation requires `tensorflow\u003c=1.15`, we again created a separate conda environment to evaluate the models with respect to this score, which can be initialized \nand activated by using \n```shell script\nconda env create -f environement_fvd.yml\nconda activate fvd\n``` \nYou can calculate the FVD score of a model with\n```shell script\npython -m utils.metric_fvd --gpu \u003cGPUID\u003e --source \u003cMODELDIR\u003e/\u003cNAME OF LINK IN TABLE\u003e/generated/samples_fvd\n```\nNote that the samples must already have been written to `\u003cMODELDIR\u003e/\u003cNAME OF LINK IN TABLE\u003e/generated/samples_fvd` before running the script. \n\n## Train your own II2V model \u003ca name=\"training\"\u003e\u003c/a\u003e\n\nTo train your own model on one of the provided datasets, you'll have to adapt the following fields in the config file `config/fixed_length_model.yaml`:\n* `base_dir`: The base directory where all logs, config files, checkpoints and results will be stored (we recommend not changing this once you've defined it)\n* `dataset`: The considered dataset, which must be one of `['PlantDataset', 'IperDataset', 'Human36mDataset', 'TaichiDataset']`\n* `datapath`: `\u003cTARGETDIR\u003e` from above for the respective dataset
\n\nAfter that, you can start training by running\n```shell script\npython main.py --config config/fixed_length_model.yaml --project_name \u003cUNIQUE_PROJECT_NAME\u003e --gpu \u003cGPUID\u003e --mode \u003c[train, test]\u003e\n```\n\nTo evaluate the model after training, run \n\n```shell script\npython -m utils.eval_models --base_dir \u003cbase_dir field from the respective config\u003e --mode \u003c[metrics,fvd]\u003e --gpu \u003cGPUID\u003e\n```\n\n## BibTeX \u003ca name=\"bibtex\"\u003e\u003c/a\u003e\n\n```\n@InProceedings{Blattmann_2021_CVPR,\n    author    = {Blattmann, Andreas and Milbich, Timo and Dorkenwald, Michael and Ommer, Bjorn},\n    title     = {Understanding Object Dynamics for Interactive Image-to-Video Synthesis},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2021},\n    pages     = {5171-5181}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Finteractive-image2video-synthesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompvis%2Finteractive-image2video-synthesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Finteractive-image2video-synthesis/lists"}