{"id":18317346,"url":"https://github.com/compvis/geometry-free-view-synthesis","last_synced_at":"2025-10-07T22:29:54.911Z","repository":{"id":41580839,"uuid":"358208946","full_name":"CompVis/geometry-free-view-synthesis","owner":"CompVis","description":"Is a geometric model required to synthesize novel views from a single image?","archived":false,"fork":false,"pushed_at":"2023-04-16T21:13:47.000Z","size":169379,"stargazers_count":380,"open_issues_count":11,"forks_count":35,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-03-30T19:06:18.963Z","etag":null,"topics":["novel-view-synthesis","transformers"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2104.07652","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CompVis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-15T09:53:33.000Z","updated_at":"2025-03-22T14:46:32.000Z","dependencies_parsed_at":"2024-12-23T19:12:55.842Z","dependency_job_id":"4021e218-a80f-41da-95d4-f4fe59b19737","html_url":"https://github.com/CompVis/geometry-free-view-synthesis","commit_stats":{"total_commits":9,"total_committers":1,"mean_commits":9.0,"dds":0.0,"last_synced_commit":"00dc639c98dfb9246bee0009649c5be8f8b58e1e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fgeometry-free-view-synthesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fgeometry-free-view-synthesis/tags","releases_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fgeometry-free-view-synthesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CompVis%2Fgeometry-free-view-synthesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CompVis","download_url":"https://codeload.github.com/CompVis/geometry-free-view-synthesis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247543595,"owners_count":20955865,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["novel-view-synthesis","transformers"],"created_at":"2024-11-05T18:05:51.438Z","updated_at":"2025-10-07T22:29:54.797Z","avatar_url":"https://github.com/CompVis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Geometry-Free View Synthesis: Transformers and no 3D Priors\n![teaser](assets/firstpage.jpg)\n\n[**Geometry-Free View Synthesis: Transformers and no 3D Priors**](https://compvis.github.io/geometry-free-view-synthesis/)\u003cbr/\u003e\n[Robin Rombach](https://github.com/rromb)\\*,\n[Patrick Esser](https://github.com/pesser)\\*,\n[Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)\u003cbr/\u003e\n\\* equal contribution\n\n[arXiv](https://arxiv.org/abs/2104.07652) | [BibTeX](#bibtex) | [Colab](https://colab.research.google.com/github/CompVis/geometry-free-view-synthesis/blob/master/scripts/braindance.ipynb)\n\n### Interactive Scene Exploration Results\n\n[RealEstate10K](https://google.github.io/realestate10k/):\u003cbr/\u003e\n\u003ca 
href=\"assets/realestate_short.mp4\"\u003e![realestate](assets/realestate_preview.gif)\u003c/a\u003e\u003cbr/\u003e\nVideos: [short (2min)](assets/realestate_short.mp4) / [long (12min)](assets/realestate_long.mp4)\n\n[ACID](https://infinite-nature.github.io/):\u003cbr/\u003e\n\u003ca href=\"assets/acid_short.mp4\"\u003e![acid](assets/acid_preview.gif)\u003c/a\u003e\u003cbr/\u003e\nVideos: [short (2min)](assets/acid_short.mp4) / [long (9min)](assets/acid_long.mp4)\n\n### Demo\n\nFor a quickstart, you can try the [Colab\ndemo](https://colab.research.google.com/github/CompVis/geometry-free-view-synthesis/blob/master/scripts/braindance.ipynb),\nbut for a smoother experience we recommend installing the local demo as\ndescribed below.\n\n#### Installation\n\nThe demo requires building a PyTorch extension. If you have a sane development\nenvironment with PyTorch, g++ and nvcc, you can simply\n\n```\npip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis\n```\n\nIf you run into problems and have a GPU with compute capability below 8, you\ncan also use the provided conda environment:\n\n```\ngit clone https://github.com/CompVis/geometry-free-view-synthesis\nconda env create -f geometry-free-view-synthesis/environment.yaml\nconda activate geofree\npip install geometry-free-view-synthesis/\n```\n\n#### Running\n\nAfter [installation](#installation), running\n\n```\nbraindance.py\n```\n\nwill start the demo on [a sample scene](http://walledoffhotel.com/rooms.html).\nExplore the scene interactively using the `WASD` keys to move and `arrow keys` to\nlook around. Once positioned, hit the `space bar` to render the novel view with\nGeoGPT.\n\nYou can move again with WASD keys. Mouse control can be activated with the m\nkey. Run `braindance.py \u003cfolder to select image from/path to image\u003e` to run the\ndemo on your own images. 
By default, it uses the `re_impl_nodepth` model (trained on\nRealEstate without explicit transformation and no depth input), which can be\nchanged with the `--model` flag. The corresponding checkpoints will be\ndownloaded the first time they are required. Specify an output path using\n`--video path/to/vid.mp4` to record a video.\n\n```\n\u003e braindance.py -h\nusage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]\n\nWhat's up, BD-maniacs?\n\nkey(s)       action                  \n=====================================\nwasd         move around             \narrows       look around             \nm            enable looking with mouse\nspace        render with transformer \nq            quit                    \n\npositional arguments:\n  path                  path to image or directory from which to select image. Default example is used if not specified.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}\n                        pretrained model to use.\n  --video [VIDEO]       path to write video recording to. (no recording if unspecified).\n```\n\n## Training\n\n### Data Preparation\n\nWe support training on [RealEstate10K](https://google.github.io/realestate10k/)\nand [ACID](https://infinite-nature.github.io/). Both come in the same [format as\ndescribed here](https://google.github.io/realestate10k/download.html) and the\npreparation is the same for both of them. You will need to have\n[`colmap`](https://github.com/colmap/colmap) installed and available on your\n`$PATH`.\n\nWe assume that you have extracted the `.txt` files of the dataset you want to\nprepare into `$TXT_ROOT`, e.g. 
for RealEstate:\n\n```\n\u003e tree $TXT_ROOT\n├── test\n│   ├── 000c3ab189999a83.txt\n│   ├── ...\n│   └── fff9864727c42c80.txt\n└── train\n    ├── 0000cc6d8b108390.txt\n    ├── ...\n    └── ffffe622a4de5489.txt\n```\n\nand that you have downloaded the frames (we downloaded them in resolution `640\nx 360`) into `$IMG_ROOT`, e.g. for RealEstate:\n\n```\n\u003e tree $IMG_ROOT\n├── test\n│   ├── 000c3ab189999a83\n│   │   ├── 45979267.png\n│   │   ├── ...\n│   │   └── 55255200.png\n│   ├── ...\n│   ├── 0017ce4c6a39d122\n│   │   ├── 40874000.png\n│   │   ├── ...\n│   │   └── 48482000.png\n├── train\n│   ├── ...\n```\n\nTo prepare the `$SPLIT` split of the dataset (`$SPLIT` being one of `train`,\n`test` for RealEstate and `train`, `test`, `validation` for ACID) in\n`$SPA_ROOT`, run the following within the `scripts` directory:\n\n```\npython sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}\n```\n\nYou can also simply set `TXT_ROOT`, `IMG_ROOT` and `SPA_ROOT` as environment\nvariables and run `./sparsify_realestate.sh` or `./sparsify_acid.sh`. Take a\nlook into the sources to run with multiple workers in parallel.\n\nFinally, symlink `$SPA_ROOT` to `data/realestate_sparse`/`data/acid_sparse`.\n\n### First Stage Models\nAs described in [our paper](https://arxiv.org/abs/2104.07652), we train the transformer models in \na compressed, discrete latent space of pretrained VQGANs. 
These pretrained models can be conveniently\ndownloaded by running\n```\npython scripts/download_vqmodels.py \n```\nwhich will also create symlinks ensuring that the paths specified in the training configs (see `configs/*`) exist.\nIn case some of the models have already been downloaded, the script will only create the symlinks.\n\nFor training custom first stage models, we refer to the [taming transformers\nrepository](https://github.com/CompVis/taming-transformers).\n\n### Running the Training\nAfter both the preparation of the data and the first stage models are done, \nthe experiments on ACID and RealEstate10K as described in our paper can be reproduced by running\n```\npython geofree/main.py --base configs/\u003cdataset\u003e/\u003cdataset\u003e_13x23_\u003cexperiment\u003e.yaml -t --gpus 0,\n```\nwhere `\u003cdataset\u003e` is one of `realestate`/`acid` and `\u003cexperiment\u003e` is one of \n`expl_img`/`expl_feat`/`expl_emb`/`impl_catdepth`/`impl_depth`/`impl_nodepth`/`hybrid`. \nThese abbreviations correspond to the experiments listed in the following Table (see also Fig.2 in the main paper)\n\n![variants](assets/geofree_variants.png)\n\nNote that each experiment was conducted on a GPU with 40 GB VRAM.\n\n## BibTeX\n\n```\n@misc{rombach2021geometryfree,\n      title={Geometry-Free View Synthesis: Transformers and no 3D Priors}, \n      author={Robin Rombach and Patrick Esser and Björn Ommer},\n      year={2021},\n      eprint={2104.07652},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Fgeometry-free-view-synthesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcompvis%2Fgeometry-free-view-synthesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcompvis%2Fgeometry-free-view-synthesis/lists"}