{"id":22381970,"url":"https://github.com/googleinterns/ibrnet","last_synced_at":"2025-04-08T08:13:37.668Z","repository":{"id":44927444,"uuid":"340195082","full_name":"googleinterns/IBRNet","owner":"googleinterns","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-04T22:48:02.000Z","size":6992,"stargazers_count":505,"open_issues_count":11,"forks_count":51,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-01T05:34:13.311Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/googleinterns.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-18T22:33:46.000Z","updated_at":"2025-03-29T08:31:52.000Z","dependencies_parsed_at":"2025-01-06T04:03:09.179Z","dependency_job_id":"19c1458a-04cf-447f-b0ea-8fd31418712c","html_url":"https://github.com/googleinterns/IBRNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleinterns%2FIBRNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleinterns%2FIBRNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleinterns%2FIBRNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleinterns%2FIBRNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/googleinterns","download_url":"https://codeload.github.com/googleinterns/
IBRNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801169,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-05T00:11:18.650Z","updated_at":"2025-04-08T08:13:37.633Z","avatar_url":"https://github.com/googleinterns.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IBRNet: Learning Multi-View Image-Based Rendering\nPyTorch implementation of paper \"IBRNet: Learning Multi-View Image-Based Rendering\", CVPR 2021.\n\n\u003e IBRNet: Learning Multi-View Image-Based Rendering  \n\u003e [Qianqian Wang](https://www.cs.cornell.edu/~qqw/), [Zhicheng Wang](https://www.linkedin.com/in/zhicheng-wang-96116897/), [Kyle Genova](https://www.kylegenova.com/), [Pratul Srinivasan](https://pratulsrinivasan.github.io/), [Howard Zhou](https://www.linkedin.com/in/howard-zhou-0a34b84/), [Jonathan T. 
Barron](https://jonbarron.info), [Ricardo Martin-Brualla](http://www.ricardomartinbrualla.com/), [Noah Snavely](https://www.cs.cornell.edu/~snavely/), [Thomas Funkhouser](https://www.cs.princeton.edu/~funk/)    \n\u003e CVPR 2021\n\u003e \n\n#### [project page](https://ibrnet.github.io/) | [paper](http://arxiv.org/abs/2102.13090) | [data \u0026 model](https://drive.google.com/drive/folders/1I2MTWAJPCoseyaPOmRvpWkxIZq3c5lCu?usp=sharing)\n\n![Demo](assets/ancient.gif)\n\n## Installation\nClone this repo with submodules:\n```\ngit clone --recurse-submodules https://github.com/googleinterns/IBRNet\ncd IBRNet/\n```\n\nThe code is tested with Python3.7, PyTorch == 1.5 and CUDA == 10.2. We recommend you to use [anaconda](https://www.anaconda.com/) to make sure that all dependencies are in place. To create an anaconda environment:\n```\nconda env create -f environment.yml\nconda activate ibrnet\n```\n\n## Datasets\n\n### 1. Training datasets\n```\n├──data/\n    ├──ibrnet_collected_1/\n    ├──ibrnet_collected_2/\n    ├──real_iconic_noface/\n    ├──spaces_dataset/\n    ├──RealEstate10K-subset/\n    ├──google_scanned_objects/\n\n```\nPlease first `cd data/`, and then download datasets into `data/` following the instructions below. The organization of the datasets should be the same as above.\n\n#### (a) **Our captures**\nWe captured 67 forward-facing scenes (each scene contains 20-60 images). To download our data [ibrnet_collected.zip](https://drive.google.com/file/d/1dZZChihfSt9iIzcQICojLziPvX1vejkp/view?usp=sharing) (4.1G) for training, run:\n```\ngdown https://drive.google.com/uc?id=1dZZChihfSt9iIzcQICojLziPvX1vejkp\nunzip ibrnet_collected.zip\n```\n\nP.S. We've captured some more scenes in [ibrnet_collected_more.zip](https://drive.google.com/file/d/1Xsi2170hvm1fpIaP6JI_d9oa0LGThJ7E/view?usp=sharing), but we didn't include them for training. 
Feel free to download them if you would like more scenes for your task, but you wouldn't need them to reproduce our results.\n#### (b) [**LLFF**](https://bmild.github.io/llff/) released scenes\nDownload and process [real_iconic_noface.zip](https://drive.google.com/file/d/1m6AaHg-NEH3VW3t0Zk9E9WcNp4ZPNopl/view?usp=sharing) (6.6G) using the following commands:\n```angular2\n# download \ngdown https://drive.google.com/uc?id=1m6AaHg-NEH3VW3t0Zk9E9WcNp4ZPNopl\nunzip real_iconic_noface.zip\n\n# [IMPORTANT] remove scenes that appear in the test set\ncd real_iconic_noface/\nrm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower\ncd ../\n``` \n#### (c) [**Spaces Dataset**](https://github.com/augmentedperception/spaces_dataset)\nDownload spaces dataset by:\n```\ngit clone https://github.com/augmentedperception/spaces_dataset\n```\n\n\n#### (d) [**RealEstate10K**](https://google.github.io/realestate10k/)\nThe full RealEstate10K dataset is very large and can be difficult to download.\nHence, we provide a subset of RealEstate10K training scenes containing only 200 scenes. In our experiment, we found using more scenes from RealEstate10K only provides marginal improvement. To download our [camera files](https://drive.google.com/file/d/1IgJIeCPPZ8UZ529rN8dw9ihNi1E9K0hL/view?usp=sharing) (2MB):\n\n```\ngdown https://drive.google.com/uc?id=1IgJIeCPPZ8UZ529rN8dw9ihNi1E9K0hL\nunzip RealEstate10K_train_cameras_200.zip -d RealEstate10K-subset\n```\nBesides the camera files, you also need to download the corresponding video frames from YouTube. You can download the frames (29G) by running the following commands. 
The script uses `ffmpeg` to extract frames, so please make sure you have [ffmpeg](https://ffmpeg.org/) installed.\n\n```\ngit clone https://github.com/qianqianwang68/RealEstate10K_Downloader\ncd RealEstate10K_Downloader\npython generate_dataset.py train\ncd ../\n```\n\n#### (e) [**Google Scanned Objects**](https://app.ignitionrobotics.org/GoogleResearch/fuel/collections/Google%20Scanned%20Objects)\nGoogle Scanned Objects contains 1032 diffuse objects with various shapes and appearances.\nWe use [gaps](https://github.com/tomfunkhouser/gaps) to render these objects for training. Each object is rendered at 512 × 512 pixels\nfrom viewpoints on a quarter of the sphere. We render 250\nviews for each object. To download [our renderings](https://drive.google.com/file/d/1tKHhH-L1viCvTuBO1xg--B_ioK7JUrrE/view?usp=sharing) (7.5GB), run:\n```\ngdown https://drive.google.com/uc?id=1tKHhH-L1viCvTuBO1xg--B_ioK7JUrrE\nunzip google_scanned_objects_renderings.zip\n```\nThe mapping between our renderings and the public Google Scanned Objects can be found in [this spreadsheet](https://docs.google.com/spreadsheets/d/1JGqJ9vKgZf9gLLUM-KIiRr_ePzJ-2CYRs5daB0qNIPo/edit?usp=sharing\u0026resourcekey=0-aZfNVJQSm9GEIzT1afvx8Q).\n\n### 2. Evaluation datasets\n```\n├──data/\n    ├──deepvoxels/\n    ├──nerf_synthetic/\n    ├──nerf_llff_data/\n```\nThe evaluation datasets include the DeepVoxels synthetic dataset, the NeRF realistic 360 dataset, and the real forward-facing dataset. To download all three datasets (6.7G), run the following command under the `data/` directory:\n```\nbash download_eval_data.sh\n```\n\n## Evaluation\nFirst, download our pretrained model into the project root directory:\n```\ngdown https://drive.google.com/uc?id=1wNkZkVQGx7rFksnX7uVX3NazrbjqaIgU\nunzip pretrained_model.zip\n```\n\nYou can use `eval/eval.py` to evaluate the pretrained model. 
For example, to obtain the PSNR, SSIM and LPIPS on the *fern* scene in the real forward-facing dataset, you can first specify your paths in `configs/eval_llff.txt` and then run:\n```\ncd eval/\npython eval.py --config ../configs/eval_llff.txt\n``` \n## Rendering videos of smooth camera paths\nYou can use `render_llff_video.py` to render videos of smooth camera paths for the real forward-facing scenes. For example, you can first specify your paths in `configs/eval_llff.txt` and then run:\n```\ncd eval/\npython render_llff_video.py --config ../configs/eval_llff.txt\n```\nYou can also capture your own data of forward-facing scenes and synthesize novel views using our method. Please follow the instructions from [LLFF](https://github.com/Fyusion/LLFF) on how to capture and process the images. \n\n\n## Training\nWe strongly recommend training the model with multiple GPUs:\n```\n# this example uses 8 GPUs (nproc_per_node=8) \npython -m torch.distributed.launch --nproc_per_node=8 train.py --config configs/pretrain.txt\n```\nAlternatively, you can train with a single GPU by setting `distributed=False` in `configs/pretrain.txt` and running:\n```\npython train.py --config configs/pretrain.txt\n```\n\n## Finetuning\nTo finetune on a specific scene, for example, *fern*, using the pretrained model, run:\n```\n# this example uses 2 GPUs (nproc_per_node=2) \npython -m torch.distributed.launch --nproc_per_node=2 train.py --config configs/finetune_llff.txt\n```\n\n## Additional information\n- Our current implementation is not well optimized for inference speed. Rendering a 1000x800 image can take from 30s to over a minute depending on the GPU model. To reduce inference time, increase the chunk size so that GPU memory is fully utilized. You can also decrease the number of input source views, at some cost in rendering quality.  
\n- If you want to create and train on your own datasets, you can implement your own Dataset class following our examples in `ibrnet/data_loaders/`. You can verify the camera poses using `data_verifier.py` in `ibrnet/data_loaders/`.\n- Since the evaluation datasets are either object-centric or forward-facing scenes, our provided view selection methods are very simple (based on either viewpoints or camera locations). If you want to evaluate our method on new scenes with other kinds of camera distributions, you might need to implement your own view selection methods to identify the most effective source views.\n- If you have any questions, you can contact qwang423@gmail.com.\n## Citation\n```\n@inproceedings{wang2021ibrnet,\n  author    = {Wang, Qianqian and Wang, Zhicheng and Genova, Kyle and Srinivasan, Pratul and Zhou, Howard  and Barron, Jonathan T. and Martin-Brualla, Ricardo and Snavely, Noah and Funkhouser, Thomas},\n  title     = {IBRNet: Learning Multi-View Image-Based Rendering},\n  booktitle = {CVPR},\n  year      = {2021}\n}\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogleinterns%2Fibrnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogleinterns%2Fibrnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogleinterns%2Fibrnet/lists"}