{"id":19186797,"url":"https://github.com/donydchen/mvsplat360","last_synced_at":"2025-04-04T12:06:02.529Z","repository":{"id":261718636,"uuid":"884964404","full_name":"donydchen/mvsplat360","owner":"donydchen","description":"🎞️ [NeurIPS'24] MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views","archived":false,"fork":false,"pushed_at":"2024-12-03T00:08:49.000Z","size":1207,"stargazers_count":237,"open_issues_count":8,"forks_count":9,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-28T11:07:04.310Z","etag":null,"topics":["feed-forward-gaussian-splatting","gaussian-splatting","generative-models","neurips-2024","novel-view-synthesis","stable-video-diffusion","video-diffusion-model"],"latest_commit_sha":null,"homepage":"https://donydchen.github.io/mvsplat360/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/donydchen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-07T17:47:56.000Z","updated_at":"2025-03-23T17:32:15.000Z","dependencies_parsed_at":"2025-02-24T16:09:38.471Z","dependency_job_id":"d73e6959-255f-4e16-9aea-5a682b63e48f","html_url":"https://github.com/donydchen/mvsplat360","commit_stats":null,"previous_names":["donydchen/mvsplat360"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donydchen%2Fmvsplat360","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donydchen%2Fmvsplat360/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donyd
chen%2Fmvsplat360/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donydchen%2Fmvsplat360/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/donydchen","download_url":"https://codeload.github.com/donydchen/mvsplat360/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247174407,"owners_count":20896076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feed-forward-gaussian-splatting","gaussian-splatting","generative-models","neurips-2024","novel-view-synthesis","stable-video-diffusion","video-diffusion-model"],"created_at":"2024-11-09T11:16:46.819Z","updated_at":"2025-04-04T12:06:02.511Z","avatar_url":"https://github.com/donydchen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eMVSplat360: Feed-Forward 360 Scene Synthesis \u003cbr\u003e from Sparse Views\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://donydchen.github.io/\"\u003eYuedong Chen\u003c/a\u003e\n    \u0026nbsp;·\u0026nbsp;\n    \u003ca href=\"https://chuanxiaz.com/\"\u003eChuanxia Zheng\u003c/a\u003e\n    \u0026nbsp;·\u0026nbsp;\n    \u003ca href=\"https://haofeixu.github.io/\"\u003eHaofei Xu\u003c/a\u003e\n    \u0026nbsp;·\u0026nbsp;\n    \u003ca href=\"https://bohanzhuang.github.io/\"\u003eBohan Zhuang\u003c/a\u003e \u003cbr\u003e\n    \u003ca href=\"https://www.robots.ox.ac.uk/~vedaldi/\"\u003eAndrea Vedaldi\u003c/a\u003e\n    \u0026nbsp;·\u0026nbsp;\n    
\u003ca href=\"https://personal.ntu.edu.sg/astjcham/\"\u003eTat-Jen Cham\u003c/a\u003e\n    \u0026nbsp;·\u0026nbsp;\n    \u003ca href=\"https://jianfei-cai.github.io/\"\u003eJianfei Cai\u003c/a\u003e\n  \u003c/p\u003e\n  \u003ch3 align=\"center\"\u003eNeurIPS 2024\u003c/h3\u003e\n  \u003ch3 align=\"center\"\u003e\u003ca href=\"https://arxiv.org/abs/2411.04924\"\u003ePaper\u003c/a\u003e | \u003ca href=\"https://donydchen.github.io/mvsplat360/\"\u003eProject Page\u003c/a\u003e | \u003ca href=\"https://huggingface.co/donydchen/mvsplat360/tree/main\"\u003ePretrained Models\u003c/a\u003e \u003c/h3\u003e\n\u003c/p\u003e\n\nhttps://github.com/user-attachments/assets/4cfa6654-5bb5-4f72-a264-6941bcf00bed\n\n## Installation\n\nTo get started, create a conda virtual environment using Python 3.10+ and install the requirements:\n\n```bash\nconda create -n mvsplat360 python=3.10\nconda activate mvsplat360\npip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 xformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu118\npip install -r requirements.txt\n```\n\n## Acquiring Datasets\n\nThis project mainly uses [DL3DV](https://github.com/DL3DV-10K/Dataset) and [RealEstate10K](https://google.github.io/realestate10k/index.html) datasets.\n\nThe dataset structure aligns with our previous work, [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#acquiring-datasets). 
You may refer to the script [convert_dl3dv.py](src/scripts/convert_dl3dv.py) for converting the DL3DV-10K datasets to the torch chunks used in this project.\n\nYou might also want to check out [DepthSplat's DATASETS.md](https://github.com/cvg/depthsplat/blob/main/DATASETS.md), which provides detailed instructions on pre-processing DL3DV and RealEstate10K for use here (as both projects share the same code base from pixelSplat).\n\nA pre-processed tiny subset of DL3DV (containing 5 scenes) is provided [here](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_tiny.zip) for quick reference. To use it, simply download it and unzip it to `datasets/dl3dv_tiny`.\n\n\n## Running the Code\n\n### Evaluation\n\nTo render novel views,\n\n* get the pre-trained model [dl3dv_480p.ckpt](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_480p.ckpt) and save it to `checkpoints/`\n\n* run the following:\n\n```bash\n# dl3dv; requires at least 22G VRAM\npython -m src.main +experiment=dl3dv_mvsplat360 \\\nwandb.name=dl3dv_480P_ctx5_tgt56 \\\nmode=test \\\ndataset/view_sampler=evaluation \\\ndataset.roots=[datasets/dl3dv_tiny] \\\ncheckpointing.load=checkpoints/dl3dv_480p.ckpt\n```\n\n* the rendered novel views will be stored under `outputs/test/{wandb.name}`\n\nTo evaluate the quantitative performance, kindly refer to [compute_dl3dv_metrics.py](src/scripts/compute_dl3dv_metrics.py).\n\nTo render videos from a pre-trained model, run the following:\n\n```bash\n# dl3dv; requires at least 38G VRAM\npython -m src.main +experiment=dl3dv_mvsplat360_video \\\nwandb.name=dl3dv_480P_ctx5_tgt56_video \\\nmode=test \\\ndataset/view_sampler=evaluation \\\ndataset.roots=[datasets/dl3dv_tiny] \\\ncheckpointing.load=checkpoints/dl3dv_480p.ckpt\n```\n\n### Training\n\n* Download the encoder pre-trained weight from [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#evaluation) and save it to `checkpoints/re10k.ckpt`.\n* Download SVD pre-trained weight from 
[generative-models](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main) and save it to `checkpoints/svd.safetensors`.\n* Run the following:\n\n```bash\n# train mvsplat360; requires at least 80G VRAM\npython -m src.main +experiment=dl3dv_mvsplat360 dataset.roots=[datasets/dl3dv]\n```\n\n* Alternatively, you can fine-tune from our released model by appending `checkpointing.load=checkpoints/dl3dv_480p.ckpt` and `checkpointing.resume=false` to the above command.\n* You can also set up your wandb account [here](config/main.yaml) for logging. Have fun.\n\n## Camera Conventions\n\nThe camera intrinsic matrices are normalized (the first row is divided by image width, and the second row is divided by image height). More details are at [this comment](https://github.com/donydchen/mvsplat/issues/28#issuecomment-2126416038).\n\nThe camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, +Z camera looks into the screen).\n\n## BibTeX\n\n```bibtex\n@inproceedings{chen2024mvsplat360,\n    title     = {MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views},\n    author    = {Chen, Yuedong and Zheng, Chuanxia and Xu, Haofei and Zhuang, Bohan and Vedaldi, Andrea and Cham, Tat-Jen and Cai, Jianfei},\n    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},\n    year      = {2024},\n}\n```\n\n## Acknowledgements\n\nThe project is based on [MVSplat](https://github.com/donydchen/mvsplat), [pixelSplat](https://github.com/dcharatan/pixelsplat), [UniMatch](https://github.com/autonomousvision/unimatch) and [generative-models](https://github.com/Stability-AI/generative-models). 
Many thanks to these projects for their excellent contributions!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonydchen%2Fmvsplat360","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdonydchen%2Fmvsplat360","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonydchen%2Fmvsplat360/lists"}