🎞️ [NeurIPS'24] MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views
https://github.com/donydchen/mvsplat360
- Host: GitHub
- URL: https://github.com/donydchen/mvsplat360
- Owner: donydchen
- License: MIT
- Created: 2024-11-07T17:47:56.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-12-03T00:08:49.000Z (6 months ago)
- Last Synced: 2025-03-28T11:07:04.310Z (about 2 months ago)
- Topics: feed-forward-gaussian-splatting, gaussian-splatting, generative-models, neurips-2024, novel-view-synthesis, stable-video-diffusion, video-diffusion-model
- Language: Python
- Homepage: https://donydchen.github.io/mvsplat360/
- Size: 1.15 MB
- Stars: 237
- Watchers: 7
- Forks: 9
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
README
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views
Yuedong Chen · Chuanxia Zheng · Haofei Xu · Bohan Zhuang · Andrea Vedaldi · Tat-Jen Cham · Jianfei Cai
NeurIPS 2024
Paper | Project Page | Pretrained Models
https://github.com/user-attachments/assets/4cfa6654-5bb5-4f72-a264-6941bcf00bed
## Installation
To get started, create a conda virtual environment using Python 3.10+ and install the requirements:
```bash
conda create -n mvsplat360 python=3.10
conda activate mvsplat360
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 xformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
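To quickly confirm that the CUDA build of PyTorch and xformers installed correctly, you can run a short sanity check like the one below (a minimal sketch, not part of the official setup):

```python
# Minimal environment sanity check (not part of the official setup).
import torch
import xformers

print("torch:", torch.__version__)        # expect 2.2.2+cu118
print("xformers:", xformers.__version__)  # expect 0.0.25.post1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```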
## Acquiring Datasets

This project mainly uses the [DL3DV](https://github.com/DL3DV-10K/Dataset) and [RealEstate10K](https://google.github.io/realestate10k/index.html) datasets.
The dataset structure aligns with our previous work, [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#acquiring-datasets). You may refer to the script [convert_dl3dv.py](src/scripts/convert_dl3dv.py) for converting the DL3DV-10K dataset to the torch chunks used in this project.
You might also want to check out [DepthSplat's DATASETS.md](https://github.com/cvg/depthsplat/blob/main/DATASETS.md), which provides detailed instructions on pre-processing DL3DV and RealEstate10K for use here (both projects share the same code base, derived from pixelSplat).
A pre-processed tiny subset of DL3DV (containing 5 scenes) is provided [here](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_tiny.zip) for quick reference. To use it, simply download it and unzip it to `datasets/dl3dv_tiny`.
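To verify that a converted (or downloaded) subset loads correctly, you can inspect one of its `.torch` chunks. The sketch below assumes the pixelSplat-style chunk layout (a list of per-scene dictionaries); the exact path and keys may differ on your machine:

```python
# Hypothetical inspection of a pixelSplat-style .torch chunk; adjust the path
# to match what convert_dl3dv.py (or the tiny subset) actually produced.
from pathlib import Path
import torch

chunk_path = next(Path("datasets/dl3dv_tiny").rglob("*.torch"))  # first chunk found
chunk = torch.load(chunk_path)

print("chunk:", chunk_path, "holding", len(chunk), "scenes")
print("keys per scene:", sorted(chunk[0].keys()))  # e.g. scene key, images, cameras
```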
## Running the Code
### Evaluation
To render novel views,
* download the pre-trained model [dl3dv_480p.ckpt](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_480p.ckpt) and save it to `checkpoints/`
* run the following:
```bash
# dl3dv; requires at least 22G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 \
wandb.name=dl3dv_480P_ctx5_tgt56 \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=checkpoints/dl3dv_480p.ckpt
```

* the rendered novel views will be stored under `outputs/test/{wandb.name}`
To evaluate the quantitative performance, refer to [compute_dl3dv_metrics.py](src/scripts/compute_dl3dv_metrics.py).
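For a quick standalone check, the sketch below computes PSNR between one rendered view and its ground-truth counterpart. The file paths are hypothetical, and the script above remains the reference for reported numbers:

```python
# Rough PSNR check between a rendered frame and its ground truth.
# Paths are hypothetical; the official metrics live in compute_dl3dv_metrics.py.
import numpy as np
from PIL import Image

def psnr(pred: np.ndarray, gt: np.ndarray) -> float:
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 20 * np.log10(255.0) - 10 * np.log10(mse)

pred = np.array(Image.open("outputs/test/dl3dv_480P_ctx5_tgt56/0000.png"))
gt = np.array(Image.open("path/to/ground_truth/0000.png"))
print(f"PSNR: {psnr(pred, gt):.2f} dB")
```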
To render videos from a pre-trained model, run the following
```bash
# dl3dv; requires at least 38G VRAM
python -m src.main +experiment=dl3dv_mvsplat360_video \
wandb.name=dl3dv_480P_ctx5_tgt56_video \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=checkpoints/dl3dv_480p.ckpt
```
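Depending on the configuration, the video mode may save individual frames rather than an encoded video. If you want to pack saved frames into an mp4 yourself, the sketch below uses imageio with its ffmpeg backend (`pip install imageio[ffmpeg]`); the frame directory and file naming are assumptions about the output layout:

```python
# Combine rendered frames into an mp4; the frame directory layout is an assumption.
from pathlib import Path
import imageio.v2 as imageio

frame_dir = Path("outputs/test/dl3dv_480P_ctx5_tgt56_video")  # hypothetical location
frames = [imageio.imread(p) for p in sorted(frame_dir.glob("*.png"))]
imageio.mimsave("mvsplat360_demo.mp4", frames, fps=10)
print(f"wrote {len(frames)} frames to mvsplat360_demo.mp4")
```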
### Training

* Download the encoder pre-trained weights from [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#evaluation) and save them as `checkpoints/re10k.ckpt`.
* Download the SVD pre-trained weights from [generative-models](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main) and save them as `checkpoints/svd.safetensors`. (A quick way to sanity-check the downloaded checkpoints is sketched after this list.)
* Run the following:

```bash
# train mvsplat360; requires at least 80G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 dataset.roots=[datasets/dl3dv]
```

* Alternatively, you can fine-tune from our released model by appending `checkpointing.load=checkpoints/dl3dv_480p.ckpt` and `checkpointing.resume=false` to the above command.
* You can also set up your wandb account [here](config/main.yaml) for logging. Have fun.
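Before launching training, it can be worth confirming that the downloaded checkpoints open cleanly. This is a minimal sketch, assuming the file locations named above; the key layout inside each checkpoint is not guaranteed:

```python
# Sanity-check the downloaded weights; paths follow the steps above, but the
# exact contents of each checkpoint are an assumption.
import torch
from safetensors.torch import load_file

svd = load_file("checkpoints/svd.safetensors")
print(f"svd.safetensors: {len(svd)} tensors")

encoder = torch.load("checkpoints/re10k.ckpt", map_location="cpu")
print("re10k.ckpt top-level keys:", list(encoder.keys())[:5])
```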
## Camera Conventions

The camera intrinsic matrices are normalized (the first row is divided by the image width, and the second row by the image height). More details are in [this comment](https://github.com/donydchen/mvsplat/issues/28#issuecomment-2126416038).
The camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, +Z pointing into the screen).
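As an illustration of the intrinsics convention, the snippet below normalizes a hypothetical pinhole intrinsic matrix in the way described above (the focal lengths and image size are made-up values):

```python
# Illustration of the normalized-intrinsics convention; all numbers are made up.
import numpy as np

w, h = 640, 360                       # image width and height in pixels
K = np.array([[500.0,   0.0, 320.0],  # fx,  0, cx
              [  0.0, 500.0, 180.0],  #  0, fy, cy
              [  0.0,   0.0,   1.0]])

K_norm = K.copy()
K_norm[0] /= w   # first row divided by image width
K_norm[1] /= h   # second row divided by image height
print(K_norm)    # focal lengths and principal point now expressed in [0, 1] units
```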
## BibTeX
```bibtex
@inproceedings{chen2024mvsplat360,
title = {MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views},
author = {Chen, Yuedong and Zheng, Chuanxia and Xu, Haofei and Zhuang, Bohan and Vedaldi, Andrea and Cham, Tat-Jen and Cai, Jianfei},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2024},
}
```

## Acknowledgements
The project is based on [MVSplat](https://github.com/donydchen/mvsplat), [pixelSplat](https://github.com/dcharatan/pixelsplat), [UniMatch](https://github.com/autonomousvision/unimatch) and [generative-models](https://github.com/Stability-AI/generative-models). Many thanks to these projects for their excellent contributions!