# MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

Yuedong Chen · Chuanxia Zheng · Haofei Xu · Bohan Zhuang · Andrea Vedaldi · Tat-Jen Cham · Jianfei Cai

**NeurIPS 2024**

Paper | Project Page | Pretrained Models

https://github.com/user-attachments/assets/4cfa6654-5bb5-4f72-a264-6941bcf00bed

## Installation

To get started, create a conda virtual environment using Python 3.10+ and install the requirements:

```bash
conda create -n mvsplat360 python=3.10
conda activate mvsplat360
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 xformers==0.0.25.post1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
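
Optionally, you can sanity-check that the pinned PyTorch build sees your GPU before moving on (a minimal check, not part of the official setup):

```bash
# should print something like "2.2.2+cu118 True" on a CUDA machine
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```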

## Acquiring Datasets

This project mainly uses the [DL3DV](https://github.com/DL3DV-10K/Dataset) and [RealEstate10K](https://google.github.io/realestate10k/index.html) datasets.

The dataset structure aligns with that of our previous work, [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#acquiring-datasets). Refer to the script [convert_dl3dv.py](src/scripts/convert_dl3dv.py) for converting the DL3DV-10K dataset to the torch chunks used in this project.

You might also want to check out [DepthSplat's DATASETS.md](https://github.com/cvg/depthsplat/blob/main/DATASETS.md), which provides detailed instructions on pre-processing DL3DV and RealEstate10K for use here (both projects share the same codebase, derived from pixelSplat).

A pre-processed tiny subset of DL3DV (containing 5 scenes) is provided [here](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_tiny.zip) for quick reference. To use it, simply download it and unzip it to `datasets/dl3dv_tiny`.
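
For reference, the chunked layout mirrors MVSplat's, so after unzipping the tiny subset you should see something roughly like the sketch below (split folders and chunk file names may differ slightly):

```
datasets/dl3dv_tiny/
├── train/   # *.torch chunk files + index.json
└── test/    # *.torch chunk files + index.json
```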

## Running the Code

### Evaluation

To render novel views:

* download the pre-trained model [dl3dv_480p.ckpt](https://huggingface.co/donydchen/mvsplat360/blob/main/dl3dv_480p.ckpt) and save it to `checkpoints/`

* run the following:

```bash
# dl3dv; requires at least 22G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 \
wandb.name=dl3dv_480P_ctx5_tgt56 \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=checkpoints/dl3dv_480p.ckpt
```

* the rendered novel views will be stored under `outputs/test/{wandb.name}`

To compute the quantitative metrics, refer to [compute_dl3dv_metrics.py](src/scripts/compute_dl3dv_metrics.py).

To render videos from a pre-trained model, run the following:

```bash
# dl3dv; requires at least 38G VRAM
python -m src.main +experiment=dl3dv_mvsplat360_video \
wandb.name=dl3dv_480P_ctx5_tgt56_video \
mode=test \
dataset/view_sampler=evaluation \
dataset.roots=[datasets/dl3dv_tiny] \
checkpointing.load=checkpoints/dl3dv_480p.ckpt
```

### Training

* Download the pre-trained encoder weights from [MVSplat](https://github.com/donydchen/mvsplat?tab=readme-ov-file#evaluation) and save the checkpoint to `checkpoints/re10k.ckpt`.
* Download the pre-trained SVD weights from [generative-models](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/tree/main) and save the file to `checkpoints/svd.safetensors`.
* Run the following:

```bash
# train mvsplat360; requires at least 80G VRAM
python -m src.main +experiment=dl3dv_mvsplat360 dataset.roots=[datasets/dl3dv]
```

* Alternatively, you can fine-tune from our released model by appending `checkpointing.load=checkpoints/dl3dv_480p.ckpt` and `checkpointing.resume=false` to the above command, as shown below.
* You can also set up your wandb account in [config/main.yaml](config/main.yaml) for logging. Have fun.
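
Concretely, the fine-tuning variant of the training command above would look like this:

```bash
# fine-tune from the released 480p checkpoint instead of training from scratch
python -m src.main +experiment=dl3dv_mvsplat360 \
dataset.roots=[datasets/dl3dv] \
checkpointing.load=checkpoints/dl3dv_480p.ckpt \
checkpointing.resume=false
```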

## Camera Conventions

The camera intrinsic matrices are normalized (the first row is divided by the image width, and the second row by the image height). More details can be found in [this comment](https://github.com/donydchen/mvsplat/issues/28#issuecomment-2126416038).

The camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, and +Z pointing into the screen, i.e., along the camera's viewing direction).
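
For illustration only (the helper below is a hypothetical sketch, not part of this repo), normalizing a pixel-space intrinsic matrix following the convention above looks like this:

```python
import numpy as np

def normalize_intrinsics(K: np.ndarray, width: int, height: int) -> np.ndarray:
    """Hypothetical helper: divide the first row of K by the image width
    and the second row by the image height, per the convention above."""
    K = K.astype(np.float64).copy()
    K[0, :] /= width   # fx, skew, cx
    K[1, :] /= height  # fy, cy
    return K

# e.g., a 640x480 image with fx = fy = 500 and the principal point at the center
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(normalize_intrinsics(K, width=640, height=480))
```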

## BibTeX

```bibtex
@inproceedings{chen2024mvsplat360,
    title     = {MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views},
    author    = {Chen, Yuedong and Zheng, Chuanxia and Xu, Haofei and Zhuang, Bohan and Vedaldi, Andrea and Cham, Tat-Jen and Cai, Jianfei},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    year      = {2024},
}
```

## Acknowledgements

The project is based on [MVSplat](https://github.com/donydchen/mvsplat), [pixelSplat](https://github.com/dcharatan/pixelsplat), [UniMatch](https://github.com/autonomousvision/unimatch) and [generative-models](https://github.com/Stability-AI/generative-models). Many thanks to these projects for their excellent contributions!