https://github.com/vita-group/gnt-move
[ICCV2023] "Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts" by Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, Mukund Varma, Yi Wang, Zhangyang Wang
- Host: GitHub
- URL: https://github.com/vita-group/gnt-move
- Owner: VITA-Group
- Created: 2023-08-24T14:40:34.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-25T05:49:56.000Z (over 1 year ago)
- Last Synced: 2024-01-10T08:25:58.236Z (about 1 year ago)
- Topics: 3d, generalizable-nerf, nerf
- Language: Python
- Homepage:
- Size: 40.7 MB
- Stars: 36
- Watchers: 13
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts [ArXiv](https://arxiv.org/abs/2308.11793)
[Wenyan Cong]()<sup>1*</sup>,
[Hanxue Liang]()<sup>2,1*</sup>,
[Peihao Wang](https://peihaowang.github.io/)<sup>1</sup>,
[Zhiwen Fan]()<sup>1</sup>,
[Tianlong Chen](https://tianlong-chen.github.io/)<sup>1</sup>,
[Mukund Varma T](https://mukundvarmat.github.io/)<sup>3,1</sup>,
[Yi Wang]()<sup>1</sup>,
[Zhangyang Wang](https://vita-group.github.io/)<sup>1</sup>

<sup>1</sup>University of Texas at Austin, <sup>2</sup>University of Cambridge, <sup>3</sup>Indian Institute of Technology Madras

<sup>*</sup> denotes equal contribution.
This repository builds on GNT's [official repository](https://github.com/VITA-Group/GNT).
## Introduction
Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feedforward inference pipeline. Since those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness, respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings.
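The core MoE idea described above can be illustrated with a minimal, self-contained sketch. This is not the paper's implementation: the layer sizes, top-1 routing, and gate-weighted combination below are illustrative assumptions, but the structure mirrors the abstract's description of routed view experts plus an always-active shared permanent expert.

```python
# Illustrative MoE sketch (not the paper's code): top-1 routed experts
# plus a shared "permanent" expert that every token always passes through.
import numpy as np

rng = np.random.default_rng(0)

D, E = 16, 4                                          # feature dim, routed experts
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(E)]
permanent = rng.standard_normal((D, D)) / np.sqrt(D)  # shared across scenes
gate_w = rng.standard_normal((D, E)) / np.sqrt(D)     # routing weights

def moe_layer(x):
    """x: (N, D) tokens -> (N, D); top-1 routed expert + permanent expert."""
    logits = x @ gate_w                          # (N, E) routing scores
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over experts
    top = probs.argmax(axis=1)                   # top-1 expert per token
    out = np.empty_like(x)
    for i, e in enumerate(top):
        # routed expert output, scaled by its gate probability, plus the
        # always-on permanent expert that promotes cross-scene consistency
        out[i] = probs[i, e] * (x[i] @ experts[e]) + x[i] @ permanent
    return out

tokens = rng.standard_normal((8, D))
print(moe_layer(tokens).shape)  # (8, 16)
```

The permanent expert is the key departure from a vanilla MoE layer: it is applied to every token regardless of routing, so scenes that activate different routed experts still share one common pathway.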
## Installation
Clone this repository:
```bash
git clone https://github.com/VITA-Group/GNT-MOVE.git
cd GNT-MOVE/
```

The code is tested with Python 3.8, CUDA 11.1, and PyTorch 1.10.1. Additional dependencies include:
```bash
torchvision
ConfigArgParse
imageio
matplotlib
numpy
opencv_contrib_python
Pillow
scipy
imageio-ffmpeg
lpips
scikit-image
```

## Datasets
We reuse the training and evaluation datasets from [IBRNet](https://github.com/googleinterns/IBRNet). All datasets must be downloaded to a directory `data/` within the project folder and must follow the organization below.
```bash
├──data/
├──ibrnet_collected_1/
├──ibrnet_collected_2/
├──real_iconic_noface/
├──spaces_dataset/
├──RealEstate10K-subset/
├──google_scanned_objects/
├──nerf_synthetic/
├──nerf_llff_data/
```
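Before training, it can help to confirm the layout above is in place. The helper below is hypothetical (not part of this repository); it simply reports which of the expected dataset folders are missing under `data/`.

```python
# Hypothetical helper (not in the repo): check the expected data/ layout.
import os

EXPECTED = [
    "ibrnet_collected_1", "ibrnet_collected_2", "real_iconic_noface",
    "spaces_dataset", "RealEstate10K-subset", "google_scanned_objects",
    "nerf_synthetic", "nerf_llff_data",
]

def missing_datasets(root="data"):
    """Return the expected dataset directories not present under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

print(missing_datasets())  # [] once all datasets are downloaded
```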
Please refer to [IBRNet's](https://github.com/googleinterns/IBRNet) repository for downloading and preparing the data. For convenience, we consolidate the instructions below:
```bash
mkdir data
cd data/

# IBRNet captures
gdown https://drive.google.com/uc?id=1rkzl3ecL3H0Xxf5WTyc2Swv30RIyr1R_
unzip ibrnet_collected.zip

# LLFF
gdown https://drive.google.com/uc?id=1ThgjloNt58ZdnEuiCeRf9tATJ-HI0b01
unzip real_iconic_noface.zip

## [IMPORTANT] remove scenes that appear in the test set
cd real_iconic_noface/
rm -rf data2_fernvlsb data2_hugetrike data2_trexsanta data3_orchid data5_leafscene data5_lotr data5_redflower
cd ../

# Spaces dataset
git clone https://github.com/augmentedperception/spaces_dataset

# RealEstate10K
## make sure to install ffmpeg - sudo apt-get install ffmpeg
git clone https://github.com/qianqianwang68/RealEstate10K_Downloader
cd RealEstate10K_Downloader
python3 generate_dataset.py train
cd ../

# Google Scanned Objects
gdown https://drive.google.com/uc?id=1w1Cs0yztH6kE3JIz7mdggvPGCwIKkVi2
unzip google_scanned_objects_renderings.zip

# Blender dataset
gdown https://drive.google.com/uc?id=18JxhpWD-4ZmuFKLzKlAw-w5PpzZxXOcG
unzip nerf_synthetic.zip

# LLFF dataset (eval)
gdown https://drive.google.com/uc?id=16VnMcF1KJYxN9QId6TClMsZRahHNMW5g
unzip nerf_llff_data.zip
```

## Usage
## Cite this work
If you find our work or code implementation useful for your own research, please cite our paper.
```bibtex
@inproceedings{
gntmove2023,
title={Enhancing Ne{RF} akin to Enhancing {LLM}s: Generalizable Ne{RF} Transformer with Mixture-of-View-Experts},
author={Wenyan Cong and Hanxue Liang and Peihao Wang and Zhiwen Fan and Tianlong Chen and Mukund Varma and Yi Wang and Zhangyang Wang},
booktitle={ICCV},
year={2023}
}
```