
# Pseudo-Generalized Dynamic View Synthesis

ICLR 2024



**Pseudo-Generalized Dynamic View Synthesis from a Video, ICLR 2024.**

[Xiaoming Zhao](https://xiaoming-zhao.com/), [Alex Colburn](https://www.colburn.org/), [Fangchang Ma](https://fangchangma.github.io/), [Miguel Ángel Bautista](https://scholar.google.com/citations?user=ZrRs-qoAAAAJ&hl=en), [Joshua M Susskind](https://scholar.google.com/citations?user=Sv2TGqsAAAAJ&hl=en), and [Alexander G. Schwing](https://www.alexander-schwing.de/).

### [Project Page](https://xiaoming-zhao.github.io/projects/pgdvs/) | [Paper](https://arxiv.org/abs/2310.08587)

## Table of Contents

- [Environment Setup](#environment-setup)
- [Try PGDVS on Video in the Wild](#try-pgdvs-on-video-in-the-wild)
- [Benchmarking](#benchmarking)
- [Citation](#citation)
- [License](#license)
- [Acknowledgements](#acknowledgements)

## Environment Setup

This code has been tested on Ubuntu 20.04 with CUDA 11.8 on an NVIDIA A100-SXM4-80GB GPU (driver 470.82.01).

We recommend using `conda` for virtual environment control and [`libmamba`](https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community) for a faster dependency check.

```bash
# setup libmamba
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba

# create virtual environment
conda env create -f envs/pgdvs.yaml

conda activate pgdvs
conda install pytorch3d=0.7.4 -c pytorch3d -y
```
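
As a quick sanity check (our suggestion here, not part of the official instructions), you can confirm that PyTorch sees the GPU and that `pytorch3d` imports cleanly:

```bash
conda activate pgdvs
# both imports should succeed, CUDA should be available,
# and the pytorch3d version should print as 0.7.4
python -c "import torch; print(torch.cuda.is_available()); import pytorch3d; print(pytorch3d.__version__)"
```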

**[optional]** Run the following to install JAX if you want to:
1. try [TAPIR](https://github.com/google-deepmind/tapnet);
2. evaluate with the metrics computation from [DyCheck](https://github.com/KAIR-BAIR/dycheck).
```bash
conda activate pgdvs
pip install -r envs/requirements_jax.txt --verbose
```
To check that JAX is installed correctly, run the following.
**NOTE**: the leading `import torch` is important; it ensures that JAX finds the cuDNN installed by `conda`.
```bash
conda activate pgdvs
python -c "import torch; from jax import random; key = random.PRNGKey(0); x = random.normal(key, (10,)); print(x)"
```
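
As a further optional check (again our suggestion, not from the original instructions), you can verify that JAX actually sees the GPU; `jax.devices()` should list a CUDA/GPU device rather than falling back to CPU:

```bash
conda activate pgdvs
# import torch first for the same cuDNN reason as above
python -c "import torch; import jax; print(jax.devices())"
```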

## Try PGDVS on Video in the Wild

### Download Checkpoints

```bash
# this environment variable is used throughout the commands below
cd /path/to/this/repo
export PGDVS_ROOT=$PWD
```

Since we use third-party pretrained models, we provide two ways to download them:
1. download directly from the official repositories;
2. download from our copy, which reproduces the results in the paper even if the official repositories' checkpoints are modified in the future.
```bash
FLAG_ORIGINAL=1 # set to 0 if you want to download from our copy
bash ${PGDVS_ROOT}/scripts/download_ckpts.sh ${PGDVS_ROOT}/ckpts ${FLAG_ORIGINAL}
```
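
The script writes everything into the directory given as its first argument (`${PGDVS_ROOT}/ckpts` above), so a simple listing is a quick way to confirm the downloads completed:

```bash
# inspect the downloaded checkpoints; the exact file names depend on the script
ls -lh ${PGDVS_ROOT}/ckpts
```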

### Example of DAVIS

We use [DAVIS](https://davischallenge.org/) as an example to illustrate how to render novel views from monocular videos in the wild. Please see [IN_THE_WILD.md](./docs/IN_THE_WILD.md) for details.

## Benchmarking

Please see [BENCHMARK_NVIDIA.md](./docs/BENCHMARK_NVIDIA.md) and [BENCHMARK_iPhone.md](./docs/BENCHMARK_iPhone.md) for details on reproducing the paper's results on [NVIDIA Dynamic Scenes](https://gorokee.github.io/jsyoon/dynamic_synth/) and [DyCheck's iPhone dataset](https://github.com/KAIR-BAIR/dycheck).

## Citation
>Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M Susskind, and Alexander G. Schwing. Pseudo-Generalized Dynamic View Synthesis from a Video. ICLR 2024.
```bibtex
@inproceedings{Zhao2024PGDVS,
  title={{Pseudo-Generalized Dynamic View Synthesis from a Video}},
  author={Xiaoming Zhao and Alex Colburn and Fangchang Ma and Miguel Angel Bautista and Joshua M. Susskind and Alexander G. Schwing},
  booktitle={ICLR},
  year={2024},
}
```

## License

This sample code is released under the [LICENSE](./LICENSE) terms.

## Acknowledgements

Our project would not be possible without the following projects:
- [GNT](https://github.com/VITA-Group/GNT) (commit `7b63996cb807dbb5c95ab6898e8093996588e73a`)
- [RAFT](https://github.com/princeton-vl/RAFT) (commit `3fa0bb0a9c633ea0a9bb8a79c576b6785d4e6a02`)
- [OneFormer](https://github.com/SHI-Labs/OneFormer) (commit `56799ef9e02968af4c7793b30deabcbeec29ffc0`)
- [segment-anything](https://github.com/facebookresearch/segment-anything) (commit `6fdee8f2727f4506cfbbe553e23b895e27956588`)
- [ZoeDepth](https://github.com/isl-org/ZoeDepth) (commit `edb6daf45458569e24f50250ef1ed08c015f17a7`)
- [TAPIR](https://github.com/deepmind/tapnet) (commit `4ac6b2acd0aed36c0762f4247de9e8630340e2e0`)
- [CoTracker](https://github.com/facebookresearch/co-tracker) (commit `0a0596b277545625054cb041f00419bcd3693ea5`)
- [casualSAM](https://github.com/ztzhang/casualSAM) (we use [our modified version](https://github.com/Xiaoming-Zhao/casualSAM))
- [dynamic-video-depth](https://github.com/google/dynamic-video-depth) (we use [our modified version](https://github.com/Xiaoming-Zhao/dynamic-video-depth))