https://github.com/apple/ml-pgdvs
[ICLR 2024] Official implementation of "Pseudo-Generalized Dynamic View Synthesis from a Video"
https://github.com/apple/ml-pgdvs
3d-vision dynamic-view-synthesis neural-rendering novel-view-synthesis view-synthesis
Last synced: 8 months ago
JSON representation
[ICLR 2024] Official implementation of "Pseudo-Generalized Dynamic View Synthesis from a Video"
- Host: GitHub
- URL: https://github.com/apple/ml-pgdvs
- Owner: apple
- License: other
- Created: 2024-03-13T22:34:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-21T22:39:48.000Z (12 months ago)
- Last Synced: 2025-01-30T07:42:33.714Z (9 months ago)
- Topics: 3d-vision, dynamic-view-synthesis, neural-rendering, novel-view-synthesis, view-synthesis
- Language: Python
- Homepage: https://xiaoming-zhao.github.io/projects/pgdvs/
- Size: 1.43 MB
- Stars: 29
- Watchers: 9
- Forks: 4
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
Pseudo-Generalized Dynamic View Synthesis
ICLR 2024
![]()
**Pseudo-Generalized Dynamic View Synthesis from a Video, ICLR 2024.**
[Xiaoming Zhao](https://xiaoming-zhao.com/), [Alex Colburn](https://www.colburn.org/), [Fangchang Ma](https://fangchangma.github.io/), [Miguel Ángel Bautista](https://scholar.google.com/citations?user=ZrRs-qoAAAAJ&hl=en), [Joshua M Susskind](https://scholar.google.com/citations?user=Sv2TGqsAAAAJ&hl=en), and [Alexander G. Schwing](https://www.alexander-schwing.de/).### [Project Page](https://xiaoming-zhao.github.io/projects/pgdvs/) | [Paper](https://arxiv.org/abs/2310.08587)
## Table of Contents
- [Environment Setup](#environment-setup)
- [Try PGDVS on Video in the Wild](#try-pgdvs-on-video-in-the-wild)
- [Benchmarking](#benchmarking)
- [Citation](#citation)
- [License](#license)
- [Acknowledgements](#acknowledgements)## Environment Setup
This code has been tested on Ubuntu 20.04 with CUDA 11.8 on NVIDIA A100-SXM4-80GB GPU (driver 470.82.01).
We recommend using `conda` for virtual environment control and [`libmamba`](https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community) for a faster dependency check.
```bash
# setup libmamba
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba# create virtual environment
conda env create -f envs/pgdvs.yamlconda activate pgdvs
conda install pytorch3d=0.7.4 -c pytorch3d -y
```**[optional]** Run the following to install JAX if you want to
1. try [TAPIR](https://github.com/google-deepmind/tapnet)
2. evaluate with metrics computation from [DyCheck](https://github.com/KAIR-BAIR/dycheck)
```bash
conda activate pgdvs
pip install -r envs/requirements_jax.txt --verbose
```
To check that JAX is installed correctly, run the following.
**NOTE**: the first `import torch` is important since it will make sure that JAX finds the cuDNN installed by `conda`.
```bash
conda activate pgdvs
python -c "import torch; from jax import random; key = random.PRNGKey(0); x = random.normal(key, (10,)); print(x)"
```## Try PGDVS on Video in the Wild
### Download Checkpoints
```bash
# this environment variable is used for demonstration
cd /path/to/this/repo
export PGDVS_ROOT=$PWD
```Since we use third parties's pretrained models, we provide two ways to download them:
1. Directly download from those official repositories;
2. Download from our copy for reproducing results in the paper just in case those official repositories's checkpoints are modified in the future.
```bash
FLAG_ORIGINAL=1 # set to 0 if you want to download from our copy
bash ${PGDVS_ROOT}/scripts/download_ckpts.sh ${PGDVS_ROOT}/ckpts ${FLAG_ORIGINAL}
```### Example of DAVIS
We use [DAVIS](https://davischallenge.org/) as an example to illustrate how to render novel view from monocular videos in the wild. Please see [IN_THE_WILD.md](./docs/IN_THE_WILD.md) for details.
## Benchmarking
Please see [BENCHMARK_NVIDIA.md](./docs/BENCHMARK_NVIDIA.md) and [BENCHMARK_iPhone.md](./docs/BENCHMARK_iPhone.md) for details about reproducing results on [NVIDIA Dynamic Scenes](https://gorokee.github.io/jsyoon/dynamic_synth/) and [DyCheck's iPhone Dataset](https://github.com/KAIR-BAIR/dycheck) in the paper.
## Citation
>Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M Susskind, and Alexander G. Schwing. Pseudo-Generalized Dynamic View Synthesis from a Video. ICLR 2024.
```
@inproceedings{Zhao2024PGDVS,
title={{Pseudo-Generalized Dynamic View Synthesis from a Video}},
author={Xiaoming Zhao and Alex Colburn and Fangchang Ma and Miguel Angel Bautista and Joshua M. Susskind and Alexander G. Schwing},
booktitle={ICLR},
year={2024},
}
```## License
This sample code is released under the [LICENSE](./LICENSE) terms.
## Acknowledgements
Our project is not possible without the following ones:
- [GNT](https://github.com/VITA-Group/GNT) (commit `7b63996cb807dbb5c95ab6898e8093996588e73a`)
- [RAFT](https://github.com/princeton-vl/RAFT) (commit `3fa0bb0a9c633ea0a9bb8a79c576b6785d4e6a02`)
- [OneFormer](https://github.com/SHI-Labs/OneFormer) (commit `56799ef9e02968af4c7793b30deabcbeec29ffc0`)
- [segment-anything](https://github.com/facebookresearch/segment-anything) (commit `6fdee8f2727f4506cfbbe553e23b895e27956588`)
- [ZoeDepth](https://github.com/isl-org/ZoeDepth) (commit `edb6daf45458569e24f50250ef1ed08c015f17a7`)
- [TAPIR](https://github.com/deepmind/tapnet) (commit `4ac6b2acd0aed36c0762f4247de9e8630340e2e0`)
- [CoTracker](https://github.com/facebookresearch/co-tracker) (commit `0a0596b277545625054cb041f00419bcd3693ea5`)
- [casualSAM](https://github.com/ztzhang/casualSAM) (we use [our modified version](https://github.com/Xiaoming-Zhao/casualSAM))
- [dynamic-video-depth](https://github.com/google/dynamic-video-depth) (we use [our modified version](https://github.com/Xiaoming-Zhao/dynamic-video-depth))