Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qiuyu96/carver
An efficient PyTorch-based library for training 3D-aware image synthesis models.
- Host: GitHub
- URL: https://github.com/qiuyu96/carver
- Owner: qiuyu96
- License: other
- Created: 2023-06-20T06:34:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-23T07:45:38.000Z (about 1 year ago)
- Last Synced: 2023-10-23T08:37:49.424Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 973 KB
- Stars: 76
- Watchers: 9
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase
![timeline.jpg](figures/3D_benchmark.jpg)
Figure: Overview of our modularized pipeline for 3D-aware image synthesis, which breaks the generation process into universal modules. Each module can be improved independently, facilitating algorithm development. Note that the discriminator is omitted for simplicity.

> **Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase**
> [Qiuyu Wang](https://github.com/qiuyu96), [Zifan Shi](https://vivianszf.github.io/), [Kecheng Zheng](https://zkcys001.github.io/), [Yinghao Xu](https://justimyhxu.github.io/), [Sida Peng](https://pengsida.net/), [Yujun Shen](https://shenyujun.github.io/)
> **NeurIPS 2023 Datasets and Benchmarks Track** [[Paper](https://arxiv.org/pdf/2306.12423.pdf)]
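
To make the modular design concrete, here is a minimal, self-contained PyTorch sketch of how such a pipeline could be wired together. The module names mirror the "Supported Modules" listed below, but every interface in this snippet is a hypothetical simplification for illustration and does not reproduce this codebase's actual classes.

```python
# Illustrative only: a toy modular 3D-aware generator in the spirit of the
# figure above. Every interface here is a hypothetical simplification and
# does not match this repository's real classes.
import torch
import torch.nn as nn


class PoseSampler:
    """Deterministic pose sampling: returns fixed yaw/pitch angles per sample."""
    def sample(self, batch_size):
        return torch.zeros(batch_size, 2)                  # [B, 2] (yaw, pitch)


class PointSampler:
    """Uniform point sampling (collapsed to random 3D points for brevity)."""
    def sample(self, poses, num_points=64):
        return torch.rand(poses.shape[0], num_points, 3) * 2 - 1  # [B, N, 3]


class PointEmbedder(nn.Module):
    """Maps 3D coordinates to features (an MLP stand-in for tri-plane/volume)."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Linear(3, dim)

    def forward(self, points):
        return self.net(points)                            # [B, N, dim]


class FeatureDecoder(nn.Module):
    """Decodes per-point features into density and color."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.LeakyReLU(0.2),
                                 nn.Linear(dim, 4))

    def forward(self, feats):
        out = self.net(feats)
        return out[..., :1], out[..., 1:]                  # density, color


class VolumeRenderer:
    """Aggregates per-point predictions with a crude density-weighted average."""
    def render(self, density, color):
        weights = torch.softmax(density, dim=1)            # [B, N, 1]
        return (weights * color).sum(dim=1)                # [B, 3]


def generate(batch_size=4):
    poses = PoseSampler().sample(batch_size)
    points = PointSampler().sample(poses)
    density, color = FeatureDecoder()(PointEmbedder()(points))
    return VolumeRenderer().render(density, color)


if __name__ == "__main__":
    print(generate().shape)  # torch.Size([4, 3])
```

The point of such a modular interface is that any single stage (e.g., swapping the MLP embedder for a tri-plane embedder) can be replaced without touching the rest of the pipeline, which is exactly what the ablation commands later in this README exercise.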
## Overview of methods supported by our codebase:
**Supported Methods (9)**
> - [x] [![](https://img.shields.io/badge/NeurIPS'2020-GRAF-f4d5b3?style=for-the-badge)](https://github.com/autonomousvision/graf)
> - [x] [![](https://img.shields.io/badge/NeurIPS'2022-EpiGRAF-d0e9ff?style=for-the-badge)](https://github.com/universome/epigraf)
> - [x] [![](https://img.shields.io/badge/CVPR'2021-π–GAN-yellowgreen?style=for-the-badge)](https://github.com/marcoamonteiro/pi-GAN)
> - [x] [![](https://img.shields.io/badge/CVPR'2021-GIRAFFE-D14836?style=for-the-badge)](https://github.com/autonomousvision/giraffe)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-EG3D-c2e2de?style=for-the-badge)](https://github.com/NVlabs/eg3d)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-GRAM-854?style=for-the-badge)](https://github.com/microsoft/GRAM)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-StyleSDF-123456?style=for-the-badge)](https://github.com/royorel/StyleSDF)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-VolumeGAN-535?style=for-the-badge)](https://github.com/genforce/volumegan)
> - [x] [![](https://img.shields.io/badge/ICLR'2022-StyleNeRF-1223?style=for-the-badge)](https://github.com/facebookresearch/StyleNeRF)

**Supported Modules (8)**
> - [x] ![](https://img.shields.io/badge/pose_sampler-f4d5b3?style=for-the-badge) Deterministic Pose Sampling, Uncertainty Pose Sampling, ...
> - [x] ![](https://img.shields.io/badge/point_sampler-d0e9ff?style=for-the-badge) Uniform, Normal, Fixed, ...
> - [x] ![](https://img.shields.io/badge/point_embedder-854?style=for-the-badge) Tri-plane, Volume, MLP, MPI, ...
> - [x] ![](https://img.shields.io/badge/feature_decoder-D14836?style=for-the-badge) LeakyReLU, Softplus, SIREN, ReLU, ...
> - [x] ![](https://img.shields.io/badge/volume_renderer-535?style=for-the-badge) Occupancy, Density, Feature, Color, SDF, ...
> - [x] ![](https://img.shields.io/badge/stochasticity_mapper-123456?style=for-the-badge)
> - [x] ![](https://img.shields.io/badge/upsampler-c2e2de?style=for-the-badge)
> - [x] ![](https://img.shields.io/badge/visualizer-1223?style=for-the-badge) color, geometry, ...
> - [x] ![](https://img.shields.io/badge/evaluator-552?style=for-the-badge) FID, ID, RE, DE, PE, ...

## Installation
Our code is tested with Python 3.8, CUDA 11.3, and PyTorch 1.11.0.
1. Install package requirements via `conda`:
```shell
conda create -n <env_name> python=3.8  # create a virtual environment with Python 3.8
conda activate <env_name>
conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch # install PyTorch 1.11.0
pip install -r requirements.txt # install dependencies
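# optional sanity check (not part of the original instructions): confirm that
# the CUDA build of PyTorch is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"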
```
2. Our code requires [nvdiffrast](https://nvlabs.github.io/nvdiffrast), so please refer to the [documentation](https://nvlabs.github.io/nvdiffrast/#linux) for instructions on how to install it.
3. Our code also uses the [face reconstruction model](https://arxiv.org/abs/1903.08527) to evaluate metrics. Please refer to [this guide](https://github.com/sicxu/Deep3DFaceRecon_pytorch#prepare-prerequisite-models) to prepare prerequisite models.
4. To use a video visualizer (optional), please also install `ffmpeg`.
   - Ubuntu: `sudo apt-get install ffmpeg`.
   - MacOS: `brew install ffmpeg`.
5. To reduce memory footprint (optional), you can switch to either `jemalloc` (recommended) or `tcmalloc` rather than your default memory allocator.
   - jemalloc (recommended):
     - Ubuntu: `sudo apt-get install libjemalloc`
   - tcmalloc:
     - Ubuntu: `sudo apt-get install google-perftools`

   After installation, the allocator is typically enabled by preloading it (e.g., via `LD_PRELOAD`) when launching training.

## Preparing datasets
**FFHQ** and **ShapeNet Cars**: Please refer to [this guide](https://github.com/NVlabs/eg3d#preparing-datasets) to prepare the datasets.
**Cats**: Please refer to [this guide](https://github.com/microsoft/GRAM#data-preparation) to prepare the dataset.
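
As an optional sanity check before training, one can peek into a prepared archive. The snippet below is a hypothetical helper (not part of this repository); it only assumes that the archive stores images as regular files, as produced by the preparation guides above.

```python
# Hypothetical helper for a quick look at a prepared dataset archive; the
# exact layout is defined by the preparation guides linked above.
import zipfile
from io import BytesIO

from PIL import Image


def inspect_dataset(zip_path, expected_res=256):
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist()
                 if n.lower().endswith(('.png', '.jpg', '.jpeg'))]
        print(f'{zip_path}: {len(names)} images')
        if names:
            with zf.open(names[0]) as f:
                width, height = Image.open(BytesIO(f.read())).size
            print(f'first image: {width}x{height} '
                  f'(expected {expected_res}x{expected_res})')


# inspect_dataset('data/ffhq_256x256.zip')  # hypothetical path
```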
## Quick demo
### Train [EG3D](https://nvlabs.github.io/eg3d/) on FFHQ at a Resolution of 512x512
In your Terminal, run:
```shell
./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> [OPTIONS]
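# for example, on 8 GPUs (the dataset path below is only an illustration):
# ./scripts/training_demos/eg3d_ffhq512.sh 8 data/ffhq_256x256.zip --job_name=my_eg3d_run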
```

where
- `<NUM_GPUS>` refers to the number of GPUs. Setting `<NUM_GPUS>` as 1 helps launch a training job on single-GPU platforms.
- `<PATH_TO_DATA>` refers to the path of the FFHQ dataset (in a resolution of 256x256) in `zip` format. If running on local machines, a soft link of the data will be created under the `data` folder of the working directory to save disk space.
- `[OPTIONS]` refers to any additional options to pass. Detailed instructions on available options can be shown via `./scripts/training_demos/eg3d_ffhq512.sh --help`.
This demo script uses `eg3d_ffhq512` as the default value of `job_name`, which is used to identify experiments. Concretely, a directory with the name `job_name` will be created under the root working directory (which is set to `work_dirs/` by default). To prevent overwriting previous experiments, an exception will be raised to interrupt the training if the `job_name` directory already exists. To change the job name, please use the `--job_name=<new_job_name>` option.
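
For reference, the guard described above boils down to logic like the following (an illustrative restatement, not the codebase's actual implementation):

```python
# Illustrative restatement of the job-directory guard described above; the
# real check lives inside this repository's training scripts.
import os


def prepare_job_dir(job_name='eg3d_ffhq512', root_work_dir='work_dirs'):
    job_dir = os.path.join(root_work_dir, job_name)
    if os.path.exists(job_dir):
        raise RuntimeError(f'{job_dir} already exists; '
                           f'pass --job_name=<new_job_name> to avoid overwriting it.')
    os.makedirs(job_dir)
    return job_dir
```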
Other 3D GAN models reproduced by our codebase can be trained similarly; please refer to the scripts under `./scripts/training_demos/` for more details.
### Ablating the `point embedder` using our codebase
To investigate the effect of various point embedders, one can use the following commands to train the models.
#### MLP-based
```shell
./scripts/training_demos/ablation3d.sh --job_name <job_name> --root_work_dir <root_work_dir> --ref_mode 'coordinate' --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5
```

#### Volume-based
```shell
./scripts/training_demos/ablation3d.sh --job_name <job_name> --root_work_dir <root_work_dir> --ref_mode 'volume' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5
```

#### Tri-plane-based
```shell
./scripts/training_demos/ablation3d.sh --job_name <job_name> --root_work_dir <root_work_dir> --ref_mode 'triplane' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'eg3d' --mlp_depth 2 --mlp_hidden_dim 64 --mlp_output_dim 32 --r1_gamma 1.5
```

## Inspect training results
Besides TensorBoard, which can be used to track the training process, the raw results (e.g., training losses and running time) are saved in [JSON Lines](https://jsonlines.org/) format and can be easily inspected with the following script:
```python
import json

file_name = '<work_dir>/log.json'
data_entries = []
with open(file_name, 'r') as f:
    for line in f:
        data_entry = json.loads(line)
        data_entries.append(data_entry)

# An example of a data entry:
# {"Loss/D Fake": 0.4833524551040682, "Loss/D Real": 0.4966000154727226, "Loss/G": 1.1439273656869773, "Learning Rate/Discriminator": 0.002352941082790494, "Learning Rate/Generator": 0.0020000000949949026, "data time": 0.0036810599267482758, "iter time": 0.24490128830075264, "run time": 66108.140625}
```

## Inference for visualization
After training a model, one can employ the following scripts to run inference and visualize the results, including images, videos, and geometries.
```shell
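# <model_path> points to a trained checkpoint and <work_dir> to the output
# directory (both placeholders); --truncation_psi applies the usual
# StyleGAN-style truncation trick.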
CUDA_VISIBLE_DEVICES=0 python test_3d_inference.py --model <model_path> --work_dir <work_dir> --save_image true --save_video false --save_shape true --shape_res 512 --num 10 --truncation_psi 0.7
```

## Evaluate metrics
After training a model, one can use the following scripts to evaluate various metrics, including FID, face identity consistency (ID), depth error (DE), pose error (PE) and reprojection error (RE).
```shell
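# <dataset_path> and <model_path> are placeholders for the prepared dataset
# archive and the checkpoint to evaluate; --fake_num presumably controls how
# many generated samples are used for the metrics.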
python -m torch.distributed.launch --nproc_per_node=1 test_3d_metrics.py --dataset <dataset_path> --model <model_path> --test_fid true --align_face true --test_identity true --test_reprojection_error true --test_pose true --test_depth true --fake_num 1000
```

## TODO
- [ ] Upload pretrained checkpoints
- [ ] User Guide

## Acknowledgement
This repository is built upon [Hammer](https://github.com/bytedance/Hammer), on top of which we reimplement [GRAF](https://github.com/autonomousvision/graf), [GIRAFFE](https://github.com/autonomousvision/giraffe), [π-GAN](https://github.com/marcoamonteiro/pi-GAN), [StyleSDF](https://github.com/royorel/StyleSDF), [StyleNeRF](https://github.com/facebookresearch/StyleNeRF), [VolumeGAN](https://github.com/genforce/volumegan), [GRAM](https://github.com/microsoft/GRAM), [EpiGRAF](https://github.com/universome/epigraf) and [EG3D](https://github.com/NVlabs/eg3d).
## BibTeX
```bibtex
@article{wang2023benchmarking,
  title   = {Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase},
  author  = {Wang, Qiuyu and Shi, Zifan and Zheng, Kecheng and Xu, Yinghao and Peng, Sida and Shen, Yujun},
  journal = {arXiv preprint arXiv:2306.12423},
  year    = {2023}
}
```