
# Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

![timeline.jpg](figures/3D_benchmark.jpg)
Figure: Overview of our pipeline for 3D-aware image synthesis, which modularizes the
generation process in a universal way. Each module can be improved independently,
facilitating algorithm development. Note that the discriminator is omitted for simplicity.

> **Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase**

> [Qiuyu Wang](https://github.com/qiuyu96), [Zifan Shi](https://vivianszf.github.io/), [Kecheng Zheng](https://zkcys001.github.io/), [Yinghao Xu](https://justimyhxu.github.io/), [Sida Peng](https://pengsida.net/), [Yujun Shen](https://shenyujun.github.io/)

> **NeurIPS 2023 Datasets and Benchmarks Track**

[[Paper](https://arxiv.org/pdf/2306.12423.pdf)]

## Overview of methods supported by our codebase:

Supported Methods (9)

> - [x] [![](https://img.shields.io/badge/NeurIPS'2020-GRAF-f4d5b3?style=for-the-badge)](https://github.com/autonomousvision/graf)
> - [x] [![](https://img.shields.io/badge/NeurIPS'2022-EpiGRAF-d0e9ff?style=for-the-badge)](https://github.com/universome/epigraf)
> - [x] [![](https://img.shields.io/badge/CVPR'2021-π–GAN-yellowgreen?style=for-the-badge)](https://github.com/marcoamonteiro/pi-GAN)
> - [x] [![](https://img.shields.io/badge/CVPR'2021-GIRAFFE-D14836?style=for-the-badge)](https://github.com/autonomousvision/giraffe)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-EG3D-c2e2de?style=for-the-badge)](https://github.com/NVlabs/eg3d)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-GRAM-854?style=for-the-badge)](https://github.com/microsoft/GRAM)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-StyleSDF-123456?style=for-the-badge)](https://github.com/royorel/StyleSDF)
> - [x] [![](https://img.shields.io/badge/CVPR'2022-VolumeGAN-535?style=for-the-badge)](https://github.com/genforce/volumegan)
> - [x] [![](https://img.shields.io/badge/ICLR'2022-StyleNeRF-1223?style=for-the-badge)](https://github.com/facebookresearch/StyleNeRF)

Supported Modules (8)

> - [x] ![](https://img.shields.io/badge/pose_sampler-f4d5b3?style=for-the-badge) Deterministic Pose Sampling, Uncertainty Pose Sampling, ...
> - [x] ![](https://img.shields.io/badge/point_sampler-d0e9ff?style=for-the-badge) Uniform, Normal, Fixed, ...
> - [x] ![](https://img.shields.io/badge/point_embedder-854?style=for-the-badge) Tri-plane, Volume, MLP, MPI, ...
> - [x] ![](https://img.shields.io/badge/feature_decoder-D14836?style=for-the-badge) LeakyReLU, Softplus, SIREN, ReLU, ...
> - [x] ![](https://img.shields.io/badge/volume_renderer-535?style=for-the-badge) Occupancy, Density, Feature, Color, SDF, ...
> - [x] ![](https://img.shields.io/badge/stochasticity_mapper-123456?style=for-the-badge)
> - [x] ![](https://img.shields.io/badge/upsampler-c2e2de?style=for-the-badge)
> - [x] ![](https://img.shields.io/badge/visualizer-1223?style=for-the-badge) color, geometry, ...
> - [x] ![](https://img.shields.io/badge/evaluator-552?style=for-the-badge) FID, ID, RE, DE, PE, ...

## Installation

Our code is tested with Python 3.8, CUDA 11.3, and PyTorch 1.11.0.

1. Install package requirements via `conda`:

```shell
conda create -n <ENV_NAME> python=3.8 # create a virtual environment with Python 3.8
conda activate <ENV_NAME>
conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch # install PyTorch 1.11.0
pip install -r requirements.txt # install dependencies
```
2. Our code requires [nvdiffrast](https://nvlabs.github.io/nvdiffrast), so please refer to the [documentation](https://nvlabs.github.io/nvdiffrast/#linux) for instructions on how to install it.
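
For reference, a minimal sketch of a typical Linux install from source (the linked documentation remains the authoritative guide and lists the system prerequisites):

```shell
# Install nvdiffrast from source; requires a working CUDA build environment.
git clone https://github.com/NVlabs/nvdiffrast
cd nvdiffrast
pip install .
```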

3. Our code also uses the [face reconstruction model](https://arxiv.org/abs/1903.08527) to evaluate metrics. Please refer to [this guide](https://github.com/sicxu/Deep3DFaceRecon_pytorch#prepare-prerequisite-models) to prepare prerequisite models.

4. To use a video visualizer (optional), please also install `ffmpeg`.

- Ubuntu: `sudo apt-get install ffmpeg`.
- macOS: `brew install ffmpeg`.

5. To reduce the memory footprint (optional), you can switch to either `jemalloc` (recommended) or `tcmalloc` instead of your default memory allocator; see the sketch below this list for how to enable it at launch time.

- jemalloc (recommended):
  - Ubuntu: `sudo apt-get install libjemalloc`
- tcmalloc:
  - Ubuntu: `sudo apt-get install google-perftools`
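
Installing the allocator only makes the library available; to actually use it, preload it when launching a job. A minimal sketch, assuming the typical Ubuntu library path (which may differ on your system):

```shell
# Preload jemalloc for a single training run (adjust the library path to your system).
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
    ./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA>
```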

## Preparing datasets

**FFHQ** and **ShapeNet Cars**: Please refer to [this guide](https://github.com/NVlabs/eg3d#preparing-datasets) to prepare the datasets.

**Cats**: Please refer to [this guide](https://github.com/microsoft/GRAM#data-preparation) to prepare the dataset.

## Quick demo

### Train [EG3D](https://nvlabs.github.io/eg3d/) on FFHQ at a Resolution of 512x512

In your Terminal, run:

```shell
./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> [OPTIONS]
```

where

- `<NUM_GPUS>` refers to the number of GPUs. Setting `<NUM_GPUS>` to 1 launches a training job on a single-GPU platform.

- `<PATH_TO_DATA>` refers to the path of the FFHQ dataset (at a resolution of 512x512) in `zip` format. If running on a local machine, a soft link to the data will be created under the `data` folder of the working directory to save disk space.

- `[OPTIONS]` refers to any additional option to pass. Detailed instructions on available options can be shown via `./scripts/training_demos/eg3d_ffhq512.sh --help`.

This demo script uses `eg3d_ffhq512` as the default value of `job_name`, which is used to identify the experiment. Concretely, a directory named after `job_name` will be created under the root working directory (which is set to `work_dirs/` by default). To prevent overwriting previous experiments, an exception will be raised to interrupt training if the `job_name` directory already exists. To change the job name, please use the `--job_name=<NEW_JOB_NAME>` option.
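
For example, a hypothetical single-GPU run with a custom job name could look like the following (the dataset path is illustrative):

```shell
./scripts/training_demos/eg3d_ffhq512.sh 1 data/ffhq512.zip --job_name=eg3d_ffhq512_test
```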

Other 3D GAN models reproduced by our codebase can be trained similarly; please refer to the scripts under `./scripts/training_demos/` for more details.

### Ablate the `point embedder` using our codebase

To investigate the effect of various point embedders, one can use the following commands to train the corresponding models.

#### MLP-based

```shell
./scripts/training_demos/ablation3d.sh --job_name <JOB_NAME> --root_work_dir <ROOT_WORK_DIR> --ref_mode 'coordinate' --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5
```

#### Volume-based

```shell
./scripts/training_demos/ablation3d.sh --job_name <JOB_NAME> --root_work_dir <ROOT_WORK_DIR> --ref_mode 'volume' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5
```

#### Tri-plane-based

```shell
./scripts/training_demos/ablation3d.sh --job_name <JOB_NAME> --root_work_dir <ROOT_WORK_DIR> --ref_mode 'triplane' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'eg3d' --mlp_depth 2 --mlp_hidden_dim 64 --mlp_output_dim 32 --r1_gamma 1.5
```

## Inspect training results

Besides using TensorBoard to track the training process, the raw results (e.g., training losses and running time) are saved in [JSON Lines](https://jsonlines.org/) format. They can be easily inspected with the following script

```python
import json

file_name = '<WORK_DIR>/log.json'

data_entries = []
with open(file_name, 'r') as f:
    for line in f:
        data_entry = json.loads(line)
        data_entries.append(data_entry)

# An example of data entry
# {"Loss/D Fake": 0.4833524551040682, "Loss/D Real": 0.4966000154727226, "Loss/G": 1.1439273656869773, "Learning Rate/Discriminator": 0.002352941082790494, "Learning Rate/Generator": 0.0020000000949949026, "data time": 0.0036810599267482758, "iter time": 0.24490128830075264, "run time": 66108.140625}
```
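
Building on the `data_entries` list above, a minimal follow-up sketch (the keys follow the example entry) summarizes the training log:

```python
# Summarize the parsed log entries; keys follow the example entry above.
if data_entries:
    avg_iter_time = sum(e['iter time'] for e in data_entries) / len(data_entries)
    print(f'Logged entries: {len(data_entries)}')
    print(f'Average iteration time: {avg_iter_time:.3f} s')
    print(f"Final generator loss: {data_entries[-1]['Loss/G']:.4f}")
```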

## Inference for visualization

After training a model, one can employ the following script to run inference and visualize the results, including images, videos, and geometries.

```shell
CUDA_VISIBLE_DEVICES=0 python test_3d_inference.py --model <PATH_TO_CHECKPOINT> --work_dir <WORK_DIR> --save_image true --save_video false --save_shape true --shape_res 512 --num 10 --truncation_psi 0.7
```

## Evaluate metrics

After training a model, one can use the following script to evaluate various metrics, including FID, face identity consistency (ID), depth error (DE), pose error (PE), and reprojection error (RE).

```shell
python -m torch.distributed.launch --nproc_per_node=1 test_3d_metrics.py --dataset <PATH_TO_DATA> --model <PATH_TO_CHECKPOINT> --test_fid true --align_face true --test_identity true --test_reprojection_error true --test_pose true --test_depth true --fake_num 1000
```

## TODO

- [ ] Upload pretrained checkpoints
- [ ] User Guide

## Acknowledgement

This repository is built upon [Hammer](https://github.com/bytedance/Hammer), on top of which we reimplement [GRAF](https://github.com/autonomousvision/graf), [GIRAFFE](https://github.com/autonomousvision/giraffe), [π-GAN](https://github.com/marcoamonteiro/pi-GAN), [StyleSDF](https://github.com/royorel/StyleSDF), [StyleNeRF](https://github.com/facebookresearch/StyleNeRF), [VolumeGAN](https://github.com/genforce/volumegan), [GRAM](https://github.com/microsoft/GRAM), [EpiGRAF](https://github.com/universome/epigraf) and [EG3D](https://github.com/NVlabs/eg3d).

## BibTeX

```bibtex
@article{wang2023benchmarking,
  title   = {Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase},
  author  = {Wang, Qiuyu and Shi, Zifan and Zheng, Kecheng and Xu, Yinghao and Peng, Sida and Shen, Yujun},
  journal = {arXiv preprint arXiv:2306.12423},
  year    = {2023}
}
```