[SIGGRAPH Asia 2022] IDE-3D: Interactive Disentangled Editing For High-Resolution 3D-aware Portrait Synthesis
- Host: GitHub
- URL: https://github.com/mrtornado24/ide-3d
- Owner: MrTornado24
- Created: 2022-09-08T17:24:00.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-12T02:14:13.000Z (over 2 years ago)
- Last Synced: 2025-01-27T17:51:34.793Z (7 days ago)
- Topics: 3d-aware-image-synthesis, face-editing, generative-adversarial-network, inversion, nerf
- Language: Jupyter Notebook
- Homepage: https://mrtornado24.github.io/IDE-3D/
- Size: 83.2 MB
- Stars: 487
- Watchers: 20
- Forks: 44
- Open Issues: 14
- Metadata Files:
  - Readme: README.md
# IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
## ACM Transactions on Graphics (SIGGRAPH Asia 2022)
![Teaser image](./docs/rep.png)
[![arXiv](https://img.shields.io/badge/arXiv-2205.15517-b31b1b.svg)](https://arxiv.org/abs/2205.15517)
**IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis**
[Jingxiang Sun](https://mrtornado24.github.io/), [Xuan Wang](https://xuanwangvc.github.io/), [Yichun Shi](https://seasonsh.github.io/), [Lizhen Wang](https://lizhenwangt.github.io/), [Jue Wang](https://juewang725.github.io/), [Yebin Liu](http://www.liuyebin.com/)
Project page: https://mrtornado24.github.io/IDE-3D/

Abstract: *Existing 3D-aware facial generation methods face a dilemma in quality versus editability: they either generate editable results in low resolution or high-quality ones with no editing flexibility. In this work, we propose a new approach that brings the best of both worlds together. Our system consists of three major components: (1) a 3D-semantics-aware generative model that produces view-consistent, disentangled face images and semantic masks; (2) a hybrid GAN inversion approach that initializes the latent codes from the semantic and texture encoders and further optimizes them for faithful reconstruction; and (3) a canonical editor that enables efficient manipulation of semantic masks in the canonical view and produces high-quality editing results. Our approach supports many applications, e.g. free-view face drawing, editing, and style control. Both quantitative and qualitative results show that our method reaches the state of the art in terms of photorealism, faithfulness, and efficiency.*
## Installation
- ```git clone --recursive https://github.com/MrTornado24/IDE-3D.git```
- ```cd IDE-3D```
- ```conda env create -f environment.yml```

## Getting started
Please download our pre-trained checkpoints from [this link](https://drive.google.com/drive/folders/1-i1WLR5YCXOKjNuQEB6ECf-JxMyvqC4l?usp=sharing) and put them under `pretrained_models/`. The link mainly contains the pretrained generator `ide3d-ffhq-64-512.pkl` and the style encoder `encoder-base-hybrid.pkl`. More pretrained models will be released soon.
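Once the checkpoints are in place, the generator can be loaded like other StyleGAN3-family pickles. The snippet below is a minimal sketch, not the repository's documented API: the `G_ema` key and the `G(z, c)` call signature are assumptions carried over from the StyleGAN3/EG3D lineage, and it should be run from the repository root so the pickled classes resolve.

```python
# Minimal sketch: load the pretrained generator and sample one face.
# The 'G_ema' pickle key and the G(z, c) signature are assumptions based on
# the StyleGAN3/EG3D lineage, not guarantees of this repository.
import pickle
import torch

with open('pretrained_models/ide3d-ffhq-64-512.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda().eval()

z = torch.randn([1, G.z_dim], device='cuda')   # random latent code
c = torch.zeros([1, G.c_dim], device='cuda')   # camera/conditioning label (format assumed)
with torch.no_grad():
    out = G(z, c)                              # inspect `out` to see the output format
```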
## Semantic-aware image synthesis
```.bash
# Generate videos using pre-trained model
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=2x2 \
    --network=pretrained_models/ide3d-ffhq-64-512.pkl --interpolate 1 --image_mode image_seg

# Generate the same 4 seeds in an interpolation sequence
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=1x1 \
    --network=pretrained_models/ide3d-ffhq-64-512.pkl --interpolate 1 --image_mode image_seg
```

```.bash
# Generate images using pre-trained model
python gen_images.py --outdir=out --trunc=0.7 --seeds=0-3 \
    --network=pretrained_models/ide3d-ffhq-64-512.pkl
```

```.bash
# Extract shapes (saved as .mrc and .npy) using pre-trained model
python extract_shapes.py --outdir out --trunc 0.7 --seeds 0-3 \
    --network networks/network_snapshot.pkl --cube_size 1

# Render meshes to video
python render_mesh.py --fname out/0.npy --outdir out
```

We visualize our .mrc shape files with [UCSF ChimeraX](https://www.cgl.ucsf.edu/chimerax/). Please refer to [EG3D](https://github.com/NVlabs/eg3d) for detailed instructions on using ChimeraX.
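If you prefer to skip ChimeraX, the exported `.npy` file can also be meshed directly. A minimal sketch, assuming the file stores a dense density (sigma) volume and using an arbitrary iso-surface level that may need tuning:

```python
# Hypothetical alternative to render_mesh.py: mesh the density grid with
# marching cubes. The volume layout and iso-level are assumptions.
import numpy as np
import trimesh
from skimage import measure

sigma = np.load('out/0.npy')                   # assumed shape: (N, N, N)
verts, faces, normals, _ = measure.marching_cubes(sigma, level=10.0)
trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals).export('out/0.obj')
```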
## Interactive editing
![UI](./docs/ui.gif)
We provide an interactive tool for real-time 3D-aware face drawing and editing. Before using it, please install the environment with `pip install -r ./Painter/requirements.txt`.
```.bash
python Painter/run_ui.py \
    --g_ckpt pretrained_models/ide3d-ffhq-64-512.pkl \
    --e_ckpt pretrained_models/encoder-base-hybrid.pkl
```

## Preparing datasets
**FFHQ**: Download and process the [Flickr-Faces-HQ dataset](https://github.com/NVlabs/ffhq-dataset) following [EG3D](https://github.com/NVlabs/eg3d). Then, parse semantic masks for all processed images using a [pretrained parsing model](https://drive.google.com/file/d/17H1JR-UJllJ3TCnEbtJscx_GgupTBtqS/view?usp=sharing). You can download `dataset.json` for FFHQ [here](https://drive.google.com/file/d/1fnRy9jSscGUsWMA8H0upy1U5hZ6E0ZbP/view?usp=sharing). The processed data should be organized as follows:
```
├── /path/to/dataset
│   ├── masks512x512
│   ├── maskscolor512x512
│   ├── images512x512
│   │   ├── 00000
│   │   │   ├── img00000000.png
│   │   ├── ...
│   │   ├── dataset.json
```
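For reference, here is a small sketch of reading camera labels back out of `dataset.json`, assuming the EG3D-style format in which each entry maps an image path to a 25-dim label (16 flattened extrinsics followed by 9 flattened intrinsics):

```python
# Sketch of parsing the EG3D-style dataset.json; the 25-dim label split into
# a 4x4 extrinsic and 3x3 intrinsic matrix is an assumption from EG3D.
import json
import numpy as np

with open('/path/to/dataset/images512x512/dataset.json') as f:
    labels = dict(json.load(f)['labels'])      # {'00000/img00000000.png': [25 floats], ...}

cam = np.asarray(labels['00000/img00000000.png'], dtype=np.float32)
extrinsics = cam[:16].reshape(4, 4)            # camera pose
intrinsics = cam[16:].reshape(3, 3)            # normalized intrinsics
```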
**Custom dataset**: You can process your own dataset with the following commands. This is useful for editing real portrait images.
```
cd dataset_preprocessing/ffhq
python preprocess_in_the_wild.py --indir=INPUT_IMAGE_FOLDER
```

## Real portrait image editing
IDE-3D supports 3D-aware editing of real portrait images using our interactive tool. Please run the following commands:
```.bash
# Infer a latent code as initialization
python apps/infer_hybrid_encoder.py \
    --target_img /path/to/img_0.png \
    --g_ckpt pretrained_models/ide3d-ffhq-64-512.pkl \
    --e_ckpt pretrained_models/encoder-base-hybrid.pkl \
    --outdir out
```
The above command returns `rec_ws.pt` under `out/img_0`.
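The saved latent can be sanity-checked before running PTI; a tiny sketch, assuming a W+ tensor of shape `[1, num_ws, 512]`:

```python
# Quick check of the recovered latent; the W+ layout is an assumption.
import torch

rec_ws = torch.load('out/img_0/rec_ws.pt', map_location='cpu')
print(type(rec_ws), getattr(rec_ws, 'shape', None))
```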
```.bash
# Run PTI
python inversion/scripts/run_pti.py \
    --run_name ide3d_plus_initial_code \
    --projector_type ide3d_plus \
    --pivotal_tuning \
    --viz_image \
    --viz_mesh \
    --viz_video \
    --label_path /path/to/dataset.json \
    --image_name img_0 \
    --initial_w out/img_0/rec_ws.pt
```
We adopt [PTI](https://github.com/danielroich/PTI) for 3D inversion. Before running, please place the images into `examples/`. Pass `ide3d_plus` or `ide3d` as `--projector_type` to choose between `w+` and `w` inversion. The `--initial_w` flag specifies the latent code obtained in the previous step; it helps produce a more plausible shape, especially for images taken from steep viewing angles. The command returns a pose label `label.pt`, a reconstructed latent code `latent.pt`, the fine-tuned generator, and some visualizations.
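For intuition, the core of PTI looks roughly like the sketch below. This is not the repository's implementation: it assumes a StyleGAN-style generator exposing `G.synthesis(w)`, and the real pipeline adds perceptual (LPIPS) and locality-regularization terms.

```python
# Rough sketch of pivotal tuning inversion (PTI): optimize a pivot latent,
# then fine-tune the generator weights around that frozen pivot.
import torch
import torch.nn.functional as F

def pivotal_tuning(G, target, w_init, steps_w=500, steps_g=350):
    w = w_init.clone().requires_grad_(True)
    opt_w = torch.optim.Adam([w], lr=5e-3)
    for _ in range(steps_w):                       # stage 1: find the pivot latent
        loss = F.l1_loss(G.synthesis(w), target)   # real code also uses LPIPS
        opt_w.zero_grad(); loss.backward(); opt_w.step()
    w = w.detach()                                 # freeze the pivot
    opt_g = torch.optim.Adam(G.parameters(), lr=3e-4)
    for _ in range(steps_g):                       # stage 2: tune G around the pivot
        loss = F.l1_loss(G.synthesis(w), target)
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    return w, G
```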
```.bash
# (Optional) fine-tune the encoder
python apps/finetune_hybrid_encoder.py \
    --target_img /path/to/img_0.png \
    --target_code /path/to/latent.pt \
    --target_label /path/to/label.pt \
    --g_ckpt /path/to/finetuned_generator.pt \
    --e_ckpt pretrained_models/encoder-base-hybrid.pkl \
    --outdir out \
    --max-steps 1000
```
This step aligns the shapes reconstructed by the encoder and by PTI. The fine-tuned encoder is saved as `finetuned_encoder.pkl`; in addition, a semantic mask `mask.png` is saved to the same folder.
```.bash
# Run the UI
python Painter/run_ui.py \
    --g_ckpt /path/to/finetuned_generator.pt \
    --e_ckpt /path/to/finetuned_encoder.pkl \
    --target_code /path/to/latent.pt \
    --target_label /path/to/label.pt \
    --inversion
```
**Note:** you should click `Open Image` and load the `mask.png` returned in the previous step.

## 3D-aware CLIP-guided domain adaptation
Please obtain the adapted generators following [IDE3D-NADA](https://github.com/MrTornado24/ide3d-nada/blob/main/README.md). You can then perform interactive editing in other domains by simply replacing the original generator with the adapted one:
```.bash
python Painter/run_ui.py \
    --g_ckpt /path/to/adapted_generator.pt \
    --e_ckpt pretrained_models/encoder-base-hybrid.pkl
```
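Under the hood, that adaptation relies on a StyleGAN-NADA-style directional CLIP loss. A minimal sketch, assuming OpenAI's `clip` package and image batches already resized and normalized to CLIP's expected input:

```python
# Directional CLIP loss sketch: push the frozen->adapted image direction to
# align with a text direction such as 'photo' -> 'anime painting'.
import clip
import torch
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _ = clip.load('ViT-B/32', device=device)

def directional_clip_loss(img_frozen, img_adapted, src_text, tgt_text):
    t = model.encode_text(clip.tokenize([src_text, tgt_text]).to(device))
    text_dir = F.normalize(t[1:] - t[:1], dim=-1)
    i = model.encode_image(torch.cat([img_frozen, img_adapted]))
    img_dir = F.normalize(i[1:] - i[:1], dim=-1)
    return 1 - F.cosine_similarity(img_dir, text_dir).mean()
```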
## Semantic-guided style animation
![Teaser image](./docs/animation.png)
IDE-3D supports animating stylized virtual faces through semantic masks. Please process a video clip and prepare a `dataset.json`. Then run:
```.bash
python apps/infer_face_animation.py \
    --drive_root /path/to/images \
    --network pretrained_models/ide3d-ffhq-64-512.pkl \
    --encoder pretrained_models/encoder-base-hybrid.pkl \
    --grid 4x1 \
    --seeds 52,197,229 \
    --outdir out
```

## Training
Training scripts will be released soon.
## Acknowledgments
Part of the codes are borrowed from [StyleGAN3](https://github.com/NVlabs/stylegan3), [PTI](https://github.com/danielroich/PTI), [EG3D](https://github.com/NVlabs/eg3d) and [StyleGAN-nada](https://github.com/rinongal/StyleGAN-nada).
## Citation
If you use this code for your research, please cite the following works:
```
@article{sun2022ide,
title = {IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis},
author = {Sun, Jingxiang and Wang, Xuan and Shi, Yichun and Wang, Lizhen and Wang, Jue and Liu, Yebin},
journal = {ACM Transactions on Graphics (TOG)},
volume = {41},
number = {6},
articleno = {270},
pages = {1--10},
year = {2022},
publisher = {ACM New York, NY, USA},
doi = {10.1145/3550454.3555506},
}

@inproceedings{sun2022fenerf,
title={Fenerf: Face editing in neural radiance fields},
author={Sun, Jingxiang and Wang, Xuan and Zhang, Yong and Li, Xiaoyu and Zhang, Qi and Liu, Yebin and Wang, Jue},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7672--7682},
year={2022}
}
```