https://github.com/facebookresearch/pippo
Pippo: High-Resolution Multi-View Humans from a Single Image
- Host: GitHub
- URL: https://github.com/facebookresearch/pippo
- Owner: facebookresearch
- License: other
- Created: 2025-01-23T19:14:29.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-04T21:08:22.000Z (29 days ago)
- Last Synced: 2025-04-04T22:23:01.760Z (29 days ago)
- Language: Python
- Homepage:
- Size: 3.18 MB
- Stars: 503
- Watchers: 18
- Forks: 40
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
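- StarryDivineSky - facebookresearch/pippo - view consistency) to achieve realistic 3D human generation. At its core, Pippo learns a conditional diffusion model that takes a single image as input and generates a multi-view 3D human representation. Project highlights include high-resolution geometric detail and texture, and the ability to stay consistent across viewpoints. Pippo refines the details of the 3D human model step by step through an iterative denoising process. The project provides code and pretrained models for researchers to experiment with and apply. Pippo has potential applications in human modeling, virtual reality, and augmented reality. It addresses the challenge of reconstructing high-quality 3D humans from a single image and offers new ideas for related research. The project explores the use of diffusion models for 3D human modeling and reports notable results. (Portrait_Pose_3D Face / Resource Transfer and Download)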
README
Pippo: High-Resolution Multi-View Humans from a Single Image
CVPR, 2025 (Highlight)
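Yash Kant<sup>1,2,3</sup> · Ethan Weber<sup>1,4</sup> · Jin Kyu Kim<sup>1</sup> · Rawal Khirodkar<sup>1</sup> · Su Zhaoen<sup>1</sup> · Julieta Martinez<sup>1</sup>
Igor Gilitschenski<sup>*2,3</sup> · Shunsuke Saito<sup>*1</sup> · Timur Bagautdinov<sup>*1</sup>
<sup>*</sup> Joint Advising
<sup>1</sup> Meta Reality Labs · <sup>2</sup> University of Toronto · <sup>3</sup> Vector Institute · <sup>4</sup> UC Berkeley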
We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo.
Pippo is a multi-view diffusion transformer and does not require any additional inputs, such as a fitted parametric model or camera parameters of the input image.
#### This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.
## Setup code
Clone and add repository to your path:
```
git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD
```
## Prerequisites and Dependencies
```
conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo
# can adjust as required (we tested on below configuration)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia
pip install -r requirements.txt
```
## Download and Sample Training
You can launch a sample training run on a few samples of the [Ava-256 dataset](https://github.com/facebookresearch/ava-256). We provide pre-packaged samples for this training run, stored as npy files, [here](https://huggingface.co/datasets/yashkant/pippo/tree/main). Ensure you are authenticated to Hugging Face with a login token before downloading the samples.
```
# download packaged Ava-256 samples
python scripts/pippo/download_samples.py
```
We provide the exact model configs for training Pippo models at resolutions of 128, 512, and 1024 in the `config/full/` directory.
```
# launch training (tested on single A100 GPU 80GB): full sized model
python train.py config/full/128_4v.yml
```
Additionally, we provide a tiny model config for training on a smaller GPU:
```
# launch training (tested on single T4 GPU 16GB): tiny model
python train.py config/tiny/128_4v_tiny.yml
```
## Training on a custom dataset (see https://github.com/facebookresearch/pippo/issues/9)
You will have to prepare your dataset in the same format as the provided [Ava-256 samples stored in numpy files](https://huggingface.co/datasets/yashkant/pippo/tree/main/ava_samples). The difficult parts are likely to be creating the Plücker rays and Spatial Anchor images; we provide our implementations of those methods (using Ava-256 and Goliath data) [in this gist](https://gist.github.com/yashkant/971e205d85b15e17d20d33edd29d6016). You can refer to these methods to create these fields for your own dataset.
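For orientation, here is a minimal NumPy sketch of how per-pixel Plücker ray coordinates are commonly computed from a pinhole camera. It is not the implementation from the gist. The function name `plucker_rays`, the world-to-camera convention `x_cam = R @ x_world + t`, the half-pixel offset, and the `(direction, moment)` channel order are all assumptions made for illustration. Check the gist for the conventions Pippo actually expects.

```python
import numpy as np


def plucker_rays(K, R, t, height, width):
    """Return an (H, W, 6) array of per-pixel Plucker coordinates (d, o x d).

    Assumed conventions (hypothetical, not taken from the Pippo code):
    K is a 3x3 pinhole intrinsics matrix, and (R, t) map world points to
    camera space via x_cam = R @ x_world + t.
    """
    # Pixel centers in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project to camera-space directions, then rotate into world space.
    dirs_cam = pixels @ np.linalg.inv(K).T
    dirs_world = dirs_cam @ R  # row-vector form of R^T @ d_cam
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Camera center in world coordinates: o = -R^T t.
    origin = -R.T @ t

    # The moment o x d is the same for any point chosen along the ray.
    moment = np.cross(np.broadcast_to(origin, dirs_world.shape), dirs_world)
    return np.concatenate([dirs_world, moment], axis=-1)


# Toy camera: 64x64 image, identity rotation, translated along z.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
rays = plucker_rays(K, R, t, height=64, width=64)
print(rays.shape)  # (64, 64, 6)
```

Each pixel gets six channels: a unit ray direction and the moment vector, which together identify the ray regardless of which point on it is taken as the origin.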
## Re-projection Error
To compute the re-projection error between generated images and ground truth images, run the following command:
```
python scripts/pippo/reprojection_error.py
```
## Useful Pointers
Here is a list of useful things to borrow from this codebase:
- ControlMLP to inject spatial control in Diffusion Transformers: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/control_mlp.py#L161)
- Attention Biasing to run inference on 5x longer sequences: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/dit.py#L165)
- Re-projection Error Metric: [see here](https://github.com/facebookresearch/pippo/blob/main/scripts/pippo/reprojection_error.py#L150)
## Todos
We plan to add and update the following in the future:
- Cleaning up fluff in pippo.py and dit.py
- Inference script for pretrained models.
## License
See LICENSE file for details.
## Citation
If you benefit from this codebase, consider citing our work:
```
@article{Kant2024Pippo,
title={Pippo: High-Resolution Multi-View Humans from a Single Image},
author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
year={2025},
}
```