
# Pippo: High-Resolution Multi-View Humans from a Single Image

Project Page · Paper PDF · Spaces · Visuals (Drive)

CVPR, 2025 (Highlight)

Yash Kant<sup>1,2,3</sup> · Ethan Weber<sup>1,4</sup> · Jin Kyu Kim<sup>1</sup> · Rawal Khirodkar<sup>1</sup> · Su Zhaoen<sup>1</sup> · Julieta Martinez<sup>1</sup>
Igor Gilitschenski<sup>*2,3</sup> · Shunsuke Saito<sup>*1</sup> · Timur Bagautdinov<sup>*1</sup>

<sup>*</sup> Joint Advising

<sup>1</sup> Meta Reality Labs · <sup>2</sup> University of Toronto · <sup>3</sup> Vector Institute · <sup>4</sup> UC Berkeley

We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo.
Pippo is a multi-view diffusion transformer and does not require any additional inputs — e.g., a fitted parametric model or camera parameters of the input image.

#### This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.

## Setup code
Clone the repository and add it to your path:
```
git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD
```

## Prerequisites and Dependencies
```
conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo

# adjust as required (we tested with the configuration below)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia

pip install -r requirements.txt
```

## Download and Sample Training
You can launch a sample training run on a few samples of the [Ava-256 dataset](https://github.com/facebookresearch/ava-256). We provide pre-packaged samples for this training, stored as npy files, [here](https://huggingface.co/datasets/yashkant/pippo/tree/main). Ensure you are authenticated to Hugging Face with a login token to download the samples.
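One way to authenticate is sketched below (a minimal example; it assumes the `huggingface_hub` package is available in your environment, and the `huggingface-cli login` command works just as well):
```
# Interactive login with a Hugging Face access token
# (created at huggingface.co/settings/tokens); equivalent to `huggingface-cli login`.
from huggingface_hub import login

login()
```
Then download the samples: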
```
# download packaged Ava-256 samples
python scripts/pippo/download_samples.py
```

We provide exact model configs to train Pippo models at resolutions of 128, 512, and 1024 in the `config/full/` directory.
```
# launch training (tested on single A100 GPU 80GB): full sized model
python train.py config/full/128_4v.yml
```

Additionally, we provide a tiny model config for training on a smaller GPU:
```
# launch training (tested on single T4 GPU 16GB): tiny model
python train.py config/tiny/128_4v_tiny.yml
```

## Training on a custom dataset (see https://github.com/facebookresearch/pippo/issues/9)
You will have to prepare your custom dataset in the same format as the provided [Ava-256 samples stored as numpy files](https://huggingface.co/datasets/yashkant/pippo/tree/main/ava_samples).

The trickier parts are creating the Plücker rays and spatial anchor images; we provide our implementations of these methods (using Ava-256 and Goliath data) [in this gist](https://gist.github.com/yashkant/971e205d85b15e17d20d33edd29d6016). You can refer to them when creating these fields for your own custom dataset.
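For orientation, here is a minimal sketch of the Plücker ray construction (not the implementation from the gist above; it assumes pinhole intrinsics `K` and a world-to-camera convention `x_cam = R @ x_world + t`, so adapt it to your dataset's conventions):
```
# Hypothetical helper: per-pixel 6-channel Plucker ray map [direction | moment].
import numpy as np

def plucker_ray_map(K, R, t, height, width):
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)

    # Unproject to camera-space ray directions, then rotate into world space.
    dirs = pix @ np.linalg.inv(K).T                       # (H, W, 3)
    dirs = dirs @ R                                       # applies R^T to each row vector
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Camera center in world coordinates and the ray moment o x d.
    center = -R.T @ t
    moments = np.cross(center, dirs)

    return np.concatenate([dirs, moments], axis=-1).astype(np.float32)
```
The spatial anchor images are more dataset-specific, so the gist is the better reference for those.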

## Re-projection Error
To compute the re-projection error between generated images and ground truth images, run the following command:
```
python scripts/pippo/reprojection_error.py
```
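As a rough illustration of what such a metric involves (a generic sketch under assumed inputs, not the logic of `scripts/pippo/reprojection_error.py`): given 2D keypoints matched across the generated views and per-view projection matrices, triangulate each track and average the pixel distance after reprojecting it back into every view.
```
# Generic reprojection-error sketch: DLT triangulation of matched 2D points,
# followed by reprojection into every view. All inputs here are hypothetical.
import numpy as np

def triangulate_dlt(track_2d, proj_mats):
    # track_2d: (V, 2) matched point per view; proj_mats: (V, 3, 4).
    rows = []
    for (u, v), P in zip(track_2d, proj_mats):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]                                   # 3D point in world coordinates

def mean_reprojection_error(tracks_2d, proj_mats):
    # tracks_2d: (N, V, 2) keypoint tracks over N points and V views.
    errors = []
    for track in tracks_2d:
        X = np.append(triangulate_dlt(track, proj_mats), 1.0)
        for (u, v), P in zip(track, proj_mats):
            proj = P @ X
            errors.append(np.linalg.norm(proj[:2] / proj[2] - np.array([u, v])))
    return float(np.mean(errors))
```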

## Useful Pointers
Here is a list of useful things to borrow from this codebase:
- ControlMLP to inject spatial control in Diffusion Transformers: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/control_mlp.py#L161)
- Attention Biasing to run inference on 5x longer sequences: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/dit.py#L165) (a minimal sketch of the idea appears after this list)
- Re-projection Error Metric: [see here](https://github.com/facebookresearch/pippo/blob/main/scripts/pippo/reprojection_error.py#L150)
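To give a flavor of the attention biasing idea (a hedged sketch only; the actual mechanism lives in `dit.py` linked above and may differ), one common form rescales the attention logits as the token sequence grows, so attention does not become overly diffuse when generating many more views than were seen during training:
```
# Hypothetical PyTorch sketch of length-dependent attention biasing.
import math
import torch
import torch.nn.functional as F

def biased_attention(q, k, v, train_len):
    # q, k, v: (batch, heads, seq, dim); train_len: sequence length used in training.
    seq_len, dim = q.shape[-2], q.shape[-1]
    # Grow the logit scale with log(seq_len) so softmax entropy stays near the
    # training regime when seq_len exceeds train_len at inference time.
    scale = max(1.0, math.log(seq_len) / math.log(train_len)) / math.sqrt(dim)
    logits = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(F.softmax(logits, dim=-1), v)
```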

## Todos
We plan to add and update the following in the future:
- Cleaning up fluff in `pippo.py` and `dit.py`
- Inference script for pretrained models.

## License

See LICENSE file for details.

## Citation
If you benefit from this codebase, consider citing our work:
```
@article{Kant2024Pippo,
  title={Pippo: High-Resolution Multi-View Humans from a Single Image},
  author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
  year={2025},
}
```