
# Pippo: High-Resolution Multi-View Humans from a Single Image

Project Page · Paper PDF · Spaces · Visuals (Drive)

CVPR, 2025 (Highlight)

Yash Kant<sup>1,2,3</sup> · Ethan Weber<sup>1,4</sup> · Jin Kyu Kim<sup>1</sup> · Rawal Khirodkar<sup>1</sup> · Su Zhaoen<sup>1</sup> · Julieta Martinez<sup>1</sup>
Igor Gilitschenski<sup>*2,3</sup> · Shunsuke Saito<sup>*1</sup> · Timur Bagautdinov<sup>*1</sup>

<sup>*</sup> Joint Advising

<sup>1</sup> Meta Reality Labs · <sup>2</sup> University of Toronto · <sup>3</sup> Vector Institute · <sup>4</sup> UC Berkeley

We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo.
Pippo is a multi-view diffusion transformer and does not require any additional inputs — e.g., a fitted parametric model or camera parameters of the input image.

#### This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.

## Setup code
Clone the repository and add it to your path:
```
git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD
```

## Prerequisites and Dependencies
```
conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo

# adjust as required (we tested with the configuration below)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia

pip install -r requirements.txt
```

## Download and Sample Training
You can launch a sample training run on a few samples of the [Ava-256 dataset](https://github.com/facebookresearch/ava-256). We provide pre-packaged samples for this training, stored as npy files, [here](https://huggingface.co/datasets/yashkant/pippo/tree/main). Ensure you are authenticated to Hugging Face with a login token to download the samples.
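One way to authenticate is sketched below (a minimal example; it assumes the `huggingface_hub` package is available in your environment, and the `huggingface-cli login` command works just as well):
```
# Interactive login with a Hugging Face access token
# (created at huggingface.co/settings/tokens); equivalent to `huggingface-cli login`.
from huggingface_hub import login

login()
```
Then download the samples: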
```
# download packaged Ava-256 samples
python scripts/pippo/download_samples.py
```

We provide exact model configs to train Pippo models at resolutions of 128, 512, and 1024 in the `config/full/` directory.
```
# launch training (tested on single A100 GPU 80GB): full sized model
python train.py config/full/128_4v.yml
```

Additionally, we provide a tiny model config for training on a smaller GPU:
```
# launch training (tested on single T4 GPU 16GB): tiny model
python train.py config/tiny/128_4v_tiny.yml
```

## Training on a custom dataset (see https://github.com/facebookresearch/pippo/issues/9)
You will have to prepare your custom dataset in the same format as the provided [Ava-256 samples stored as numpy files](https://huggingface.co/datasets/yashkant/pippo/tree/main/ava_samples).

The trickier parts are creating the Plücker rays and spatial anchor images; we provide our implementations of these methods (using Ava-256 and Goliath data) [in this gist](https://gist.github.com/yashkant/971e205d85b15e17d20d33edd29d6016). You can refer to them when creating these fields for your own custom dataset.
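For orientation, here is a minimal sketch of the Plücker ray construction (not the implementation from the gist above; it assumes pinhole intrinsics `K` and a world-to-camera convention `x_cam = R @ x_world + t`, so adapt it to your dataset's conventions):
```
# Hypothetical helper: per-pixel 6-channel Plucker ray map [direction | moment].
import numpy as np

def plucker_ray_map(K, R, t, height, width):
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # (H, W, 3)

    # Unproject to camera-space ray directions, then rotate into world space.
    dirs = pix @ np.linalg.inv(K).T                       # (H, W, 3)
    dirs = dirs @ R                                       # applies R^T to each row vector
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Camera center in world coordinates and the ray moment o x d.
    center = -R.T @ t
    moments = np.cross(center, dirs)

    return np.concatenate([dirs, moments], axis=-1).astype(np.float32)
```
The spatial anchor images are more dataset-specific, so the gist is the better reference for those.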

## Re-projection Error
To compute the re-projection error between generated images and ground truth images, run the following command:
```
python scripts/pippo/reprojection_error.py
```
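As a rough illustration of what such a metric involves (a generic sketch under assumed inputs, not the logic of `scripts/pippo/reprojection_error.py`): given 2D keypoints matched across the generated views and per-view projection matrices, triangulate each track and average the pixel distance after reprojecting it back into every view.
```
# Generic reprojection-error sketch: DLT triangulation of matched 2D points,
# followed by reprojection into every view. All inputs here are hypothetical.
import numpy as np

def triangulate_dlt(track_2d, proj_mats):
    # track_2d: (V, 2) matched point per view; proj_mats: (V, 3, 4).
    rows = []
    for (u, v), P in zip(track_2d, proj_mats):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]                                   # 3D point in world coordinates

def mean_reprojection_error(tracks_2d, proj_mats):
    # tracks_2d: (N, V, 2) keypoint tracks over N points and V views.
    errors = []
    for track in tracks_2d:
        X = np.append(triangulate_dlt(track, proj_mats), 1.0)
        for (u, v), P in zip(track, proj_mats):
            proj = P @ X
            errors.append(np.linalg.norm(proj[:2] / proj[2] - np.array([u, v])))
    return float(np.mean(errors))
```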

## Useful Pointers
Here is a list of useful things to borrow from this codebase:
- ControlMLP to inject spatial control in Diffusion Transformers: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/control_mlp.py#L161)
- Attention Biasing to run inference on 5x longer sequences: [see here](https://github.com/facebookresearch/pippo/blob/main/latent_diffusion/models/dit.py#L165) (a minimal sketch of the idea appears after this list)
- Re-projection Error Metric: [see here](https://github.com/facebookresearch/pippo/blob/main/scripts/pippo/reprojection_error.py#L150)
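To give a flavor of the attention biasing idea (a hedged sketch only; the actual mechanism lives in `dit.py` linked above and may differ), one common form rescales the attention logits as the token sequence grows, so attention does not become overly diffuse when generating many more views than were seen during training:
```
# Hypothetical PyTorch sketch of length-dependent attention biasing.
import math
import torch
import torch.nn.functional as F

def biased_attention(q, k, v, train_len):
    # q, k, v: (batch, heads, seq, dim); train_len: sequence length used in training.
    seq_len, dim = q.shape[-2], q.shape[-1]
    # Grow the logit scale with log(seq_len) so softmax entropy stays near the
    # training regime when seq_len exceeds train_len at inference time.
    scale = max(1.0, math.log(seq_len) / math.log(train_len)) / math.sqrt(dim)
    logits = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(F.softmax(logits, dim=-1), v)
```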

## Todos
We plan to add and update the following in the future:
- Cleaning up fluff in `pippo.py` and `dit.py`
- Inference script for pretrained models.

## License

See LICENSE file for details.

## Citation
If you benefit from this codebase, consider citing our work:
```
@article{Kant2024Pippo,
  title={Pippo: High-Resolution Multi-View Humans from a Single Image},
  author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
  year={2025},
}
```