# GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
_NeurIPS 2024_
Junyoung Seo<sup>\*,1,3</sup>, Kazumi Fukuda<sup>1</sup>, Takashi Shibuya<sup>1</sup>, Takuya Narihira<sup>1</sup>,
Naoki Murata<sup>1</sup>, Shoukang Hu<sup>1</sup>, Chieh-Hsin Lai<sup>1</sup>,
Seungryong Kim<sup>†,3</sup>, Yuki Mitsufuji<sup>†,1,2</sup>

<sup>1</sup>Sony AI, <sup>2</sup>Sony Group Corporation, <sup>3</sup>KAIST

\*Work done during an internship at Sony AI. <sup>†</sup>Co-corresponding authors.
[Project page](https://genwarp-nvs.github.io/) | [Live demo (Spaces)](https://huggingface.co/spaces/Sony/genwarp) | [Code](https://github.com/sony/genwarp/) | [Models](https://huggingface.co/Sony/genwarp) | [Paper (arXiv)](https://arxiv.org/abs/2405.17251)
[Introduction](#introduction)
| [Demo](#demo)
| [Examples](#examples)
| [How to use](#how-to-use)
| [Citation](#citation)
| [Acknowledgements](#acknowledgements)

## Updates
- **26/09/2024:** Our paper has been accepted to NeurIPS 2024.
- **13/09/2024:** Added an example with Depth Anything V2.
- **27/08/2024:** Code and demos are released.
## Introduction
This repository is the official implementation of the paper "[GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping](https://genwarp-nvs.github.io/)". GenWarp generates novel-view images from a single input image, conditioned on camera poses. This repository provides the inference code for the model. For detailed information, please refer to the [paper](https://arxiv.org/abs/2405.17251).

## Demo
Here is a quick preview of GenWarp in action. Try it yourself at [Spaces](https://huggingface.co/spaces/Sony/genwarp) or run it locally on your machine; see the [How to use](#how-to-use) section for details. (Left) 3D scene reconstructed from the input image and the estimated depth. (Middle) Warped image. (Right) Generated image.
## Examples
Our model handles images from various domains, including indoor and outdoor scenes and even illustrations, under challenging camera viewpoint changes.
You can find examples on our [project page](https://genwarp-nvs.github.io/) and in our [paper](https://arxiv.org/abs/2405.17251). Or, even better, try your favourite images in the live demo at [Spaces](https://huggingface.co/spaces/Sony/genwarp).

Generated novel views can be used for 3D reconstruction. In the example below, we reconstructed a 3D scene via [InstantSplat](https://instantsplat.github.io/). We generated the video using [this implementation](https://github.com/ONground-Korea/unofficial-Instantsplat).
## How to use
### Environment
We tested our code on Ubuntu 20.04 with an NVIDIA A100 GPU. If you are using another platform such as Windows, consider using Docker. You can either add the packages to your own Python environment or use Docker to build one. All commands below are expected to be run in the root directory of the repository.
#### Use Docker to build an environment
> [!NOTE]
> You may want to change the username and uid variables in the Dockerfile. Please check the Dockerfile before running the commands below.
``` shell
docker build . -t genwarp:latest
docker run --gpus=all -it -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp
```
Inside the Docker container, you can install the packages as described below.
#### Add dependencies to your python environment
We tested the environment with Python `>=3.10` and CUDA `11.8`. To install the mandatory dependencies, run the command below.
``` shell
pip install -r requirements.txt
```
To run the development extras, such as the example Jupyter notebook and the Gradio live demo, install the additional dependencies with the command below.
``` shell
pip install -r requirements_dev.txt
```
### Download pretrained models
GenWarp uses pretrained models, consisting of both our finetuned models and publicly available third-party ones. Download all the models to the `checkpoints` directory or any location of your choice. You can do this manually or with the [download_models.sh](scripts/download_models.sh) script.
#### Download script
``` shell
./scripts/download_models.sh ./checkpoints
```
#### Manual download
> [!NOTE]
> Models and checkpoints provided below may be distributed under different licenses. Users are required to check the licenses carefully on their own.
1. Our finetuned models:
- For details about each model, check out the [model card](https://huggingface.co/Sony/genwarp).
- [multi-dataset model 1](https://huggingface.co/Sony/genwarp)
- download all files into `checkpoints/multi1`
- [multi-dataset model 2](https://huggingface.co/Sony/genwarp)
- download all files into `checkpoints/multi2`
2. Pretrained models:
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
- download `config.json` and `diffusion_pytorch_model.safetensors` to `checkpoints/sd-vae-ft-mse`
- [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- download `image_encoder/config.json` and `image_encoder/pytorch_model.bin` to `checkpoints/image_encoder`
The final `checkpoints` directory must look like this:
```
genwarp
└── checkpoints
    ├── image_encoder
    │   ├── config.json
    │   └── pytorch_model.bin
    ├── multi1
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    ├── multi2
    │   ├── config.json
    │   ├── denoising_unet.pth
    │   ├── pose_guider.pth
    │   └── reference_unet.pth
    └── sd-vae-ft-mse
        ├── config.json
        └── diffusion_pytorch_model.safetensors
```
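If you prefer to script the manual download, here is a minimal sketch using the `huggingface_hub` Python package (not part of this repository's requirements); it assumes the Hugging Face repositories keep the same file layout as the tree above:
``` python
# Hypothetical alternative to scripts/download_models.sh using huggingface_hub.
# Assumes the Sony/genwarp repo stores the checkpoints under multi1/ and multi2/
# and that file layouts match the directory tree above.
from huggingface_hub import hf_hub_download, snapshot_download

# Finetuned GenWarp checkpoints (multi1, multi2).
snapshot_download(repo_id='Sony/genwarp',
                  allow_patterns=['multi1/*', 'multi2/*'],
                  local_dir='./checkpoints')

# VAE: only the config and the weights are needed.
for f in ['config.json', 'diffusion_pytorch_model.safetensors']:
    hf_hub_download(repo_id='stabilityai/sd-vae-ft-mse', filename=f,
                    local_dir='./checkpoints/sd-vae-ft-mse')

# CLIP image encoder from sd-image-variations-diffusers.
for f in ['image_encoder/config.json', 'image_encoder/pytorch_model.bin']:
    hf_hub_download(repo_id='lambdalabs/sd-image-variations-diffusers', filename=f,
                    local_dir='./checkpoints')
```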
### Inference
#### Install MDE module
The model requires a depth map to generate novel views, but a depth estimator is not included in this repository. To this end, users can install one of the publicly available monocular depth estimation (MDE) models.
**ZoeDepth**
We used and therefore recommend [ZoeDepth](https://github.com/isl-org/ZoeDepth).
``` shell
git clone https://github.com/isl-org/ZoeDepth.git extern/ZoeDepth
```
> [!TIP]
> To use ZoeDepth, install the additional packages in `requirements_dev.txt`.
**Depth Anything V2**
More recent models are also available. [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2) is one of the state-of-the-art models for depth estimation. You can use [the metric depth version](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth).
``` shell
git clone https://github.com/DepthAnything/Depth-Anything-V2.git extern/Depth-Anything-V2
```
[Download the models](https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth#pre-trained-models) from their repository, and see the [example notebook](examples/genwarp_inference_dav2.ipynb) for how to use them with GenWarp. Note that they provide separate models for indoor and outdoor scenes.
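For reference, here is a rough sketch of loading the metric-depth variant, following the usage described in the Depth-Anything-V2 repository; the module path, config values, and checkpoint filename are assumptions here, so double-check them against that repository and our [example notebook](examples/genwarp_inference_dav2.ipynb):
``` python
# Rough sketch: Depth Anything V2 (metric depth) as the MDE module.
# Paths, config values, and the checkpoint filename are assumptions; verify them
# against the Depth-Anything-V2 repository before use.
import sys
sys.path.insert(0, './extern/Depth-Anything-V2/metric_depth')  # assumed module location

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-L indoor (Hypersim) variant; the outdoor (VKITTI) checkpoint uses max_depth=80.
model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024], max_depth=20)
model.load_state_dict(torch.load(
    './checkpoints/depth_anything_v2_metric_hypersim_vitl.pth', map_location='cpu'))
model = model.to('cuda').eval()

raw_img = cv2.imread('./your_image.png')   # hypothetical input path
depth = model.infer_image(raw_img)         # H x W numpy array of metric depth

# Match the [B, 1, H, W] tensor layout produced by ZoeDepth's infer() in the API section below.
src_depth = torch.from_numpy(depth)[None, None].float().cuda()
```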
#### API
**Initialisation**
Import the `GenWarp` class and instantiate it with a config. Set `pretrained_model_path` to the checkpoints directory and select the model version with `checkpoint_name`. For more options, check out [GenWarp.py](genwarp/GenWarp.py).
``` python
import torch

from genwarp import GenWarp

# Load the GenWarp model.
genwarp_cfg = dict(
    pretrained_model_path='./checkpoints',
    checkpoint_name='multi1',
    half_precision_weights=True
)
genwarp_nvs = GenWarp(cfg=genwarp_cfg)

# Load the MDE model (ZoeDepth).
depth_estimator = torch.hub.load(
    './extern/ZoeDepth',
    'ZoeD_N',
    source='local',
    pretrained=True,
    trust_repo=True
).to('cuda')
```
**Prepare inputs**
Load the input image and estimate the corresponding depth map. Then create the camera matrices for the intrinsic and extrinsic parameters; [ops.py](genwarp/ops.py) provides helper functions for building these matrices.
``` python
from PIL import Image
from torchvision.transforms.functional import to_tensor

# `image_file` is the path to your input image.
src_image = to_tensor(Image.open(image_file).convert('RGB'))[None].cuda()
src_depth = depth_estimator.infer(src_image)
```
``` python
import torch

from genwarp.ops import camera_lookat, get_projection_matrix

# Projection matrix from user-chosen camera intrinsics (`fovy`, `near`, `far`).
proj_mtx = get_projection_matrix(
    fovy=fovy,
    aspect_wh=1.,
    near=near,
    far=far
)

# Source and target extrinsics (z-up convention).
z_up = torch.tensor([[0., 0., 1.]])
src_view_mtx = camera_lookat(
    torch.tensor([[0., 0., 0.]]),    # From (0, 0, 0)
    torch.tensor([[-1., 0., 0.]]),   # Cast rays to -x
    z_up                             # z-up
)
tar_view_mtx = camera_lookat(
    torch.tensor([[-0.1, 2., 1.]]),  # Camera eye position
    torch.tensor([[-5., 0., 0.]]),   # Looking at
    z_up                             # z-up
)

# Relative camera pose from the source view to the target view.
rel_view_mtx = (
    tar_view_mtx @ torch.linalg.inv(src_view_mtx.float())
).to(src_image)
```
**Warping**
Call the main function of GenWarp and check the results.
``` python
renders = genwarp_nvs(
    src_image=src_image,
    src_depth=src_depth,
    rel_view_mtx=rel_view_mtx,
    src_proj_mtx=proj_mtx,
    tar_proj_mtx=proj_mtx
)

# Outputs.
renders['synthesized']    # Generated image.
renders['warped']         # Depth-based warping image (for comparison).
renders['mask']           # Mask image (mask=1 where pixels are visible).
renders['correspondence'] # Correspondence map.
```
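To keep the results for inspection, you can write them to disk. The minimal sketch below assumes each output is an image-like tensor with values in `[0, 1]`; check [GenWarp.py](genwarp/GenWarp.py) for the exact output format.
``` python
# Minimal sketch: save the GenWarp outputs to disk.
# Assumes each entry is a [B, C, H, W] (or [B, 1, H, W]) tensor in [0, 1];
# verify the actual format in genwarp/GenWarp.py.
from torchvision.utils import save_image

save_image(renders['synthesized'], 'synthesized.png')
save_image(renders['warped'], 'warped.png')
save_image(renders['mask'].float(), 'mask.png')
```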
### Example notebook
We provide a complete example in [genwarp_inference.ipynb](examples/genwarp_inference.ipynb).
To access a Jupyter notebook running inside a Docker container, you may need to use the host's network. For further details, please refer to the Docker documentation.
``` shell
docker run --gpus=all -it --net host -v $(pwd):/workspace/genwarp -w /workspace/genwarp genwarp
```
Install the additional packages in `requirements_dev.txt` to run the Jupyter notebook.
### Gradio live demo
An interactive live demo is also available. Start the Gradio demo by running the command below, then go to [http://127.0.0.1:7860/](http://127.0.0.1:7860/).
If you are running it on a remote server, be sure to forward port 7860.
Alternatively, you can visit [Spaces](https://huggingface.co/spaces/Sony/genwarp) hosted by Hugging Face to try it right away.
```shell
python app.py
```
## Citation
``` bibtex
@article{seo2024genwarp,
  title={GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping},
  author={Junyoung Seo and Kazumi Fukuda and Takashi Shibuya and Takuya Narihira and Naoki Murata and Shoukang Hu and Chieh-Hsin Lai and Seungryong Kim and Yuki Mitsufuji},
  year={2024},
  journal={arXiv preprint arXiv:2405.17251},
}
```
## Acknowledgements
Our code is based on [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone) and the repositories it builds on. We thank the authors of the relevant repositories and papers.