Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lukasHoel/text2room
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).
https://github.com/lukasHoel/text2room
3d-generation diffusion-models mesh-generation text-to-image
Last synced: 11 days ago
JSON representation
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).
- Host: GitHub
- URL: https://github.com/lukasHoel/text2room
- Owner: lukasHoel
- License: mit
- Created: 2023-03-21T15:36:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-15T15:00:12.000Z (12 months ago)
- Last Synced: 2024-08-01T08:19:05.117Z (3 months ago)
- Topics: 3d-generation, diffusion-models, mesh-generation, text-to-image
- Language: Python
- Homepage: https://lukashoel.github.io/text-to-room/
- Size: 8.85 MB
- Stars: 998
- Watchers: 10
- Forks: 66
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome - lukasHoel/text2room - Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023). (Python)
- StarryDivineSky - lukasHoel/text2room
README
# Text2Room
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models.This is the official repository that contains source code for the ICCV 2023 paper [Text2Room](https://lukashoel.github.io/text-to-room/).
[[arXiv](https://arxiv.org/abs/2303.11989)] [[Project Page](https://lukashoel.github.io/text-to-room/)] [[Video](https://youtu.be/fjRnFL91EZc)]
![Teaser](docs/teaser.jpg "Text2Room")
If you find Text2Room useful for your work please cite:
```
@InProceedings{hoellein2023text2room,
author = {H\"ollein, Lukas and Cao, Ang and Owens, Andrew and Johnson, Justin and Nie{\ss}ner, Matthias},
title = {Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {7909-7920}
}
```## Prepare Environment
Create a conda environment:
```
conda create -n text2room python=3.9
conda activate text2room
pip install -r requirements.txt
```Then install Pytorch3D by following the [official instructions](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).
For example, to install Pytorch3D on Linux (tested with PyTorch 1.13.1, CUDA 11.7, Pytorch3D 0.7.2):```
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
```Download the pretrained model weights for the fixed depth inpainting model, that we use:
- refer to the [official IronDepth implemention](https://github.com/baegwangbin/IronDepth) to download the files ```normal_scannet.pt``` and ```irondepth_scannet.pt```.
- place the files under ```text2room/checkpoints```(Optional) Download the pretrained model weights for the text-to-image model:
- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-inpainting```
- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-1```
- ```ln -s checkpoints```
- ```ln -s checkpoints```## Generate a Scene
As default, we generate a living room scene:
```python generate_scene.py```
Outputs are stored in ```text2room/output```.
### Generated outputs
We generate the following outputs per generated scene:
```
Mesh Files:
/fused_mesh/after_generation.ply: generated mesh after the first stage of our method
/fused_mesh/fused_final.ply: generated mesh after the second stage of our method
/fused_mesh/x_poisson_meshlab_depth_y.ply: result of applying poisson surface reconstruction on mesh x with depth y
/fused_mesh/x_poisson_meshlab_depth_y_quadric_z.ply: result of applying poisson surface reconstruction on mesh x with depth y and then decimating the mesh to have at least z faces
Renderings:
/output_rendering/rendering_t.png: image from pose t, that was rendered from the final mesh
/output_rendering/rendering_noise_t.png: image from a slightly different/noised pose t, that was rendered from the final mesh
/output_depth/depth_t.png: depth from pose t, that was rendered from the final mesh
/output_depth/depth_noise_t.png: depth from a slightly different/noised pose t, that was rendered from the final meshMetadata:
/settings.json: all arguments used to generate the scene
/seen_poses.json: list of all poses in Pytorch3D convention used to render output_rendering (no noise)
/seen_poses_noise.json: list of all poses in Pytorch3D convention used to render output_rendering (with noise)
/transforms.json: a file in the standard NeRF convention (e.g. see NeRFStudio) that can be used to optimize a NeRF for the generated scene. It refers to the rendered images in output_rendering (no noise).
```We also generate the following intermediate outputs during generation of the scene:
```
/fused_mesh/fused_until_frame_t.ply: generated mesh using the content until pose t
/rendered/rendered_t.png: image from pose t, that was rendered from mesh_t
/mask/mask_t.png: mask from pose t, that signals unobserved regions
/mask/mask_eroded_dilated_t.png: mask from pose t, after applying erosion/dilation
/rgb/rgb_t.png: image from pose t, that was inpainted with the text-to-image model
/depth/rendered_depth_t.png: depth from pose t, that was rendered from mesh_t
/depth/depth_t.png: depth from pose t, that was predicted/aligned from rgb_t and rendered_depth_t
/rgbd/rgbd_t.png: combination of rgb_t and depth_t placed next to each other
```### Create a scene from a fixed start-image
Already have an in-the-wild image, from which you want to start the generation?
Specify it as ```--input_image_path``` and the generated scene kicks-off from there.```python generate_scene.py --input_image_path sample_data/0.png```
### Create a scene from another room type
Generate indoor-scenes of arbitrary rooms by specifying another ```--trajectory_file``` as input:
```python generate_scene.py --trajectory_file model/trajectories/examples/bedroom.json```
We provide a bunch of [example rooms](model/trajectories/examples).
### Customize Generation
We provide a highly configurable method. See [opt.py](model/utils/opt.py) for a complete list of the configuration options.
### Get creative!
You can specify your own prompts and camera trajectories by simply creating your own ```trajectory.json``` file.
#### Trajectory Format
Each ```trajectory.json``` file should satisfy the following format:
```
[
{
"prompt": (str, optional) the prompt to use for this trajectory,
"negative_prompt": (str, optional) the negative prompt to use for this trajectory,
"n_images": (int, optional) how many images to render between start and end pose of this trajectory,
"surface_normal_threshold": (float, optional) the surface_normal_threshold to use for this trajectory
"fn_name": (str, required) the name of a trajectory_function as specified in model/trajectories/trajectory_util.py
"fn_args": (dict, optional) {
"a": value for an argument with name 'a' of fn_name,
"b": value for an argument with name 'b' of fn_name,
},
"adaptive": (list, optional) [
{
"arg": (str, required) name of an argument of fn_name that represents a float value,
"delta": (float, required) delta value to add to the argument during adaptive pose search,
"min": (float, optional) minimum value during search,
"max": (float, optional) maximum value during search
}
]
},
{... next trajectory with similar structure as above ...}
]
```#### Adding new trajectory functions
We provide a bunch of predefined trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).
Each ```trajectory.json``` file is a combination of the provided trajectory functions.
You can create custom trajectories by creating new combinations of existing functions.
You can also add custom trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).
For automatic integration with our codebase, custom trajectory functions should have the following pattern:```
def custom_trajectory_fn(current_step, n_steps, **args):
# n_steps: how many poses including start and end pose in this trajectory
# current_step: pose in the current trajectory
# your custom trajectory function here...def custom_trajectory(**args):
return _config_fn(custom_trajectory_fn, **args)
```This lets you reference ```custom_trajectory``` as ```fn_name``` in a ```trajectory.json``` file.
## Render an existing scene
We provide a script that renders images from a mesh at different poses:
```python render_cameras.py -m -c ```
where you can provide any cameras in the Pytorch3D convention via ```-c```.
For example, to re-render all poses used during generation and completion:```
python render_cameras.py \
-m /fused_mesh/fused_final_poisson_meshlab_depth_12.ply \
-c /seen_poses.json
```## Optimize a NeRF
We provide an easy way to train a NeRF from our generated scene.
We save a ```transforms.json``` file in the standard NeRF convention, that can be used to optimize a NeRF for the generated scene.
It refers to the rendered images in ```/output_rendering```.
It can be used with standard NeRF frameworks like [Instant-NGP](https://github.com/NVlabs/instant-ngp) or [NeRFStudio](https://github.com/nerfstudio-project/nerfstudio).## Acknowledgements
Our work builds on top of amazing open-source networks and codebases.
We thank the authors for providing them.- [IronDepth](https://github.com/baegwangbin/IronDepth) [1]: a method for monocular depth prediction, that can be used for depth inpainting.
- [StableDiffusion](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) [2]: a state-of-the-art text-to-image inpainting model with publicly released network weights.[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla
[2] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer