{"id":13456081,"url":"https://github.com/lukasHoel/text2room","last_synced_at":"2025-03-24T09:31:27.238Z","repository":{"id":144627336,"uuid":"617044480","full_name":"lukasHoel/text2room","owner":"lukasHoel","description":"Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).","archived":false,"fork":false,"pushed_at":"2023-11-15T15:00:12.000Z","size":9276,"stargazers_count":1016,"open_issues_count":1,"forks_count":71,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-10-28T23:33:52.528Z","etag":null,"topics":["3d-generation","diffusion-models","mesh-generation","text-to-image"],"latest_commit_sha":null,"homepage":"https://lukashoel.github.io/text-to-room/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lukasHoel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-03-21T15:36:42.000Z","updated_at":"2024-10-28T06:04:43.000Z","dependencies_parsed_at":"2024-02-17T10:44:26.219Z","dependency_job_id":null,"html_url":"https://github.com/lukasHoel/text2room","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasHoel%2Ftext2room","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasHoel%2Ftext2room/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasHoel%2Ftext2room/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lukasHoel%2Ftext2room/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lukasHoel","download_url":"https://codeload.github.com/lukasHoel/text2room/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245243283,"owners_count":20583600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-generation","diffusion-models","mesh-generation","text-to-image"],"created_at":"2024-07-31T08:01:15.904Z","updated_at":"2025-03-24T09:31:24.710Z","avatar_url":"https://github.com/lukasHoel.png","language":"Python","funding_links":[],"categories":["Python","其他_机器视觉"],"sub_categories":["网络服务_其他"],"readme":"# Text2Room\nText2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models.\n\nThis is the official repository that contains source code for the ICCV 2023 paper [Text2Room](https://lukashoel.github.io/text-to-room/).\n\n[[arXiv](https://arxiv.org/abs/2303.11989)] [[Project Page](https://lukashoel.github.io/text-to-room/)] [[Video](https://youtu.be/fjRnFL91EZc)]\n\n![Teaser](docs/teaser.jpg \"Text2Room\")\n\nIf you find Text2Room useful for your work please cite:\n```\n@InProceedings{hoellein2023text2room,\n    author    = {H\\\"ollein, Lukas and Cao, Ang and Owens, Andrew and Johnson, Justin and Nie{\\ss}ner, Matthias},\n    title     = {Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models},\n    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},\n    month     = {October},\n    year      = {2023},\n    pages     = {7909-7920}\n}\n```\n\n## Prepare Environment\n\nCreate 
a conda environment:\n\n```\nconda create -n text2room python=3.9\nconda activate text2room\npip install -r requirements.txt\n```\n\nThen install Pytorch3D by following the [official instructions](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md).\nFor example, to install Pytorch3D on Linux (tested with PyTorch 1.13.1, CUDA 11.7, Pytorch3D 0.7.2):\n\n```\nconda install -c fvcore -c iopath -c conda-forge fvcore iopath\npip install \"git+https://github.com/facebookresearch/pytorch3d.git@stable\"\n```\n\nDownload the pretrained model weights for the fixed depth inpainting model that we use:\n\n- Refer to the [official IronDepth implementation](https://github.com/baegwangbin/IronDepth) to download the files ```normal_scannet.pt``` and ```irondepth_scannet.pt```.\n- Place the files under ```text2room/checkpoints```.\n\n(Optional) Download the pretrained model weights for the text-to-image model:\n\n- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-inpainting```\n- ```git clone https://huggingface.co/stabilityai/stable-diffusion-2-1```\n- ```ln -s \u003cpath/to/stable-diffusion-2-inpainting\u003e checkpoints```\n- ```ln -s \u003cpath/to/stable-diffusion-2-1\u003e checkpoints```\n\n## Generate a Scene\n\nBy default, we generate a living room scene:\n\n```python generate_scene.py```\n\nOutputs are stored in ```text2room/output```.\n\n### Generated outputs\n\nWe generate the following outputs per scene:\n\n```\nMesh Files:\n    \u003coutput_root\u003e/fused_mesh/after_generation.ply: generated mesh after the first stage of our method\n    \u003coutput_root\u003e/fused_mesh/fused_final.ply: generated mesh after the second stage of our method\n    \u003coutput_root\u003e/fused_mesh/x_poisson_meshlab_depth_y.ply: result of applying Poisson surface reconstruction on mesh x with depth y\n    \u003coutput_root\u003e/fused_mesh/x_poisson_meshlab_depth_y_quadric_z.ply: result of applying Poisson surface reconstruction on mesh x with 
depth y and then decimating the mesh to have at least z faces\n    \nRenderings:\n    \u003coutput_root\u003e/output_rendering/rendering_t.png: image from pose t, that was rendered from the final mesh\n    \u003coutput_root\u003e/output_rendering/rendering_noise_t.png: image from a slightly different/noised pose t, that was rendered from the final mesh\n    \u003coutput_root\u003e/output_depth/depth_t.png: depth from pose t, that was rendered from the final mesh\n    \u003coutput_root\u003e/output_depth/depth_noise_t.png: depth from a slightly different/noised pose t, that was rendered from the final mesh\n\nMetadata:\n    \u003coutput_root\u003e/settings.json: all arguments used to generate the scene\n    \u003coutput_root\u003e/seen_poses.json: list of all poses in Pytorch3D convention used to render output_rendering (no noise)\n    \u003coutput_root\u003e/seen_poses_noise.json: list of all poses in Pytorch3D convention used to render output_rendering (with noise)\n    \u003coutput_root\u003e/transforms.json: a file in the standard NeRF convention (e.g. see NeRFStudio) that can be used to optimize a NeRF for the generated scene. 
It refers to the rendered images in output_rendering (no noise).\n```\n\nWe also generate the following intermediate outputs during generation of the scene:\n\n```\n    \u003coutput_root\u003e/fused_mesh/fused_until_frame_t.ply: generated mesh using the content until pose t\n    \u003coutput_root\u003e/rendered/rendered_t.png: image from pose t, that was rendered from mesh_t\n    \u003coutput_root\u003e/mask/mask_t.png: mask from pose t, that signals unobserved regions\n    \u003coutput_root\u003e/mask/mask_eroded_dilated_t.png: mask from pose t, after applying erosion/dilation\n    \u003coutput_root\u003e/rgb/rgb_t.png: image from pose t, that was inpainted with the text-to-image model\n    \u003coutput_root\u003e/depth/rendered_depth_t.png: depth from pose t, that was rendered from mesh_t\n    \u003coutput_root\u003e/depth/depth_t.png: depth from pose t, that was predicted/aligned from rgb_t and rendered_depth_t\n    \u003coutput_root\u003e/rgbd/rgbd_t.png: combination of rgb_t and depth_t placed next to each other\n```\n\n### Create a scene from a fixed start-image\n\nAlready have an in-the-wild image from which you want to start the generation?\nSpecify it as ```--input_image_path``` and the scene generation starts from there.\n\n```python generate_scene.py --input_image_path sample_data/0.png```\n\n### Create a scene from another room type\n\nGenerate indoor scenes of arbitrary rooms by specifying another ```--trajectory_file``` as input:\n\n```python generate_scene.py --trajectory_file model/trajectories/examples/bedroom.json```\n\nWe provide several [example rooms](model/trajectories/examples).\n\n### Customize Generation\n\nWe provide a highly configurable method. 
See [opt.py](model/utils/opt.py) for a complete list of the configuration options.\n\n### Get creative!\n\nYou can specify your own prompts and camera trajectories by simply creating your own ```trajectory.json``` file.\n\n#### Trajectory Format\n\nEach ```trajectory.json``` file should satisfy the following format:\n\n```\n[\n  {\n    \"prompt\": (str, optional) the prompt to use for this trajectory,\n    \"negative_prompt\": (str, optional) the negative prompt to use for this trajectory,\n    \"n_images\": (int, optional) how many images to render between start and end pose of this trajectory,\n    \"surface_normal_threshold\": (float, optional) the surface_normal_threshold to use for this trajectory\n    \"fn_name\": (str, required) the name of a trajectory_function as specified in model/trajectories/trajectory_util.py\n    \"fn_args\": (dict, optional) {\n      \"a\": value for an argument with name 'a' of fn_name,\n      \"b\": value for an argument with name 'b' of fn_name,\n    },\n    \"adaptive\": (list, optional) [\n      {\n        \"arg\": (str, required) name of an argument of fn_name that represents a float value,\n        \"delta\": (float, required) delta value to add to the argument during adaptive pose search,\n        \"min\": (float, optional) minimum value during search,\n        \"max\": (float, optional) maximum value during search\n      }\n    ]\n  },\n  \n  {... next trajectory with similar structure as above ...}\n]\n```\n\n#### Adding new trajectory functions\n\nWe provide a bunch of predefined trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).\nEach ```trajectory.json``` file is a combination of the provided trajectory functions. 
\nYou can create custom trajectories by creating new combinations of existing functions.\nYou can also add custom trajectory functions in [trajectory_util.py](model/trajectories/trajectory_util.py).\nFor automatic integration with our codebase, custom trajectory functions should have the following pattern:\n\n```\ndef custom_trajectory_fn(current_step, n_steps, **args):\n    # n_steps: number of poses in this trajectory, including start and end pose\n    # current_step: index of the current pose within this trajectory\n    \n    # your custom trajectory function here...\n\ndef custom_trajectory(**args):\n    return _config_fn(custom_trajectory_fn, **args)\n```\n\nThis lets you reference ```custom_trajectory``` as ```fn_name``` in a ```trajectory.json``` file.\n\n## Render an existing scene\n\nWe provide a script that renders images from a mesh at different poses:\n\n```python render_cameras.py -m \u003cpath/to/mesh.ply\u003e -c \u003cpath/to/cameras.json\u003e```\n\nwhere you can provide any cameras in the Pytorch3D convention via ```-c```.\nFor example, to re-render all poses used during generation and completion:\n\n```\npython render_cameras.py \\\n-m \u003coutput_root\u003e/fused_mesh/fused_final_poisson_meshlab_depth_12.ply \\\n-c \u003coutput_root\u003e/seen_poses.json\n```\n\n## Optimize a NeRF\n\nWe provide an easy way to train a NeRF from our generated scene.\nWe save a ```transforms.json``` file in the standard NeRF convention that can be used to optimize a NeRF for the generated scene.\nIt refers to the rendered images in ```\u003coutput_root\u003e/output_rendering```.\nIt can be used with standard NeRF frameworks like [Instant-NGP](https://github.com/NVlabs/instant-ngp) or [NeRFStudio](https://github.com/nerfstudio-project/nerfstudio).\n\n## Acknowledgements\n\nOur work builds on top of amazing open-source networks and codebases. 
\nWe thank the authors for providing them.\n\n- [IronDepth](https://github.com/baegwangbin/IronDepth) [1]: a method for monocular depth prediction, that can be used for depth inpainting.\n- [StableDiffusion](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) [2]: a state-of-the-art text-to-image inpainting model with publicly released network weights.\n\n[1] IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty, BMVC 2022, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla\n\n[2] High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FlukasHoel%2Ftext2room","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FlukasHoel%2Ftext2room","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FlukasHoel%2Ftext2room/lists"}