{"id":13563444,"url":"https://github.com/ashawkey/stable-dreamfusion","last_synced_at":"2025-05-14T03:07:38.538Z","repository":{"id":60907945,"uuid":"546476703","full_name":"ashawkey/stable-dreamfusion","owner":"ashawkey","description":"Text-to-3D \u0026 Image-to-3D \u0026 Mesh Exportation with NeRF + Diffusion.","archived":false,"fork":false,"pushed_at":"2023-12-10T23:17:27.000Z","size":17520,"stargazers_count":8562,"open_issues_count":191,"forks_count":747,"subscribers_count":131,"default_branch":"main","last_synced_at":"2025-04-03T16:33:45.464Z","etag":null,"topics":["dreamfusion","gui","image-to-3d","nerf","stable-diffusion","text-to-3d"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashawkey.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-06T06:18:39.000Z","updated_at":"2025-04-02T16:11:54.000Z","dependencies_parsed_at":"2024-01-07T03:51:51.159Z","dependency_job_id":"c8c26cfb-6777-446b-a5c9-94f38a4e150a","html_url":"https://github.com/ashawkey/stable-dreamfusion","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashawkey%2Fstable-dreamfusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashawkey%2Fstable-dreamfusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashawkey%2Fstable-dreamfusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/ashawkey%2Fstable-dreamfusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashawkey","download_url":"https://codeload.github.com/ashawkey/stable-dreamfusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248288089,"owners_count":21078848,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dreamfusion","gui","image-to-3d","nerf","stable-diffusion","text-to-3d"],"created_at":"2024-08-01T13:01:19.328Z","updated_at":"2025-04-10T20:05:17.704Z","avatar_url":"https://github.com/ashawkey.png","language":"Python","funding_links":[],"categories":["Python","\u003cspan id=\"model\"\u003e3D Model\u003c/span\u003e","Image Synthesis","Uncategorized","Prompt-based Virtual Try-on","Implementations","AI Text to 3D Models:","其他_机器视觉","👑Stable Diffusion","Repos","基于提示词的虚拟试穿\u003ca name='Prompt-based-Virtual-Try-on'\u003e\u003c/a\u003e","Training","Stable Diffusion","Image Generation \u0026 Editing","AI \u0026 Machine Learning for CG","📦 Legacy \u0026 Inactive Projects"],"sub_categories":["\u003cspan id=\"tool\"\u003eLLM (LLM \u0026 Tool)\u003c/span\u003e","Inbox: Stable Diffusion","Uncategorized","网络服务_其他","Python","Specialized Usecases","Image Generation"],"readme":"# Stable-Dreamfusion\n\nA pytorch implementation of the text-to-3D model **Dreamfusion**, powered by the [Stable Diffusion](https://github.com/CompVis/stable-diffusion) text-to-2D model.\n\n**ADVERTISEMENT: Please check out [threestudio](https://github.com/threestudio-project/threestudio) for recent improvements and better 
implementation in 3D content generation!**\n\n**NEWS (2023.6.12)**:\n\n* Support for [Perp-Neg](https://perp-neg.github.io/) to alleviate the multi-head problem in Text-to-3D.\n* Support for Perp-Neg with both [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [DeepFloyd-IF](https://github.com/deep-floyd/IF).\n\nhttps://user-images.githubusercontent.com/25863658/236712982-9f93bd32-83bf-423a-bb7c-f73df7ece2e3.mp4\n\nhttps://user-images.githubusercontent.com/25863658/232403162-51b69000-a242-4b8c-9cd9-4242b09863fa.mp4\n\n### [Update Logs](assets/update_logs.md)\n\n### Colab notebooks:\n* Instant-NGP backbone (`-O`): [![Instant-NGP Backbone](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MXT3yfOFvO0ooKEfiUUvTKwUkrrlCHpF?usp=sharing)\n\n* Vanilla NeRF backbone (`-O2`): [![Vanilla Backbone](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1mvfxG-S_n_gZafWoattku7rLJ2kPoImL?usp=sharing)\n\n# Important Notice\nThis project is a **work-in-progress** and differs from the paper in many ways. **The current generation quality cannot match the results from the original paper, and many prompts still fail badly!**\n\n## Notable differences from the paper\n* Since the Imagen model is not publicly available, we use [Stable Diffusion](https://github.com/CompVis/stable-diffusion) to replace it (implementation from [diffusers](https://github.com/huggingface/diffusers)). Unlike Imagen, Stable Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. 
Therefore, the loss must also propagate back through the VAE's encoder, which adds extra training time.\n* We use the [multi-resolution grid encoder](https://github.com/NVlabs/instant-ngp/) to implement the NeRF backbone (implementation from [torch-ngp](https://github.com/ashawkey/torch-ngp)), which enables much faster rendering (~10 FPS at 800x800).\n* We use the [Adan](https://github.com/sail-sg/Adan) optimizer by default.\n\n# Install\n\n```bash\ngit clone https://github.com/ashawkey/stable-dreamfusion.git\ncd stable-dreamfusion\n```\n\n### Optional: create a python virtual environment\n\nTo avoid python package conflicts, we recommend using a virtual environment, e.g., conda or venv:\n\n```bash\npython -m venv venv_stable-dreamfusion\nsource venv_stable-dreamfusion/bin/activate # you need to repeat this step for every new terminal\n```\n\n### Install with pip\n\n```bash\npip install -r requirements.txt\n```\n\n### Download pre-trained models\n\nTo use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:\n* [Zero-1-to-3](https://github.com/cvlab-columbia/zero123) for the diffusion backend.\n    We use `zero123-xl.ckpt` by default, and it is hard-coded in `guidance/zero123_utils.py`.\n    ```bash\n    cd pretrained/zero123\n    wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt\n    ```\n* [Omnidata](https://github.com/EPFL-VILAB/omnidata/tree/main/omnidata_tools/torch) for depth and normal prediction.\n    These checkpoints are hard-coded in `preprocess_image.py`.\n    ```bash\n    mkdir pretrained/omnidata\n    cd pretrained/omnidata\n    # assume gdown is installed\n    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI\u0026confirm=t' # omnidata_dpt_depth_v2.ckpt\n    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR\u0026confirm=t' # omnidata_dpt_normal_v2.ckpt\n    ```\n\nTo use [DeepFloyd-IF](https://github.com/deep-floyd/IF), you need to accept the usage conditions from [hugging 
face](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0), and log in with `huggingface-cli login` on the command line.\n\nFor DMTet, we provide the pre-generated `32/64/128` resolution tetrahedron grids under `tets`.\nThe 256-resolution grid can be found [here](https://drive.google.com/file/d/1lgvEKNdsbW5RS4gVxJbgBS4Ac92moGSa/view?usp=sharing).\n\n### Build extension (optional)\nBy default, we use [`load`](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extensions at runtime.\nWe also provide `setup.py` files to build each extension:\n```bash\ncd stable-dreamfusion\n\n# install all extension modules\nbash scripts/install_ext.sh\n\n# if you want to install manually, here is an example:\npip install ./raymarching # install to python path (you still need the raymarching/ folder, since this only installs the built extension.)\n```\n\n### Taichi backend (optional)\nUse the [Taichi](https://github.com/taichi-dev/taichi) backend for Instant-NGP. It achieves performance comparable to the CUDA implementation while requiring **no CUDA** build. Install Taichi with pip:\n```bash\npip install -i https://pypi.taichi.graphics/simple/ taichi-nightly\n```\n\n### Troubleshooting:\n* We assume you are using the latest version of all dependencies. If you run into problems with a specific dependency, please try upgrading it first (e.g., `pip install -U diffusers`). If the problem persists, [reporting a bug issue](https://github.com/ashawkey/stable-dreamfusion/issues/new?assignees=\u0026labels=bug\u0026template=bug_report.yaml\u0026title=%3Ctitle%3E) will be appreciated!\n* `[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)`: this usually indicates a problem with the OpenGL installation. 
Try reinstalling the NVIDIA driver, or use nvidia-docker as suggested in https://github.com/ashawkey/stable-dreamfusion/issues/131 if you are using a headless server.\n* `TypeError: xxx_forward(): incompatible function arguments`: this happens when the CUDA source has been updated but you installed the extensions earlier with `setup.py`. Try reinstalling the corresponding extension (e.g., `pip install ./gridencoder`).\n\n### Tested environments\n* Ubuntu 22 with torch 1.12 \u0026 CUDA 11.6 on a V100.\n\n# Usage\n\nThe first run will take some time to compile the CUDA extensions.\n\n```bash\n#### stable-dreamfusion setting\n\n### Instant-NGP NeRF Backbone\n# + faster rendering speed\n# + less GPU memory (~16G)\n# - need to build CUDA extensions (a CUDA-free Taichi backend is available)\n\n## train with text prompt (with the default settings)\n# `-O` equals `--cuda_ray --fp16`\n# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.\npython main.py --text \"a hamburger\" --workspace trial -O\n\n# reduce stable-diffusion memory usage with `--vram_O`\n# enable various vram savings (https://huggingface.co/docs/diffusers/optimization/fp16).\npython main.py --text \"a hamburger\" --workspace trial -O --vram_O\n\n# You can collect arguments in a file. You can override arguments by specifying them after `--file`. 
Note that quoted strings can't be loaded from .args files...\npython main.py --file scripts/res64.args --workspace trial_awesome_hamburger --text \"a photo of an awesome hamburger\"\n\n# use CUDA-free Taichi backend with `--backbone grid_taichi`\npython3 main.py --text \"a hamburger\" --workspace trial -O --backbone grid_taichi\n\n# choose stable-diffusion version (1.5, 2.0 and 2.1 are supported; default is 2.1)\npython main.py --text \"a hamburger\" --workspace trial -O --sd_version 1.5\n\n# use a custom stable-diffusion checkpoint from Hugging Face:\npython main.py --text \"a hamburger\" --workspace trial -O --hf_key andite/anything-v4.0\n\n# use DeepFloyd-IF for guidance (experimental):\npython main.py --text \"a hamburger\" --workspace trial -O --IF\npython main.py --text \"a hamburger\" --workspace trial -O --IF --vram_O # requires ~24G GPU memory\n\n# negative text prompts are also supported:\npython main.py --text \"a rose\" --negative \"red\" --workspace trial -O\n\n## after the training is finished:\n# test (exporting 360 degree video)\npython main.py --workspace trial -O --test\n# also save a mesh (with obj, mtl, and png texture)\npython main.py --workspace trial -O --test --save_mesh\n# test with a GUI (free view control!)\npython main.py --workspace trial -O --test --gui\n\n### Vanilla NeRF backbone\n# + pure pytorch, no need to build extensions!\n# - slow rendering speed\n# - more GPU memory\n\n## train\n# `-O2` equals `--backbone vanilla`\npython main.py --text \"a hotdog\" --workspace trial2 -O2\n\n# if you hit CUDA OOM, try reducing the NeRF sampling steps (--num_steps and --upsample_steps)\npython main.py --text \"a hotdog\" --workspace trial2 -O2 --num_steps 64 --upsample_steps 0\n\n## test\npython main.py --workspace trial2 -O2 --test\npython main.py --workspace trial2 -O2 --test --save_mesh\npython main.py --workspace trial2 -O2 --test --gui # not recommended, FPS will be low.\n\n### DMTet finetuning\n\n## use --dmtet and --init_with \u003cnerf 
checkpoint\u003e to finetune the mesh at higher resolution\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial/checkpoints/df.pth\n\n## init dmtet with a mesh to generate texture\n# requires cubvh: pip install git+https://github.com/ashawkey/cubvh\n# remove --lock_geo to also finetune geometry, but results may be worse.\npython main.py -O --text \"a white bunny with red eyes\" --workspace trial_dmtet_mesh --dmtet --iters 5000 --init_with ./data/bunny.obj --lock_geo\n\n## test \u0026 export the mesh\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --save_mesh\n\n## gui to visualize dmtet\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --gui\n\n### Image-conditioned 3D Generation\n\n## preprocess input image\n# note: image-to-3D results depend on zero-1-to-3's capability. For best performance, the input image should contain a single front-facing object, with a square aspect ratio and \u003c1024 pixel resolution. 
Check the examples under ./data.\n# this exports `\u003cimage\u003e_rgba.png`, `\u003cimage\u003e_depth.png`, and `\u003cimage\u003e_normal.png` to the directory containing the input image.\npython preprocess_image.py \u003cimage\u003e.png\npython preprocess_image.py \u003cimage\u003e.png --border_ratio 0.4 # increase border_ratio if the center object appears too large and results are unsatisfying.\n\n## zero123 train\n# pass the processed \u003cimage\u003e_rgba.png via --image and do NOT pass --text, to enable the zero-1-to-3 backend.\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image --iters 5000\n\n# if the image is not exactly a front view (elevation = 0), adjust default_polar (we use polar from 0 to 180 to represent elevation from 90 to -90)\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image --iters 5000 --default_polar 80\n\n# by default we leverage monocular depth estimation to aid image-to-3d; if the depth estimation is inaccurate and harms results, turn it off with:\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image --iters 5000 --lambda_depth 0\n\n# then finetune with DMTet, initialized from the trained NeRF checkpoint\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image_dmtet --dmtet --init_with trial_image/checkpoints/df.pth\n\n## zero123 with multiple images\npython main.py -O --image_config config/\u003cconfig\u003e.csv --workspace trial_image --iters 5000\n\n## render \u003cnum\u003e images per batch (default 1)\npython main.py -O --image_config config/\u003cconfig\u003e.csv --workspace trial_image --iters 5000 --batch_size 4\n\n# providing both --text and --image enables the stable-diffusion backend (similar to make-it-3d)\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text --iters 5000\n\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text_dmtet --dmtet --init_with 
trial_image_text/checkpoints/df.pth\n\n## test / visualize\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image_dmtet --dmtet --test --save_mesh\npython main.py -O --image \u003cimage\u003e_rgba.png --workspace trial_image_dmtet --dmtet --test --gui\n\n### Debugging\n\n# Guidance images can be saved for debugging; they are written to trial_hamburger/guidance.\n# Warning: this slows down training considerably and consumes lots of disk space!\npython main.py --text \"a hamburger\" --workspace trial_hamburger -O --vram_O --save_guidance --save_guidance_interval 5 # save every 5 steps\n```\n\nFor example commands, check [`scripts`](./scripts).\n\nFor advanced tips and other development notes, check [Advanced Tips](./assets/advanced.md).\n\n# Evaluation\n\nTo reproduce the paper's CLIP R-precision evaluation:\n\nRunning the test step from the Usage section generates a validation set of renderings from different viewing angles. Then compute the R-precision (R=1) between the prompt and these images:\n\n```bash\npython r_precision.py --text \"a snake is flying in the sky\" --workspace snake_HQ --latest ep0100 --mode depth --clip clip-ViT-B-16\n```\n\n# Acknowledgement\n\nThis work is based on a growing list of amazing research works and open-source projects; many thanks to all the authors for sharing!\n\n* [DreamFusion: Text-to-3D using 2D Diffusion](https://dreamfusion3d.github.io/)\n    ```\n    @article{poole2022dreamfusion,\n        author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. 
and Mildenhall, Ben},\n        title = {DreamFusion: Text-to-3D using 2D Diffusion},\n        journal = {arXiv},\n        year = {2022},\n    }\n    ```\n\n* [Magic3D: High-Resolution Text-to-3D Content Creation](https://research.nvidia.com/labs/dir/magic3d/)\n   ```\n   @inproceedings{lin2023magic3d,\n      title={Magic3D: High-Resolution Text-to-3D Content Creation},\n      author={Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},\n      booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},\n      year={2023}\n    }\n   ```\n\n* [Zero-1-to-3: Zero-shot One Image to 3D Object](https://github.com/cvlab-columbia/zero123)\n    ```\n    @misc{liu2023zero1to3,\n        title={Zero-1-to-3: Zero-shot One Image to 3D Object},\n        author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},\n        year={2023},\n        eprint={2303.11328},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n    ```\n    \n* [Perp-Neg: Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond](https://perp-neg.github.io/)\n    ```\n    @article{armandpour2023re,\n      title={Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond},\n      author={Armandpour, Mohammadreza and Zheng, Huangjie and Sadeghian, Ali and Sadeghian, Amir and Zhou, Mingyuan},\n      journal={arXiv preprint arXiv:2304.04968},\n      year={2023}\n    }\n    ```\n    \n* [RealFusion: 360° Reconstruction of Any Object from a Single Image](https://github.com/lukemelas/realfusion)\n    ```\n    @inproceedings{melaskyriazi2023realfusion,\n        author = {Melas-Kyriazi, Luke and Rupprecht, Christian and Laina, Iro and Vedaldi, Andrea},\n        title = {RealFusion: 360 Reconstruction of Any 
Object from a Single Image},\n        booktitle={CVPR},\n        year = {2023},\n        url = {https://arxiv.org/abs/2302.10663},\n    }\n    ```\n\n* [Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation](https://fantasia3d.github.io/)\n    ```\n    @article{chen2023fantasia3d,\n        title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation},\n        author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},\n        journal={arXiv preprint arXiv:2303.13873},\n        year={2023}\n    }\n    ```\n\n* [Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior](https://make-it-3d.github.io/)\n    ```\n    @article{tang2023make,\n        title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},\n        author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},\n        journal={arXiv preprint arXiv:2303.14184},\n        year={2023}\n    }\n    ```\n\n* [Stable Diffusion](https://github.com/CompVis/stable-diffusion) and the [diffusers](https://github.com/huggingface/diffusers) library.\n\n    ```\n    @misc{rombach2021highresolution,\n        title={High-Resolution Image Synthesis with Latent Diffusion Models},\n        author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n        year={2021},\n        eprint={2112.10752},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n\n    @misc{von-platen-etal-2022-diffusers,\n        author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},\n        title = {Diffusers: State-of-the-art diffusion models},\n        year = {2022},\n        publisher = {GitHub},\n        journal = {GitHub repository},\n        howpublished = 
{\\url{https://github.com/huggingface/diffusers}}\n    }\n    ```\n\n* The GUI is developed with [DearPyGui](https://github.com/hoffstadt/DearPyGui).\n\n* Puppy image from: https://www.pexels.com/photo/high-angle-photo-of-a-corgi-looking-upwards-2664417/\n\n* Anya images from: https://www.goodsmile.info/en/product/13301/POP+UP+PARADE+Anya+Forger.html\n\n# Citation\n\nIf you find this work useful, a citation will be appreciated via:\n```\n@misc{stable-dreamfusion,\n    Author = {Jiaxiang Tang},\n    Year = {2022},\n    Note = {https://github.com/ashawkey/stable-dreamfusion},\n    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashawkey%2Fstable-dreamfusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashawkey%2Fstable-dreamfusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashawkey%2Fstable-dreamfusion/lists"}