{"id":22068984,"url":"https://github.com/SUDO-AI-3D/zero123plus","last_synced_at":"2025-07-24T07:31:39.992Z","repository":{"id":203216662,"uuid":"705901752","full_name":"SUDO-AI-3D/zero123plus","owner":"SUDO-AI-3D","description":"Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.","archived":false,"fork":false,"pushed_at":"2024-02-23T18:17:53.000Z","size":2498,"stargazers_count":1766,"open_issues_count":25,"forks_count":123,"subscribers_count":29,"default_branch":"main","last_synced_at":"2024-11-21T03:43:04.291Z","etag":null,"topics":["3d","3d-graphics","aigc","diffusers","diffusion-models","image-to-3d","research-project","text-to-3d"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SUDO-AI-3D.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-10-16T23:22:56.000Z","updated_at":"2024-11-19T06:44:16.000Z","dependencies_parsed_at":"2024-01-15T23:09:38.105Z","dependency_job_id":"4d8a32e4-2016-4a7a-9927-41d50a84f1a6","html_url":"https://github.com/SUDO-AI-3D/zero123plus","commit_stats":null,"previous_names":["sudo-ai-3d/zero123plus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SUDO-AI-3D%2Fzero123plus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SUDO-AI-3D%2Fzero123plus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SUDO-AI-3D%2Fzero123plus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/SUDO-AI-3D%2Fzero123plus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SUDO-AI-3D","download_url":"https://codeload.github.com/SUDO-AI-3D/zero123plus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227421384,"owners_count":17775011,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d","3d-graphics","aigc","diffusers","diffusion-models","image-to-3d","research-project","text-to-3d"],"created_at":"2024-11-30T20:04:33.353Z","updated_at":"2024-11-30T20:07:25.915Z","avatar_url":"https://github.com/SUDO-AI-3D.png","language":"Python","readme":"# Zero123++: A Single Image to Consistent Multi-view Diffusion Base Model\n\n![Teaser](resources/teaser-low.jpg)\n\n[\\[Report\\]](https://arxiv.org/abs/2310.15110) \n[\\[Official Demo\\]](https://huggingface.co/spaces/sudo-ai/zero123plus-demo-space) \n[\\[Demo by @yvrjsharma\\]](https://huggingface.co/spaces/ysharma/Zero123PlusDemo) \n[\\[Google Colab\\]](https://colab.research.google.com/drive/1_5ECnTOosRuAsm2tUp0zvBG0DppL-F3V?usp=sharing)\n[\\[Replicate demo\\]](https://replicate.com/jd7h/zero123plusplus)\n\n## UPDATES v1.2\n\nWe are thrilled to release Zero123++ v1.2! Main changes:\n\n+ Camera intrinsics are handled more deliberately. 
The v1.2 model is more robust to a wider range of input fields of view and croppings, and unifies the output field of view to **30°** to better reflect that of realistic close-up views.\n+ The fixed set of elevations is changed from 30° and -20° to **20°** and **-10°**.\n+ In contrast with novel-view synthesis, the model focuses more on 3D generation. The model always outputs a set of views assuming a normalized object size instead of changing w.r.t. the input.\n\nAdditionally, we have a **normal generator** ControlNet that can generate view-space normal images. The output can also be used to obtain a more accurate mask than the SAM-based approach. Validation metrics on our validation set from Objaverse: alpha (before matting) IoU 98.81%, mean normal angular error 10.75°, normal PSNR 26.93 dB.\n\n\u003cimg src=\"resources/burger-normal.jpg\" alt=\"Normal\" width=\"480\" /\u003e\n\n### Usage\n\nUse of the v1.2 base model is unchanged. Please see the sections below for usage.\n\n**Use of the normal generator:** See [examples/normal_gen.py](examples/normal_gen.py).\n\nFor **alpha mask generation** from the normal images, please see [examples/matting_postprocess.py](examples/matting_postprocess.py) and [examples/normal_gen.py](examples/normal_gen.py).\n\n### License\n\nThe code is released under Apache 2.0 and the model weights are released under CC-BY-NC 4.0.\n\nThis means that you cannot use the model (or its derivatives) in a commercial product pipeline, but you can still use the outputs from the model freely. And you are accountable for the output you generate and its subsequent uses.\n\n## Get Started\n\nYou will need `torch` (recommended `2.0` or higher), `diffusers` (recommended `0.20.2`), and `transformers` to start. If you are using `torch` `1.x`, it is recommended to install `xformers` to compute attention in the model efficiently. The code also runs on older versions of `diffusers`, but you may see a decrease in model performance.\n\nAnd you are all set! 
We provide a custom pipeline for `diffusers`, so no extra code is required.\n\nTo generate multi-view images from a single input image, you can run the following code (also see [examples/img_to_mv.py](examples/img_to_mv.py)):\n\n```python\nimport torch\nimport requests\nfrom PIL import Image\nfrom diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler\n\n# Load the pipeline\npipeline = DiffusionPipeline.from_pretrained(\n    \"sudo-ai/zero123plus-v1.1\", custom_pipeline=\"sudo-ai/zero123plus-pipeline\",\n    torch_dtype=torch.float16\n)\n\n# Feel free to tune the scheduler!\n# The `timestep_spacing` parameter is not supported in older versions of `diffusers`,\n# so there may be performance degradations.\n# We recommend using `diffusers==0.20.2`\npipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(\n    pipeline.scheduler.config, timestep_spacing='trailing'\n)\npipeline.to('cuda:0')\n\n# Download an example image.\ncond = Image.open(requests.get(\"https://d.skis.ltd/nrp/sample-data/lysol.png\", stream=True).raw)\n\n# Run the pipeline!\nresult = pipeline(cond, num_inference_steps=75).images[0]\n# For real and synthetic images of general objects,\n# around 28 inference steps are usually enough.\n# For images with delicate details like faces (real or anime),\n# you may need 75-100 steps for the details to emerge.\n\nresult.show()\nresult.save(\"output.png\")\n```\n\nThe above example requires ~5GB VRAM to run.\nThe input image needs to be square, and the recommended image resolution is `\u003e=320x320`.\n\nBy default, Zero123++ generates opaque images with a gray background (the `zero` for the Stable Diffusion VAE).\nYou may run an extra background removal pass like `rembg` to remove the gray background.\n\n```python\n# !pip install rembg\nimport rembg\nresult = rembg.remove(result)\nresult.show()\n```\n\nTo run the depth ControlNet, you can use the following example (also see 
[examples/depth_controlnet.py](examples/depth_controlnet.py)):\n\n```python\nimport torch\nimport requests\nfrom PIL import Image\nfrom diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler, ControlNetModel\n\n# Load the pipeline\npipeline = DiffusionPipeline.from_pretrained(\n    \"sudo-ai/zero123plus-v1.1\", custom_pipeline=\"sudo-ai/zero123plus-pipeline\",\n    torch_dtype=torch.float16\n)\npipeline.add_controlnet(ControlNetModel.from_pretrained(\n    \"sudo-ai/controlnet-zp11-depth-v1\", torch_dtype=torch.float16\n), conditioning_scale=0.75)\n# Feel free to tune the scheduler\npipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(\n    pipeline.scheduler.config, timestep_spacing='trailing'\n)\npipeline.to('cuda:0')\n# Run the pipeline\ncond = Image.open(requests.get(\"https://d.skis.ltd/nrp/sample-data/0_cond.png\", stream=True).raw)\ndepth = Image.open(requests.get(\"https://d.skis.ltd/nrp/sample-data/0_depth.png\", stream=True).raw)\nresult = pipeline(cond, depth_image=depth, num_inference_steps=36).images[0]\nresult.show()\nresult.save(\"output.png\")\n```\n\nThis example requires ~5.7GB VRAM to run.\n\n## Models\n\nThe models are available at [https://huggingface.co/sudo-ai](https://huggingface.co/sudo-ai):\n\n+ `sudo-ai/zero123plus-v1.1`, base Zero123++ model release (v1.1).\n+ `sudo-ai/controlnet-zp11-depth-v1`, depth ControlNet checkpoint release (v1) for Zero123++ (v1.1).\n+ `sudo-ai/zero123plus-v1.2`, base Zero123++ model release (v1.2).\n+ `sudo-ai/controlnet-zp12-normal-gen-v1`, normal generation ControlNet checkpoint release (v1) for Zero123++ (v1.2).\n\nThe source code for the diffusers custom pipeline is available in the [diffusers-support](diffusers-support) directory.\n\n## Camera Parameters\n\nOutput views are a fixed set of camera poses:\n\n+ Azimuth (relative to input view): `30, 90, 150, 210, 270, 330`.\n+ v1.1 Elevation (absolute): `30, -20, 30, -20, 30, -20`.\n+ v1.2 Elevation (absolute): `20, -10, 20, -10, 20, 
-10`.\n+ v1.2 Field of View (absolute): `30°`.\n\n## Running Demo Locally\n\nYou will need to install extra dependencies:\n```\npip install -r requirements.txt\n```\n\nThen run `streamlit run app.py`.\n\nFor Gradio Demo, you can run `python gradio_app.py`.\n\n## Related Work\n[\\[One-2-3-45\\]](https://github.com/One-2-3-45/One-2-3-45) [\\[One-2-3-45++\\]](https://sudo-ai-3d.github.io/One2345plus_page/) [\\[Zero123\\]](https://github.com/cvlab-columbia/zero123)\n\n## Citation\n\nIf you found Zero123++ helpful, please cite our report:\n```bibtex\n@misc{shi2023zero123plus,\n      title={Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model}, \n      author={Ruoxi Shi and Hansheng Chen and Zhuoyang Zhang and Minghua Liu and Chao Xu and Xinyue Wei and Linghao Chen and Chong Zeng and Hao Su},\n      year={2023},\n      eprint={2310.15110},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n","funding_links":[],"categories":["Paper List"],"sub_categories":["Follow-up Papers"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSUDO-AI-3D%2Fzero123plus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSUDO-AI-3D%2Fzero123plus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSUDO-AI-3D%2Fzero123plus/lists"}