{"id":49947116,"url":"https://github.com/TencentARC/Pixal3D","last_synced_at":"2026-05-21T15:01:26.623Z","repository":{"id":357269896,"uuid":"1234746825","full_name":"TencentARC/Pixal3D","owner":"TencentARC","description":"[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images","archived":false,"fork":false,"pushed_at":"2026-05-20T16:18:57.000Z","size":80081,"stargazers_count":1131,"open_issues_count":13,"forks_count":95,"subscribers_count":12,"default_branch":"master","last_synced_at":"2026-05-20T21:45:25.180Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ldyang694.github.io/projects/pixal3d/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-10T15:35:28.000Z","updated_at":"2026-05-20T21:14:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/TencentARC/Pixal3D","commit_stats":null,"previous_names":["tencentarc/pixal3d"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/Pixal3D","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FPixal3D","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FPixal3D/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FPixal3D/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/TencentARC%2FPixal3D/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/Pixal3D/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FPixal3D/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33305277,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"ssl_error","status_checked_at":"2026-05-21T12:22:11.673Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-17T16:00:40.688Z","updated_at":"2026-05-21T15:01:26.617Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":["Python","大厂开源","Modeling/Texture - Flow-matching DiT For Mesh Generation \u0026 Texture Generation","AI \u0026 Machine Learning for CG"],"sub_categories":["腾讯开源","3D Generation"],"readme":"\n\u003cdiv align=\"center\"\u003e\n\n# Pixal3D: Pixel-Aligned 3D Generation from Images\n\n\u003ch3\u003eSIGGRAPH 2026\u003c/h3\u003e\n\n\u003csmall\u003e[Dong-Yang Li](https://ldyang694.github.io/)¹ · [Wang Zhao](https://thuzhaowang.github.io/)²* · [Yuxin Chen](https://orcid.org/0000-0002-7854-1072)² · [Wenbo Hu](https://wbhu.github.io/)² · [Meng-Hao Guo](https://menghaoguo.github.io/)¹ · [Fang-Lue 
Zhang](https://fanglue.github.io/)³ · [Ying Shan](https://www.linkedin.com/in/YingShanProfile)² · [Shi-Min Hu](https://cg.cs.tsinghua.edu.cn/shimin.htm)¹✉\u003c/small\u003e\n\n¹Tsinghua University (BNRist) \u0026nbsp;\u0026nbsp; ²Tencent ARC Lab \u0026nbsp;\u0026nbsp; ³Victoria University of Wellington\n\n*Project lead \u0026nbsp;\u0026nbsp; ✉Corresponding author\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://ldyang694.github.io/projects/pixal3d/\"\u003e\u003cimg src=https://img.shields.io/badge/Project%20Page-333399.svg?logo=googlehome height=22px\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/spaces/TencentARC/Pixal3D\"\u003e\u003cimg src=https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg height=22px\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/TencentARC/Pixal3D\"\u003e\u003cimg src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px\u003e\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/2605.10922\"\u003e\u003cimg src=https://img.shields.io/badge/Arxiv-b5212f.svg?logo=arxiv height=22px\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=https://img.shields.io/badge/License-MIT-yellow.svg height=22px\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"assets/teaser.png\" alt=\"Teaser image of Pixal3D\"/\u003e\n\u003c/div\u003e\n\n**Pixal3D** generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.\n\n---\n\n## ✨ News\n\n- **May 2026**: Release training code and data preparation toolkit. 🔧\n- **May 2026**: Release the improved version based on [Trellis.2](https://github.com/microsoft/TRELLIS.2) backbone. 
💪\n- **May 2026**: Release inference code and online demo. 🤗\n- **Apr 2026**: Our paper is accepted to SIGGRAPH 2026! 🎉\n\n## 📌 Branches\n\n| Branch | Description |\n|--------|-------------|\n| `main` | **Latest version** — improved implementation based on [Trellis.2](https://github.com/microsoft/TRELLIS.2) backbone with better performance. |\n| `paper` | **Paper version** — original implementation based on [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2), corresponding to results reported in our SIGGRAPH 2026 paper. |\n\n\u003e If you want to reproduce the results in our paper, please switch to the `paper` branch.\n\n## 🎮 Try It Online\n\nYou can try Pixal3D directly in your browser without any installation via our Hugging Face Gradio demo:\n\n👉 [**Launch Demo**](https://huggingface.co/spaces/TencentARC/Pixal3D)\n\n## 🚀 Getting Started\n\n### Installation\n\n#### Step 1: Follow TRELLIS.2 Installation\n\nPlease first follow the installation guide of [TRELLIS.2](https://github.com/microsoft/TRELLIS.2) to set up the base environment.\n\n#### Step 2: Install Additional Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n#### Step 3: Install natten\n\n```bash\nNATTEN_CUDA_ARCH=\"xx\" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation\n```\n\nPlease replace `xx` with the CUDA architecture and the number of build workers suitable for your machine.\n\n#### Step 4: Install utils3d\n\n```bash\npip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl\n```\n\n\u003e **Note**: `requirements-hfdemo.txt` is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.\n\n### Usage\n\n#### Inference\n\nGenerate a GLB mesh from a single image:\n\n```bash\npython inference.py --image assets/images/0_img.png --output ./output.glb\n```\n\n**Low-VRAM mode** (reduces peak VRAM by loading models on-demand):\n\n```bash\npython inference.py --image 
assets/images/0_img.png --output ./output.glb --low_vram\n```\n\nBy default, the pipeline resolution is **1536** (standard mode) or **1024** (low-VRAM mode). You can override this with `--resolution`:\n\n```bash\n# Force 1536 even in low-VRAM mode\npython inference.py --image assets/images/0_img.png --output ./output.glb --low_vram --resolution 1536\n\n# Force 1024 in standard mode\npython inference.py --image assets/images/0_img.png --output ./output.glb --resolution 1024\n```\n\n**Tip**: If you don't have `flash_attn` installed, you can use PyTorch's built-in SDPA backend instead:\n\u003e ```bash\n\u003e ATTN_BACKEND=sdpa python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram\n\u003e ```\n\n### Web Demo\n\nWe provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.\n\n```bash\npython app.py \n```\n\nLow-VRAM mode is also available for the web demo. The frontend default resolution will automatically switch to 1024 in low-VRAM mode (1536 otherwise), but can be changed manually in the UI.\n\n```bash\npython app.py --low_vram\n# or via environment variable:\nLOW_VRAM=1 python app.py\n```\n## 🔧 Training\n\nWe provide the full training codebase for reproducing Pixal3D from scratch.\n\n### Data Preparation\n\nPrepare view-aligned O-Voxel data and rendered condition images by following the data toolkit instructions:\n\n\u003e 📂 **[data_toolkit/README.md](data_toolkit/README.md)**\n\n### Overview\n\nPixal3D is trained as a three-stage cascade, each progressively increasing resolution:\n\n| Stage | Model | Resolutions | Config Prefix |\n|-------|-------|-------------|---------------|\n| 1 | Sparse Structure | 32 → 64 | `ss_flow_img_dit_*_proj_finetune` |\n| 2 | Shape | 256 → 512 → 1024 | `slat_flow_img2shape_*_proj_finetune` |\n| 3 | Texture | 256 → 512 → 1024 | `slat_flow_imgshape2tex_*_proj_finetune` |\n\nAll stages use **pixel-aligned projection conditioning** and **view-aligned latents** (2 
views by default). Within each stage, start from the lowest resolution and progressively fine-tune to higher resolutions by setting `finetune_ckpt` in the config.\n\n### Quick Start\n\n```sh\npython train.py \\\n  --config \u003cCONFIG_JSON\u003e \\\n  --output_dir \u003cOUTPUT_DIR\u003e \\\n  --data_dir '\u003cDATA_DIR_JSON\u003e'\n```\n\n`--data_dir` is a JSON string describing the dataset layout. Different stages require different keys:\n\n| Stage | Required keys |\n|-------|---------------|\n| Sparse Structure | `base`, `ss_latent`, `render_cond` |\n| Shape | `base`, `shape_latent`, `render_cond` |\n| Texture | `base`, `shape_latent`, `pbr_latent`, `render_cond` |\n\n### Example: Training All Three Stages\n\nBelow we show the full training sequence using ObjaverseXL as an example. Each higher-resolution step requires updating `finetune_ckpt` in its config JSON to point to the previous checkpoint.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eStage 1: Sparse Structure (32 → 64)\u003c/b\u003e\u003c/summary\u003e\n\n```sh\n# Resolution 32\npython train.py \\\n  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune.json \\\n  --output_dir results/ss_32 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"ss_latent\": \"datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n\n# Resolution 64 (set finetune_ckpt → results/ss_32 checkpoint)\npython train.py \\\n  --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune_ft64.json \\\n  --output_dir results/ss_ft64 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"ss_latent\": \"datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eStage 2: Shape 
(256 → 512 → 1024)\u003c/b\u003e\u003c/summary\u003e\n\n```sh\n# Resolution 256\npython train.py \\\n  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune.json \\\n  --output_dir results/shape_256 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n\n# Resolution 512\npython train.py \\\n  --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune_ft512.json \\\n  --output_dir results/shape_ft512 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n\n# Resolution 1024\npython train.py \\\n  --config configs/gen/slat_flow_img2shape_dit_1_3B_512_bf16_proj_finetune_ft1024.json \\\n  --output_dir results/shape_ft1024 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eStage 3: Texture (256 → 512 → 1024)\u003c/b\u003e\u003c/summary\u003e\n\n```sh\n# Resolution 256\npython train.py \\\n  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_256_bf16_proj_finetune.json \\\n  --output_dir results/tex_256 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view\", \"pbr_latent\": \"datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_256_view\", \"render_cond\": 
\"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n\n# Resolution 512\npython train.py \\\n  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune.json \\\n  --output_dir results/tex_512 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view\", \"pbr_latent\": \"datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_512_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n\n# Resolution 1024\npython train.py \\\n  --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune_ft1024.json \\\n  --output_dir results/tex_ft1024 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view\", \"pbr_latent\": \"datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_1024_view\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}'\n```\n\u003c/details\u003e\n\n### Additional Options\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAll command-line arguments\u003c/b\u003e\u003c/summary\u003e\n\n| Argument | Description | Default |\n|----------|-------------|---------|\n| `--config` | Config JSON path | *required* |\n| `--output_dir` | Output directory | *required* |\n| `--data_dir` | Dataset JSON string | `./data/` |\n| `--load_dir` | Checkpoint load directory | `output_dir` |\n| `--ckpt` | Resume from step | `latest` |\n| `--auto_retry` | Retries on failure | `3` |\n| `--tryrun` | Dry run | `false` |\n| `--profile` | Profiling | `false` |\n| `--num_nodes` | Number of nodes | `1` |\n| `--node_rank` | Current node rank | `0` |\n| `--num_gpus` | GPUs per node | all |\n| `--master_addr` | Master address | `localhost` |\n| `--master_port` | Master port | `12666` |\n| `--use_wandb` | Enable 
W\u0026B logging | `false` |\n| `--wandb_project` | W\u0026B project | `trellis2-training` |\n| `--wandb_name` | W\u0026B run name | basename of `output_dir` |\n| `--wandb_id` | W\u0026B run ID (resume) | — |\n\n\u003c/details\u003e\n\n## 🌐 Community Projects\n\nWe thank the community for building extensions and deployment guides for Pixal3D!\n\n- [Pixal3D-ComfyUI](https://github.com/Saganaki22/Pixal3D-ComfyUI) — ComfyUI integration with deployment guides for Windows, WSL, and more.\n\n## 🤗 Acknowledgements\n\nThis project is heavily built upon [Trellis.2](https://github.com/microsoft/TRELLIS.2) and [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We sincerely thank the authors for their outstanding work on scalable 3D generation, which serves as the foundation of our codebase and model architecture.\n\nWe also thank the following repos for their great contributions:\n\n- [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2)\n- [Trellis](https://github.com/microsoft/TRELLIS)\n- [Trellis.2](https://github.com/microsoft/TRELLIS.2)\n\n## 📄 Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@article{li2026pixal3d,\n    title={Pixal3D: Pixel-Aligned 3D Generation from Images},\n    author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},\n    journal={arXiv preprint arXiv:2605.10922},\n    year={2026}\n}\n```\n\n## 📜 License\n\nThis project is released under the [MIT License](LICENSE). The third-party components included in this project remain licensed under their respective original terms; see [NOTICE](NOTICE) for the full list of dependencies and their licenses.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencentARC%2FPixal3D","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTencentARC%2FPixal3D","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencentARC%2FPixal3D/lists"}