{"id":18842975,"url":"https://github.com/nv-tlabs/xcube","last_synced_at":"2025-05-16T01:06:23.475Z","repository":{"id":245008334,"uuid":"805900550","full_name":"nv-tlabs/XCube","owner":"nv-tlabs","description":"[CVPR 2024 Highlight] XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies","archived":false,"fork":false,"pushed_at":"2025-03-03T20:00:49.000Z","size":2367,"stargazers_count":448,"open_issues_count":15,"forks_count":29,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-08T12:07:51.427Z","etag":null,"topics":["3d-generation","computer-vision","generative-ai","graphics","voxel"],"latest_commit_sha":null,"homepage":"https://research.nvidia.com/labs/toronto-ai/xcube/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nv-tlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-25T19:40:43.000Z","updated_at":"2025-04-05T09:54:31.000Z","dependencies_parsed_at":"2024-12-08T11:06:08.648Z","dependency_job_id":"c9746e6a-dd4a-4fb0-95ab-bab92244b2a4","html_url":"https://github.com/nv-tlabs/XCube","commit_stats":{"total_commits":24,"total_committers":5,"mean_commits":4.8,"dds":"0.29166666666666663","last_synced_commit":"254d95b2bca540c20aced686027055a9bff392ed"},"previous_names":["nv-tlabs/xcube"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nv-tlabs%2FXCube","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nv-tlabs%2FXCube/tags","releases_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/nv-tlabs%2FXCube/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nv-tlabs%2FXCube/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nv-tlabs","download_url":"https://codeload.github.com/nv-tlabs/XCube/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254448579,"owners_count":22072764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-generation","computer-vision","generative-ai","graphics","voxel"],"created_at":"2024-11-08T02:56:13.540Z","updated_at":"2025-05-16T01:06:18.464Z","avatar_url":"https://github.com/nv-tlabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies\n![XCube](assets/teaser.png)\n\n**XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies**\u003cbr\u003e\n[Xuanchi Ren](https://xuanchiren.com/),\n[Jiahui Huang](https://huangjh-pub.github.io/),\n[Xiaohui Zeng](https://www.cs.utoronto.ca/~xiaohui/),\n[Ken Museth](https://ken.museth.org/Welcome.html),\n[Sanja Fidler](https://www.cs.toronto.edu/~fidler/),\n[Francis Williams](https://www.fwilliams.info/) \u003cbr\u003e\n**[Paper](https://arxiv.org/pdf/2312.03806), [Project Page](https://research.nvidia.com/labs/toronto-ai/xcube/)**\n\nAbstract: *We present XCube (abbreviated as \u003cspan\u003eX\u003csup\u003e3\u003c/sup\u003e\u003c/span\u003e), a novel generative model for high-resolution sparse 3D voxel grids 
with arbitrary attributes. Our model can generate millions of voxels with a finest effective resolution of up to \u003cspan\u003e1024\u003csup\u003e3\u003c/sup\u003e\u003c/span\u003e in a feed-forward fashion without time-consuming test-time optimization. To achieve this, we employ a hierarchical voxel latent diffusion model which generates progressively higher resolution grids in a coarse-to-fine manner using a custom framework built on the highly efficient VDB data structure. Apart from generating high-resolution objects, we demonstrate the effectiveness of XCube on large outdoor scenes at scales of 100m x 100m with a voxel size as small as 10cm. We observe clear qualitative and quantitative improvements over past approaches. In addition to unconditional generation, we show that our model can be used to solve a variety of tasks such as user-guided editing, scene completion from a single scan, and text-to-3D.*\n\nFor business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/).\nFor any other questions related to the model, please contact Xuanchi or Jiahui.\n\n## News\n\n- 2024-12-11: Also check out our latest research [InfiniCube](https://research.nvidia.com/labs/toronto-ai/infinicube/), which extends XCube to unbounded 3D generation!\n- 2024-10-27: Check out our NeurIPS 2024 work [SCube](https://research.nvidia.com/labs/toronto-ai/scube/) which extends XCube on large-scale scene reconstruction!\n- 2024-06-18: Code and model released!\n\n## Environment setup\nNote that we currently only support Linux. 
We welcome support for other platforms.\n\n**(Optional) Install libMamba for a huge quality of life improvement when using Conda**\n```\nconda update -n base conda\nconda install -n base conda-libmamba-solver\nconda config --set solver libmamba\n```\n\n### Conda Environment\n```\n# Clone the repository\ngit clone git@github.com:nv-tlabs/XCube.git\ncd XCube\n\n# Create conda environment\nconda env create -f environment.yml\nconda activate xcube\n\n# Install fVDB (3D learning framework; require GPU later than Ampere)\ngit clone https://github.com/AcademySoftwareFoundation/openvdb.git\ncd openvdb\ngit fetch origin pull/1808/head:feature/fvdb\ngit checkout feature/fvdb\nrm fvdb/setup.py \u0026\u0026 cp ../assets/setup.py fvdb/\ncd fvdb \u0026\u0026 pip install .\ncd ../..\n\n# Mesh extraction\ncd ext/nksr-cuda\npython setup.py develop\ncd ../..\n```\n### Docker Image\nFor docker users, we suggest using a base image from [here](https://github.com/fwilliams/openvdb/tree/fw/fvdb/fvdb#docker-image), and applying the above conda setup over it.\n\n## Quickstart\nDownload pretrained checkpoints from [Google Drive](https://drive.google.com/drive/folders/1PEh0ofpSFcgH56SZtu6iQPC8xAxzhmke?usp=drive_link) and put them under `checkpoints`.\nAlternatively, we provide a script that could automatically download everything for you (temporarily unavailable):\n```\npython inference/download_pretrain.py\n```\n\n**ShapeNet Inference:**\n```\n# Chair\npython inference/sample_shapenet.py none --category chair --total_len 20 --batch_len 4 --ema --use_ddim --ddim_step 100 --extract_mesh\n\n# Car\npython inference/sample_shapenet.py none --category car --total_len 20 --batch_len 4 --ema --use_ddim --ddim_step 100 --extract_mesh\n\n# Plane\npython inference/sample_shapenet.py none --category plane --total_len 20 --batch_len 4 --ema --use_ddim --ddim_step 100 --extract_mesh\n\n# Visualize\npython visualize_object.py -p results/{YOUR_PATH} -i {YOUR_ID}\n```\n\n**Waymo Inference:**\n```\n# 
Unconditional sampling\npython inference/sample_waymo.py none --total_len 20 --batch_len 4 --ema --use_ddim --ddim_step 100 --extract_mesh\n\n# Single-scan condition (coming soon)\n\n# Visualize\npython visualize_scene.py -p results/{YOUR_PATH} -i {YOUR_ID}\n```\n\n**Objaverse Inference:**\n```\n# Text to 3D\npython inference/sample_objaverse.py none --batch_len 4 --ema --use_ddim --ddim_step 100 --extract_mesh\n\n# Visualize\npython visualize_object.py -p results/{YOUR_PATH} -i {YOUR_ID}\n```\n\n\u003e The released code has some differences from the version described in the paper: \n\u003e 1) The refinement network is omitted for cleaner code, which may cause slight variations in the results, but these differences are not significant. \n\u003e 2) The mesh extraction process has been moved from the VAE to post-processing.\n\nWe have prepared detailed instructions about data preparation and useful tricks at [XCube MISC](MISC.md).\n\n## Training\n\nData download links:\n- ShapeNet: Data is available [here](https://drive.google.com/file/d/1PQmSomS1B7UR7wNuqp5RtgkdXo7stKzG/view?usp=sharing). Put the extracted folder as `../data/shapenet`. 
Or you could change `_shapenet_path` in the [config](configs/shapenet/data.yaml).\n- Waymo: Coming soon\n\n### (Coarse) Stage 1 \n**Training autoencoder models:**\n```\n# ShapeNet chair\npython train.py ./configs/shapenet/chair/train_vae_16x16x16_dense.yaml --wname 16x16x16-kld-0.03_dim-16 --max_epochs 100 --cut_ratio 16 --gpus 8 --batch_size 32\n\n# ShapeNet car\npython train.py ./configs/shapenet/car/train_vae_16x16x16_dense.yaml --wname 16x16x16-kld-0.03_dim-16 --max_epochs 100 --cut_ratio 16 --gpus 8 --batch_size 32\n\n# ShapeNet plane\npython train.py ./configs/shapenet/plane/train_vae_16x16x16_dense.yaml --wname 16x16x16-kld-0.03_dim-16 --max_epochs 100 --cut_ratio 16 --gpus 8 --batch_size 32\n\n# Waymo uncond\npython train.py ./configs/waymo/train_vae_32x32x32_dense.yaml --wname 32x32x32-kld-0.03_dim-8 --max_epochs 50 --gpus 8 --batch_size 32 --eval_interval 1\n```\n**Training latent diffusion models:**\n```\n# ShapeNet chair\npython train.py ./configs/shapenet/chair/train_diffusion_16x16x16_dense.yaml --wname 16x16x16_kld-0.03 --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 4\n\n# ShapeNet car\npython train.py ./configs/shapenet/car/train_diffusion_16x16x16_dense.yaml --wname 16x16x16_kld-0.03 --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 4\n\n# ShapeNet plane\npython train.py ./configs/shapenet/plane/train_diffusion_16x16x16_dense.yaml --wname 16x16x16_kld-0.03 --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 4\n\n# Waymo uncond\npython train_auto.py ./configs/waymo/train_diffusion_32x32x32_dense.yaml --wname 32x32x32_kld-0.03 --eval_interval 1 --gpus 8 --batch_size 16 --accumulate_grad_batches 4 --save_topk 2\n```\n\n### (Fine) Stage 2 \n**Training autoencoder models:**\n```\n# ShapeNet chair\npython train.py ./configs/shapenet/chair/train_vae_128x128x128_sparse.yaml --wname 512_to_128-kld-1.0 --max_epochs 100 --gpus 8 --batch_size 8 --accumulate_grad_batches 2\n\n# ShapeNet car\npython 
train.py ./configs/shapenet/car/train_vae_128x128x128_sparse.yaml --wname 512_to_128-kld-1.0 --max_epochs 100 --gpus 8 --batch_size 8 --accumulate_grad_batches 2\n\n# ShapeNet plane\npython train.py ./configs/shapenet/plane/train_vae_128x128x128_sparse.yaml --wname 512_to_128-kld-1.0 --max_epochs 100 --gpus 8 --batch_size 8 --accumulate_grad_batches 2\n\n# Waymo uncond\npython train.py ./configs/waymo/train_vae_256x256x256_sparse.yaml --wname 1024_to_256-kld-0.3 --max_epochs 50 --gpus 8 --batch_size 8 --accumulate_grad_batches 2\n```\n\n**Training latent diffusion models:**\n```\n# ShapeNet chair\npython train.py ./configs/shapenet/chair/train_diffusion_128x128x128_sparse.yaml --wname 128x128x128_kld-1.0_normal_cond --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 8 --save_topk 2 --save_every 30\n\n# ShapeNet car\npython train.py ./configs/shapenet/car/train_diffusion_128x128x128_sparse.yaml --wname 128x128x128_kld-1.0_normal_cond --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 8 --save_topk 2 --save_every 30\n\n# ShapeNet plane\npython train.py ./configs/shapenet/plane/train_diffusion_128x128x128_sparse.yaml --wname 128x128x128_kld-1.0_normal_cond --eval_interval 5 --gpus 8 --batch_size 8 --accumulate_grad_batches 8 --save_topk 2 --save_every 30\n\n# Waymo uncond\npython train.py ./configs/waymo/train_diffusion_256x256x256_sparse.yaml --wname 256x256x64_kld-0.3_semantic_cond --eval_interval 1 --gpus 8 --batch_size 8 --accumulate_grad_batches 4 --save_topk 1\n```\n\nIn addition, you can manually specify different training settings to obtain models that suit your needs. Common flags include:\n- `--wname`: additional experiment name for the wandb logger.\n- `--batch_size`: the **total** batch size for `autoencoder` training and the batch size **per GPU** for `diffusion` training.\n- `--logger_type`: we use `wandb` by default; `none` is also supported.\n\n## License\n\nCopyright \u0026copy; 2024, NVIDIA Corporation \u0026 affiliates. 
All rights reserved.\nThis work is made available under the [Nvidia Source Code License](LICENSE.txt).\n\n## Related Works\n\n- Ren et al. 2024. [SCube: Instant Large-Scale Scene Reconstruction using VoxSplats](https://research.nvidia.com/labs/toronto-ai/scube/).\n- Huang et al. 2023. [Neural Kernel Surface Reconstruction](https://research.nvidia.com/labs/toronto-ai/NKSR).\n- Williams et al. 2024. [𝑓VDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence](https://arxiv.org/abs/2407.01781).\n\n## Citation\n\n```bibtex\n@inproceedings{ren2024xcube,\n    title={XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies}, \n    author={Ren, Xuanchi and Huang, Jiahui and Zeng, Xiaohui and Museth, Ken and Fidler, Sanja and Williams, Francis},\n    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n    year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnv-tlabs%2Fxcube","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnv-tlabs%2Fxcube","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnv-tlabs%2Fxcube/lists"}