{"id":13668630,"url":"https://github.com/scenediffuser/Scene-Diffuser","last_synced_at":"2025-04-27T01:31:37.637Z","repository":{"id":91930956,"uuid":"596406333","full_name":"scenediffuser/Scene-Diffuser","owner":"scenediffuser","description":"Official implementation of CVPR23 paper \"Diffusion-based Generation, Optimization, and Planning in 3D Scenes\"","archived":false,"fork":false,"pushed_at":"2023-06-27T05:26:51.000Z","size":1857,"stargazers_count":351,"open_issues_count":0,"forks_count":24,"subscribers_count":13,"default_branch":"main","last_synced_at":"2024-11-11T05:38:10.525Z","etag":null,"topics":["3d-scene-understanding","diffusion","generative-model"],"latest_commit_sha":null,"homepage":"https://scenediffuser.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scenediffuser.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-02-02T05:27:24.000Z","updated_at":"2024-11-10T06:47:08.000Z","dependencies_parsed_at":"2024-01-14T16:14:30.344Z","dependency_job_id":"af6182c7-6ad5-423b-8a67-bdccd08deeab","html_url":"https://github.com/scenediffuser/Scene-Diffuser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scenediffuser%2FScene-Diffuser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scenediffuser%2FScene-Diffuser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scenediffuser%2FScene-Diffuser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/scenediffuser%2FScene-Diffuser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scenediffuser","download_url":"https://codeload.github.com/scenediffuser/Scene-Diffuser/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251076992,"owners_count":21532606,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-scene-understanding","diffusion","generative-model"],"created_at":"2024-08-02T08:00:44.243Z","updated_at":"2025-04-27T01:31:32.615Z","avatar_url":"https://github.com/scenediffuser.png","language":"Python","funding_links":[],"categories":["Papers"],"sub_categories":["Arxiv"],"readme":"# Diffusion-based Generation, Optimization, and Planning in 3D Scenes\n\n\u003cp align=\"left\"\u003e\n    \u003ca href='https://scenediffuser.github.io/paper.pdf'\u003e\n      \u003cimg src='https://img.shields.io/badge/Paper-PDF-red?style=plastic\u0026logo=adobeacrobatreader\u0026logoColor=red' alt='Paper PDF'\u003e\n    \u003c/a\u003e\n    \u003ca href='https://arxiv.org/abs/2301.06015'\u003e\n      \u003cimg src='https://img.shields.io/badge/Paper-arXiv-green?style=plastic\u0026logo=arXiv\u0026logoColor=green' alt='Paper arXiv'\u003e\n    \u003c/a\u003e\n    \u003ca href='https://scenediffuser.github.io/'\u003e\n      \u003cimg src='https://img.shields.io/badge/Project-Page-blue?style=plastic\u0026logo=Google%20chrome\u0026logoColor=blue' alt='Project Page'\u003e\n    \u003c/a\u003e\n    \u003ca href='https://huggingface.co/spaces/SceneDiffuser/SceneDiffuserDemo'\u003e\n      \u003cimg 
src='https://img.shields.io/badge/Demo-HuggingFace-yellow?style=plastic\u0026logo=AirPlay%20Video\u0026logoColor=yellow' alt='HuggingFace'\u003e\n    \u003c/a\u003e\n    \u003ca href='https://drive.google.com/drive/folders/1CKJER3CnVh0o8cwlN8a2c0kQ6HTEqvqj?usp=sharing'\u003e\n      \u003cimg src='https://img.shields.io/badge/Model-Checkpoints-orange?style=plastic\u0026logo=Google%20Drive\u0026logoColor=orange' alt='Checkpoints'\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n[Siyuan Huang*](https://siyuanhuang.com/),\n[Zan Wang*](https://silvester.wang),\n[Puhao Li](https://xiaoyao-li.github.io/),\n[Baoxiong Jia](https://buzz-beater.github.io/),\n[Tengyu Liu](http://tengyu.ai/),\n[Yixin Zhu](https://yzhu.io/),\n[Wei Liang](https://liangwei-bit.github.io/web/),\n[Song-Chun Zhu](http://www.stat.ucla.edu/~sczhu/)\n\nThis repository is the official implementation of paper \"Diffusion-based Generation, Optimization, and Planning in 3D Scenes\".\n\nWe introduce SceneDiffuser, a conditional generative model for 3D scene understanding. SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning. In contrast to prior work, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.\n\n[Paper](https://scenediffuser.github.io/paper.pdf) |\n[arXiv](https://arxiv.org/abs/2301.06015) |\n[Project](https://scenediffuser.github.io/) |\n[HuggingFace Demo](https://huggingface.co/spaces/SceneDiffuser/SceneDiffuserDemo) |\n[Checkpoints](https://drive.google.com/drive/folders/1CKJER3CnVh0o8cwlN8a2c0kQ6HTEqvqj?usp=sharing)\n\n\u003cdiv align=center\u003e\n\u003cimg src='./figures/teaser.png' width=60%\u003e\n\u003c/div\u003e\n\n## Abstract\n\nWe introduce SceneDiffuser, a conditional generative model for 3D scene understanding. SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning. 
In contrast to prior works, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented. With an iterative sampling strategy, SceneDiffuser jointly formulates the scene-aware generation, physics-based optimization, and goal-oriented planning via a diffusion-based denoising process in a fully differentiable fashion. Such a design alleviates the discrepancies among different modules and the posterior collapse of previous scene-conditioned generative models. We evaluate SceneDiffuser with various 3D scene understanding tasks, including human pose and motion generation, dexterous grasp generation, path planning for 3D navigation, and motion planning for robot arms. The results show significant improvements compared with previous models, demonstrating the tremendous potential of SceneDiffuser for the broad community of 3D scene understanding.\n\n## News\n\n- [ 2023.04 ] We release the code for grasp generation and arm motion planning!\n\n## Setup\n\n1. Create a new `conda` environment and activate it.\n\n    ```bash\n    conda create -n 3d python=3.8\n    conda activate 3d\n    ```\n\n2. Install the dependencies with `pip`.\n\n    ```bash\n    pip install -r pre-requirements.txt\n    pip install -r requirements.txt\n    ```\n\n    - We use `pytorch1.11` and `cuda11.3`; modify `pre-requirements.txt` to install [other versions](https://pytorch.org/get-started/previous-versions/) of `pytorch`.\n\n3. Install [Isaac Gym](https://developer.nvidia.com/isaac-gym), then install [pointnet2](https://github.com/daveredrum/Pointnet2.ScanNet) by executing the following command (needed only for grasp generation and arm motion planning).\n\n    ```bash\n    pip install git+https://github.com/daveredrum/Pointnet2.ScanNet.git#subdirectory=pointnet2\n    ```\n\n## Data \u0026 Checkpoints\n\n### 1. 
Data\n\nYou can use our [pre-processed data](https://drive.google.com/drive/folders/1CKJER3CnVh0o8cwlN8a2c0kQ6HTEqvqj?usp=sharing) or process the data yourself by following the [instructions](./preprocessing/README.md).\n\nYou also need to download some officially released data assets that are not pre-processed; see the [instructions](./preprocessing/README.md). Remember to use your own data paths by modifying the path configuration in:\n\n- `scene_model.pretrained_weights` in `model/*.yaml` for the path of pre-trained scene encoder (if you use a pre-trained scene encoder)\n\n- `dataset.*_dir`/`dataset.*_path` configurations in `task/*.yaml` for the path of data assets\n\n### 2. Checkpoints\n\nDownload our [pre-trained models](https://drive.google.com/drive/folders/1CKJER3CnVh0o8cwlN8a2c0kQ6HTEqvqj?usp=sharing) and unzip them into a folder, e.g., `./outputs/`.\n\ntask|checkpoints|desc\n-|-|-\nPretrained Point Transformer|2022-04-13_18-29-56_POINTTRANS_C_32768|\nPose Generation|2022-11-09_11-22-52_PoseGen_ddm4_lr1e-4_ep100|\nMotion Generation|2022-11-09_12-54-50_MotionGen_ddm_T200_lr1e-4_ep300|w/o start position\nMotion Generation|2022-11-09_14-28-12_MotionGen_ddm_T200_lr1e-4_ep300_obser|w/ start position\nPath Planning|2022-11-25_20-57-28_Path_ddm4_LR1e-4_E100_REL|\nGrasp Generation|2022-11-15_18-07-50_GPUR_l1_pn2_T100|\nArm Motion Planning|2022-11-11_14-28-30_FK2Plan_ptr_T30_4|denoising step is 30\n\n\n## Task-1: Human Pose Generation in 3D Scenes\n\n### Train\n\n- Train with single gpu\n\n    ```bash\n    bash scripts/pose_gen/train.sh ${EXP_NAME}\n    ```\n\n- Train with 4 GPUs (modify `scripts/pose_gen/train_ddm.sh` to specify the visible GPUs)\n\n    ```bash\n    bash scripts/pose_gen/train_ddm.sh ${EXP_NAME}\n    ```\n\n### Test (Quantitative Evaluation)\n\n```bash\nbash scripts/pose_gen/test.sh ${CKPT} [OPT]\n# e.g., bash scripts/pose_gen/test.sh ./outputs/2022-11-09_11-22-52_PoseGen_ddm4_lr1e-4_ep100/ OPT\n```\n\n- `[OPT]` is optional for 
optimization-guided sampling.\n\n### Sample (Qualitative Visualization)\n\n```bash\nbash scripts/pose_gen/sample.sh ${CKPT} [OPT]\n# e.g., bash scripts/pose_gen/sample.sh ./outputs/2022-11-09_11-22-52_PoseGen_ddm4_lr1e-4_ep100/ OPT\n```\n\n- `[OPT]` is optional for optimization-guided sampling.\n\n## Task-2: Human Motion Generation in 3D Scenes\n\n**The default configuration is motion generation without observation. If you want to explore the setting of motion generation with start observation, please change the `task.has_observation` to `true` in all the scripts in folder `./scripts/motion_gen/`.**\n\n### Train\n\n- Train with single gpu\n\n    ```bash\n    bash scripts/motion_gen/train.sh ${EXP_NAME}\n    ```\n\n- Train with 4 GPUs (modify `scripts/motion_gen/train_ddm.sh` to specify the visible GPUs)\n\n    ```bash\n    bash scripts/motion_gen/train_ddm.sh ${EXP_NAME}\n    ```\n\n### Test (Quantitative Evaluation)\n\n```bash\nbash scripts/motion_gen/test.sh ${CKPT} [OPT]\n# e.g., bash scripts/motion_gen/test.sh ./outputs/2022-11-09_12-54-50_MotionGen_ddm_T200_lr1e-4_ep300/ OPT\n```\n\n- `[OPT]` is optional for optimization-guided sampling.\n\n### Sample (Qualitative Visualization)\n\n```bash\nbash scripts/motion_gen/sample.sh ${CKPT} [OPT]\n# e.g., bash scripts/motion_gen/sample.sh ./outputs/2022-11-09_12-54-50_MotionGen_ddm_T200_lr1e-4_ep300/ OPT\n```\n\n- `[OPT]` is optional for optimization-guided sampling.\n\n## Task-3: Dexterous Grasp Generation for 3D Objects\n\nTo run this code, you first need to change the git branch to `obj` by executing\n\n```bash\ngit checkout obj\n```\n\nMake sure you have installed [Isaac Gym](https://developer.nvidia.com/isaac-gym) and [pointnet2](https://github.com/daveredrum/Pointnet2.ScanNet). 
See [Setup](#setup) section.\n\n### Train\n\n- Train with single gpu (one gpu is enough)\n\n    ```bash\n    bash scripts/grasp_gen_ur/train.sh ${EXP_NAME}\n    ```\n\n### Sample (Qualitative Visualization)\n\n```bash\nbash scripts/grasp_gen_ur/sample.sh ${CKPT} [OPT]\n# e.g., bash scripts/grasp_gen_ur/sample.sh ./outputs/2022-11-15_18-07-50_GPUR_l1_pn2_T100/ OPT\n```\n\n- `[OPT]` is optional for optimization-guided sampling.\n\n### Test (Quantitative Evaluation)\n\nYou first need to run `scripts/grasp_gen_ur/sample.sh` to sample some results. Then we will compute quantitative metrics with these sampled results.\n\n```bash\nbash scripts/grasp_gen_ur/test.sh ${EVAL_DIR} ${DATASET_DIR}\n# e.g., bash scripts/grasp_gen_ur/test.sh outputs/2022-11-15_18-07-50_GPUR_l1_pn2_T100/eval/final/2023-04-20_13-06-44 YOUR_PATH/data/MultiDex_UR\n```\n\n## Task-4: Path Planning in 3D Scenes\n\n### Train\n\n- Train with single gpu\n\n    ```bash\n    bash scripts/path_planning/train.sh ${EXP_NAME}\n    ```\n\n- Train with 4 GPUs (modify `scripts/path_planning/train_ddm.sh` to specify the visible GPUs)\n\n    ```bash\n    bash scripts/path_planning/train_ddm.sh ${EXP_NAME}\n    ```\n\n### Test (Quantitative Evaluation)\n\n```bash\nbash scripts/path_planning/plan.sh ${CKPT}\n# e.g., bash scripts/path_planning/plan.sh ./outputs/2022-11-25_20-57-28_Path_ddm4_LR1e-4_E100_REL/\n```\n\n### Sample (Qualitative Visualization)\n\n```bash\nbash scripts/path_planning/sample.sh ${CKPT} [OPT] [PLA]\n# e.g., bash scripts/path_planning/sample.sh ./outputs/2022-11-25_20-57-28_Path_ddm4_LR1e-4_E100_REL/ OPT PLA\n```\n\n- The program will generate trajectories with given start position and scene; rendering the results into images. 
(These results are not planning results; the diffuser is only used to generate diverse trajectories.)\n- `[OPT]` is optional for optimization-guided sampling.\n- `[PLA]` is optional for planner-guided sampling.\n\n## Task-5: Motion Planning for Robot Arms\n\nTo run this code, you first need to change the git branch to `obj` by executing\n\n```bash\ngit checkout obj\n```\n\nMake sure you have installed [Isaac Gym](https://developer.nvidia.com/isaac-gym) and [pointnet2](https://github.com/daveredrum/Pointnet2.ScanNet). See [Setup](#setup) section.\n\n### Train\n\n- Train with single gpu\n\n    ```bash\n    bash scripts/franka_planning/train.sh ${EXP_NAME}\n    ```\n\n- Train with 4 GPUs (modify `scripts/franka_planning/train_ddm.sh` to specify the visible GPUs)\n\n    ```bash\n    bash scripts/franka_planning/train_ddm.sh ${EXP_NAME}\n    ```\n\n### Test (Quantitative Evaluation)\n\n```bash\nbash scripts/franka_planning/plan.sh ${CKPT}\n# e.g., bash scripts/franka_planning/plan.sh outputs/2022-11-11_14-28-30_FK2Plan_ptr_T30_4/\n```\n\n## Citation\n\nIf you find our project useful, please consider citing us:\n\n```tex\n@article{huang2023diffusion,\n  title={Diffusion-based Generation, Optimization, and Planning in 3D Scenes},\n  author={Huang, Siyuan and Wang, Zan and Li, Puhao and Jia, Baoxiong and Liu, Tengyu and Zhu, Yixin and Liang, Wei and Zhu, Song-Chun},\n  journal={arXiv preprint arXiv:2301.06015},\n  year={2023}\n}\n```\n\n## Acknowledgments\n\nSome code is borrowed from [latent-diffusion](https://github.com/CompVis/latent-diffusion), [PSI-release](https://github.com/yz-cnsdqz/PSI-release), [Pointnet2.ScanNet](https://github.com/daveredrum/Pointnet2.ScanNet), [point-transformer](https://github.com/POSTECH-CVLab/point-transformer), and [diffuser](https://github.com/jannerm/diffuser).\n\n### License\n\nThis project is licensed under the MIT License. See [LICENSE](LICENSE) for more details. 
The following datasets are used in this project and are subject to their respective licenses:\n\n- PROX is under the [Software Copyright License for non-commercial scientific research purposes](https://prox.is.tue.mpg.de/license.html).\n- LEMO is under the [MIT License](https://github.com/sanweiliti/LEMO/blob/main/LICENSE).\n- ScanNet V2 is under the [ScanNet Terms of Use](http://kaldir.vc.in.tum.de/scannet/ScanNet_TOS.pdf).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscenediffuser%2FScene-Diffuser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscenediffuser%2FScene-Diffuser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscenediffuser%2FScene-Diffuser/lists"}