{"id":19551878,"url":"https://github.com/ysymyth/3d-sdn","last_synced_at":"2025-04-13T04:09:04.481Z","repository":{"id":72158722,"uuid":"154797992","full_name":"ysymyth/3D-SDN","owner":"ysymyth","description":"[NeurIPS 2018] 3D-Aware Scene Manipulation via Inverse Graphics","archived":false,"fork":false,"pushed_at":"2018-12-20T13:02:02.000Z","size":14669,"stargazers_count":265,"open_issues_count":0,"forks_count":39,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-13T04:09:00.071Z","etag":null,"topics":["3d-sdn","3d-vision","deep-learning","disentangled-representations","gans","generative-adversarial-networks","pytorch"],"latest_commit_sha":null,"homepage":"http://3dsdn.csail.mit.edu/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ysymyth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-26T07:53:54.000Z","updated_at":"2024-11-03T03:55:34.000Z","dependencies_parsed_at":"2023-03-11T11:56:50.539Z","dependency_job_id":null,"html_url":"https://github.com/ysymyth/3D-SDN","commit_stats":{"total_commits":5,"total_committers":4,"mean_commits":1.25,"dds":0.6,"last_synced_commit":"d7a4519bfd57d4c5d99dbdb6a53a82ba5b66ec9e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysymyth%2F3D-SDN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysymyth%2F3D-SDN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysymyth%2F3D-SDN/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysymyth%2F3D-SDN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ysymyth","download_url":"https://codeload.github.com/ysymyth/3D-SDN/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248661704,"owners_count":21141450,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-sdn","3d-vision","deep-learning","disentangled-representations","gans","generative-adversarial-networks","pytorch"],"created_at":"2024-11-11T04:15:43.424Z","updated_at":"2025-04-13T04:09:04.459Z","avatar_url":"https://github.com/ysymyth.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 3D Scene De-rendering Networks (3D-SDN)\n\n[Project][proj] | [Paper][paper] | [Poster][poster]\n\nPyTorch implementation for 3D-aware scene de-rendering and editing. Our method integrates disentangled representations for semantics, geometry, and appearance into a deep generative model. 
The disentanglement of semantics, geometry, and appearance supports 3D-aware scene manipulation such as\n(a) translation, (b) rotation, (c) color and texture editing, and\n(d) object removal and occlusion recovery.\n\n\u003cimg src=\"assets/vkitti_edit.jpg\" width=\"800px\"/\u003e\n\n\n**3D-Aware Scene Manipulation via Inverse Graphics**  \n[Shunyu Yao](http://people.csail.mit.edu/yaos)\u0026ast;, [Tzu-Ming Harry Hsu](http://stmharry.github.io/)\u0026ast;, [Jun-Yan Zhu](http://people.csail.mit.edu/junyanz/), [Jiajun Wu](https://jiajunwu.com/), [Antonio Torralba](http://web.mit.edu/torralba/www/), [William T. Freeman](https://billf.mit.edu/), [Joshua B. Tenenbaum](https://web.mit.edu/cocosci/josh.html)    \nIn Neural Information Processing Systems (*NeurIPS*) 2018.  \nMIT CSAIL, Tsinghua University, and Google Research.\n\n## Framework\nOur de-renderer consists of a semantic-, a textural- and a geometric branch. The textural renderer and geometric renderer then learn to reconstruct the original image from the representations obtained by the de-renderer modules.\n\n\u003cimg src=\"assets/model.jpg\" width=\"800px\"/\u003e\n\n## Example Results on Cityscapes\nExample user editing results on Cityscapes.\n(a) We move two cars closer to the camera.\u003cbr\u003e\n(b) We rotate the car with different angles.\u003cbr\u003e\n(c) We recover a tiny and occluded car and move it closer. Our model can synthesize the occluded region. \u003cbr\u003e\n(d) We move a small car closer and then change its locations.\n\n\u003cimg src=\"assets/cityscapes_edit.jpg\" width=\"800px\"/\u003e\n\n## Prerequisites\n\n- Linux\n- Python 3.6+\n- PyTorch 0.4\n- NVIDIA GPU (GPU memory \u003e **8GB**) + CUDA 9.0\n\n\n## Getting Started\n\n### Installation\n\n1. Clone this repository\n    ```bash\n    git clone https://github.com/ysymyth/3D-SDN.git \u0026\u0026 cd 3D-SDN\n    ```\n\n2. Download the pre-trained weights\n    ```bash\n    ./models/download_models.sh\n    ```\n\n3. 
Set up the conda environment\n    ```bash\n    conda env create -f environment.yml \u0026\u0026 conda activate 3dsdn\n    ```\n\n4. Compile dependencies in `geometric/maskrcnn`\n    ```bash\n    ./scripts/build.sh\n    ```\n\n5. Set up environment variables\n    ```bash\n    source ./scripts/env.sh\n    ```\n\n### Image Editing\n\n\u003cimg src=\"assets/0006_30-deg-right_00043.png\" width=\"800px\"/\u003e\n\n\nWe are using `./assets/0006_30-deg-right_00043.png` as the example image for editing.\n\n#### Semantic Branch\n```bash\npython semantic/vkitti_test.py \\\n    --ckpt ./models \\\n    --id vkitti-semantic \\\n    --root_dataset ./assets \\\n    --test_img 0006_30-deg-right_00043.png \\\n    --result ./assets/example/semantic\n```\n\n#### Geometric Branch\n```bash\npython geometric/scripts/main.py \\\n    --do test \\\n    --dataset vkitti \\\n    --mode extend \\\n    --source maskrcnn \\\n    --ckpt_dir ./models/vkitti-geometric-derender3d \\\n    --maskrcnn_path ./models/vkitti-geometric-maskrcnn/mask_rcnn_vkitti_0100.pth \\\n    --edit_json ./assets/vkitti_edit_example.json \\\n    --input_file ./assets/0006_30-deg-right_00043.png \\\n    --output_dir ./assets/example/geometric\n```\n\n\n#### Textural Branch\n```bash\npython textural/edit_vkitti.py \\\n    --name vkitti-textural \\\n    --checkpoints_dir ./models \\\n    --edit_dir ./assets/example/geometric/vkitti/maskrcnn/0006/30-deg-right \\\n    --edit_source ./assets/0006_30-deg-right_00043.png \\\n    --edit_num 5 \\\n    --segm_precomputed_path ./assets/example/semantic/0006_30-deg-right_00043.png \\\n    --results_dir ./assets/example \\\n    --feat_pose True \\\n    --feat_normal True\n```\n\nThen the edit results can be viewed at `./assets/example/vkitti-textural_edit_edit_60/index.html`.\n\nSimply do `cd ./assets/example/vkitti-textural_edit_edit_60 \u0026\u0026 python -m http.server 1234` and use your browser to connect to the server. 
You should see the results with intermediate 2.5D representations rendered as follows.\n\n\u003cimg src=\"assets/results.jpg\" width=\"800px\"/\u003e\n\n\n## Training/Testing\nPlease set up the datasets first and refer to `semantic/README.md`, `geometric/README.md`, and `textural/README.md` for training and testing details.\n\n- Download the [Virtual KITTI dataset](http://www.europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds).\n```bash\n./datasets/download_vkitti.sh\n```\nPlease cite their paper if you use their data.\n\n## Experiments\n\n### Virtual KITTI Benchmark\n\nHere is a fragment of our Virtual KITTI benchmark edit specification, in the form of a `json` file. For each edit pair, a source image would be `world/topic/source.png` and a target image would be `world/topic/target.png`. A list of `operations` is specified to transform the source image to the target image. Aligned with human cognition, each operation is either moving (`modify`) an object `from` a position `to` another, or `delete` it from our view. Additionally, we may enlarge (`zoom`) the object or rotate the object along the y-axis (`ry`). Note the y-axis points downwards, consistent with the axis specification of the Virtual KITTI dataset. The `u`'s and `v`'s denote the objects' 3D center projected onto the image plane. We indicate a target region of interest `roi` on top of the target `(u, v)` position. 
There are 92 such pairs in the benchmark.\n```json\n{\n    \"world\": \"0006\",\n    \"topic\": \"fog\",\n    \"source\": \"00055\",\n    \"target\": \"00050\",\n    \"operations\": [\n        {\n            \"type\": \"modify\",\n            \"from\": {\"u\": \"750.9\", \"v\": \"213.9\"},\n            \"to\": {\"u\": \"804.4\", \"v\": \"227.1\", \"roi\": [194, 756, 269, 865]},\n            \"zoom\": \"1.338\",\n            \"ry\": \"0.007\"\n        }\n    ]\n}\n```\n\n#### Semantic Branch\n```bash\npython semantic/vkitti_test.py \\\n    --ckpt ./models \\\n    --id vkitti-semantic \\\n    --root_dataset ./datasets/vkitti \\\n    --test_img benchmark \\\n    --benchmark_json ./assets/vkitti_edit_benchmark.json \\\n    --result ./assets/vkitti-benchmark/semantic\n```\n#### Geometric Branch\n```bash\npython geometric/scripts/main.py \\\n    --do test \\\n    --dataset vkitti \\\n    --mode extend \\\n    --source maskrcnn \\\n    --ckpt_dir ./models/vkitti-geometric-derender3d \\\n    --maskrcnn_path ./models/vkitti-geometric-maskrcnn/mask_rcnn_vkitti_0100.pth \\\n    --output_dir ./assets/vkitti-benchmark/geometric \\\n    --edit_json ./assets/vkitti_edit_benchmark.json\n```\n#### Textural Branch\n```bash\npython textural/edit_benchmark.py \\\n    --name vkitti-textural \\\n    --checkpoints_dir ./models \\\n    --dataroot ./datasets/vkitti \\\n    --edit_dir ./assets/vkitti-benchmark/geometric/vkitti/maskrcnn \\\n    --edit_list ./assets/vkitti_edit_benchmark.json \\\n    --experiment_name benchmark_3D \\\n    --segm_precomputed_path ./assets/vkitti-benchmark/semantic \\\n    --results_dir ./assets/vkitti-benchmark/ \\\n    --feat_pose True \\\n    --feat_normal True\n```\nThen the benchmark edit results can be viewed at `./assets/vkitti-benchmark/vkitti-textural_benchmark_3D_edit_60/index.html`.\n\n\n## Reference\nIf you find this useful for your research, please cite the following paper.\n\n```\n@inproceedings{3dsdn2018,\n  title={3D-Aware Scene Manipulation via 
Inverse Graphics},\n  author={Yao, Shunyu and Hsu, Tzu Ming Harry and Zhu, Jun-Yan and Wu, Jiajun and Torralba, Antonio and Freeman, William T. and Tenenbaum, Joshua B.},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2018}\n}\n```\n\nFor any questions, please contact [Shunyu Yao](mailto:yao-sy15@mails.tsinghua.edu.cn) and [Tzu-Ming Harry Hsu](mailto:stmharry@mit.edu).\n\n## Acknowledgements\nThis work is supported by NSF #1231216, NSF #1524817, ONR MURI N00014-16-1-2007, Toyota Research Institute, and Facebook.\n\nThe semantic branch borrows from [Semantic Segmentation on MIT ADE20K dataset in PyTorch](https://github.com/CSAILVision/semantic-segmentation-pytorch), the geometric branch borrows from [pytorch-mask-rcnn](https://github.com/multimodallearning/pytorch-mask-rcnn) and [neural_renderer](https://github.com/hiroharu-kato/neural_renderer), and the textural branch borrows from [pix2pixHD](https://github.com/NVIDIA/pix2pixHD).\n\n[proj]: http://3dsdn.csail.mit.edu/\n[paper]: https://arxiv.org/pdf/1808.09351.pdf\n[poster]: http://3dsdn.csail.mit.edu/3dsdn-poster.pdf\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fysymyth%2F3d-sdn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fysymyth%2F3d-sdn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fysymyth%2F3d-sdn/lists"}