{"id":29219678,"url":"https://github.com/dvlab-research/dsgn","last_synced_at":"2025-07-03T02:06:37.825Z","repository":{"id":53549209,"uuid":"229276558","full_name":"dvlab-research/DSGN","owner":"dvlab-research","description":"DSGN: Deep Stereo Geometry Network for 3D Object Detection (CVPR 2020)","archived":false,"fork":false,"pushed_at":"2020-08-15T11:30:00.000Z","size":3967,"stargazers_count":328,"open_issues_count":0,"forks_count":50,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-03-20T20:45:40.199Z","etag":null,"topics":["3d-detection","cvpr2020","depth-estimation","stereo-vision"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dvlab-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-20T14:09:23.000Z","updated_at":"2025-01-20T08:38:39.000Z","dependencies_parsed_at":"2022-09-11T10:12:02.624Z","dependency_job_id":null,"html_url":"https://github.com/dvlab-research/DSGN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dvlab-research/DSGN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FDSGN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FDSGN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FDSGN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FDSGN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dvlab-research","download_url":"https://codeload.github.com/dvlab-research/DSGN/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvlab-research%2FDSGN/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263245318,"owners_count":23436514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-detection","cvpr2020","depth-estimation","stereo-vision"],"created_at":"2025-07-03T02:06:36.464Z","updated_at":"2025-07-03T02:06:37.803Z","avatar_url":"https://github.com/dvlab-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DSGN\n## Deep Stereo Geometry Network for 3D Object Detection (CVPR 2020)\n\nThis is the official implementation of DSGN (CVPR 2020), a strong 3D object detector that jointly **estimates scene depth** and **detects 3D objects** in the 3D world, using only a stereo image pair as input.\n\n\u003cdiv align=\"center\"\u003e\n \u003cimg src=\"doc/sample_result.png\"\u003e\n\u003c/div\u003e\n\n**DSGN: Deep Stereo Geometry Network for 3D Object Detection**\u003cbr/\u003e\n[Yilun Chen](http://yilunchen.com/about/), [Shu Liu](http://shuliu.me/), [Xiaoyong Shen](http://xiaoyongshen.me/), [Jiaya Jia](http://jiaya.me/). 
\u003cbr/\u003e\n[[Paper]](https://arxiv.org/abs/2001.03398)\u0026nbsp;  [[Video]](https://www.youtube.com/watch?v=u6mQW89wBbo)\u0026nbsp; \n\nMost state-of-the-art 3D object detectors rely heavily on LiDAR sensors, and a large performance gap remains between image-based and LiDAR-based methods, caused by inappropriate representations for prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes 3D geometric structure in a regular 3D space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates depth and detects 3D objects in an end-to-end manner. Our approach outperforms previous stereo-based 3D detectors (by about 10 AP) and even achieves performance comparable to some LiDAR-based methods on the KITTI 3D object detection leaderboard.\n\n### Overall Pipeline\n\n\u003cdiv align=\"center\"\u003e\n \u003cimg src=\"doc/pipeline.png\"\u003e\n \u003cp\u003e\u003cfont size=\"2\"\u003eDSGN consists of four components: (a) A 2D image feature extractor that captures both pixel-level and high-level features. (b) Construction of the plane-sweep volume and the 3D geometric volume. (c) Depth estimation on the plane-sweep volume. 
(d) 3D object detection on the 3D geometric volume.\u003c/font\u003e\u003c/p\u003e\n\u003c/div\u003e\n\n### Reported Results on the KITTI Leaderboard\n\n\u003cp align=\"center\"\u003e \u003cimg src=\"./doc/result.jpg\" width=\"80%\"\u003e\u003c/p\u003e\n\n### Requirements\nAll the code has been tested in the following environment:\n* Ubuntu 16.04\n* Python 3.7\n* PyTorch 1.1.0, 1.2.0 or 1.3.0\n* Torchvision 0.2.2 or 0.4.1\n\nThe models reported in the paper were trained on 4 *NVIDIA Tesla V100* (32 GB) GPUs with a batch size of 4. Training requires close to 29 GB of GPU memory, while testing fits on a standard *NVIDIA TITAN* (12 GB) GPU. Each full image pair is fed into the network and used to construct the 3D volume (for reference, PSMNet is trained on 512x256 input patches), so please check that your GPU has enough memory. \n\n### Installation\n\n(1) Clone this repository.\n```\ngit clone https://github.com/chenyilun95/DSGN.git \u0026\u0026 cd DSGN\n```\n\n(2) Set up the Python environment.\n```\nconda create -n dsgn python=3.7\nconda activate dsgn\npip install -r requirements.txt --user\n\n# conda deactivate\n```\n\n(3) Compile the rotated IoU library. \n```\ncd dsgn/utils/rotate_iou \u0026\u0026 bash compile.sh \u0026\u0026 cd ../../../\n```\n\n(4) Compile and install the DSGN library.\n```\n# The following installs the library in development mode (via symbolic links),\n# so you can modify the source files without re-building the package.\npython3 setup.py build develop --user\n```\n\n### Data Preparation\n\n(1) Download the KITTI dataset and create the model folders. The KITTI dataset is available [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). 
Download the KITTI [point clouds](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_velodyne.zip), [left images](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip), [right images](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_3.zip), [calibration matrices](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_calib.zip) and [object labels](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip). \n```\nln -s /path/to/KITTI_DATA_PATH ./data/kitti/\nln -s /path/to/OUTPUT_PATH ./outputs/\n```\n\n(2) Generate depth maps from the ground-truth LiDAR point clouds and save them in ./data/kitti/training/depth/.\n```\npython3 preprocessing/generate_disp.py --data_path ./data/kitti/training/ --split_file ./data/kitti/trainval.txt \npython3 preprocessing/generate_disp.py --data_path ./data/kitti/training/ --split_file ./data/kitti/trainval.txt --right_calib\n```\n\n(3) Pre-compute the bbox targets on the pre-defined grid and save them in ./outputs/temp/.\n```\npython3 tools/generate_targets.py --cfg CONFIG_PATH\n```\n\nAfter training, the directory structure will look like this:\n```\n.                                           
(root directory)\n|-- dsgn                                    (dsgn library file)\n|-- configs                                 (model configurations folder)\n|-- ...\n|-- data\n|   |-- kitti                               (dataset directory)\n|       |-- train.txt                       (KITTI train images list (3712 samples))\n|       |-- val.txt                         (KITTI val images list (3769 samples))\n|       |-- test.txt                        (KITTI test images list (7518 samples))\n|       |-- training\n|       |   |-- image_2\n|       |   |-- image_3\n|       |   |-- ...\n|       |-- testing\n|       |-- depth                           (generated depth maps)\n|-- outputs\n    |-- MODEL_DSGN_v1                       (Model config and snapshots should be saved in the same model folder)\n        |-- finetune_53.tar                 (saved model)\n        |-- save_config.py                  (saved model configuration file)\n        |-- save_config.py.tmp              (automatically generated copy of the previous configuration)\n        |-- training.log                    (full training log)\n        |-- result_kitti_finetune_53.txt    (KITTI evaluation results for the saved model)\n        |-- kitti_output                    (KITTI detection results folder)\n    |-- MODEL_DSGN_v2\n    |-- temp                                (temporary folder for saving the pre-computed bbox targets)\n        |-- ...                             
(pre-computed bbox targets for specific configurations)\n```\n\n### Multi-GPU Training\n\nThe training scripts support [multi-processing distributed training](https://github.com/pytorch/examples/tree/master/imagenet), which is much faster than the typical PyTorch DataParallel interface.\n```\npython3 tools/train_net.py --cfg ./configs/config_xxx.py --savemodel ./outputs/MODEL_NAME -btrain 4 -d 0-3 --multiprocessing-distributed\n```\nor\n```\nbash scripts/mptrain_xxx.sh\n```\nThe trained models, configuration and logs are saved in the model folder.\n\nTo load a pretrained model, run\n```\npython3 tools/train_net.py --cfg xxx/config.py --loadmodel ./outputs/MODEL_NAMEx --start_epoch xxx --savemodel ./outputs/MODEL_NAME -btrain 4 -d 0-3 --multiprocessing-distributed\n```\nTo resume training from a saved epoch, set `--cfg` and `--loadmodel` to the configuration and checkpoint of that model, and `--start_epoch` to the epoch to resume from.\n\nYou can also start a TensorBoard session with\n```\ntensorboard --logdir=./outputs/MODEL_NAME/tensorboard --port=6666\n```\nand monitor the training process by opening http://localhost:6666 in your browser.\n\n### Inference and Evaluation\n\nEvaluate a model with\n```\npython3 tools/test_net.py --loadmodel ./outputs/MODEL_NAME/finetune_xx.tar -btest 8 -d 0-3\n```\nThe KITTI detection results and evaluation results are saved in the model folder. 
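\nThe files in `kitti_output` can also be inspected programmatically. The sketch below is a minimal, hypothetical helper that assumes the results follow the standard KITTI object format: one text file per image and one detection per line, with the 15 label fields (type, truncation, occlusion, alpha, 2D box, 3D dimensions, 3D location, rotation_y) followed by a confidence score. `KittiDetection`, `parse_kitti_line` and `load_detections` are illustrative names, not part of the DSGN library.

```python
# Hypothetical helpers; assumes the standard KITTI object results format
# (16 whitespace-separated fields per line). Not verified against DSGN's writer.
from dataclasses import dataclass
from typing import List


@dataclass
class KittiDetection:
    cls: str                 # object type, e.g. 'Car', 'Pedestrian', 'Cyclist'
    truncated: float
    occluded: int
    alpha: float             # observation angle (radians)
    bbox: List[float]        # 2D box: left, top, right, bottom (pixels)
    dimensions: List[float]  # 3D size: height, width, length (meters)
    location: List[float]    # 3D location x, y, z in camera coordinates (meters)
    rotation_y: float        # yaw around the camera Y axis (radians)
    score: float             # detection confidence (results files only)


def parse_kitti_line(line: str) -> KittiDetection:
    # Split one results line into its 16 fields and convert the numeric ones.
    fields = line.split()
    if len(fields) != 16:
        raise ValueError(f'expected 16 fields, got {len(fields)}')
    v = [float(x) for x in fields[1:]]
    return KittiDetection(
        cls=fields[0], truncated=v[0], occluded=int(v[1]), alpha=v[2],
        bbox=v[3:7], dimensions=v[7:10], location=v[10:13],
        rotation_y=v[13], score=v[14],
    )


def load_detections(path: str, cls: str = 'Car', min_score: float = 0.0) -> List[KittiDetection]:
    # Keep only detections of one class whose score reaches the threshold.
    detections = []
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            det = parse_kitti_line(line)
            if det.cls == cls and det.score >= min_score:
                detections.append(det)
    return detections
```

For example, `load_detections(path, cls='Car', min_score=0.5)` returns the Car detections in one results file whose score is at least 0.5. Check the actual file layout under `kitti_output` before relying on this sketch.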
\n\n### Performance and Model Zoo\n\nWe provide several pretrained models for our experiments, which are evaluated on the KITTI val set.\n\n\u003ctable\u003e\n    \u003cthead\u003e\n        \u003ctr\u003e\n            \u003cth\u003eMethods\u003c/th\u003e\n            \u003cth\u003eEpochs\u003c/th\u003e\n            \u003c!-- \u003cth\u003eInference Time(s/im)\u003c/th\u003e --\u003e\n            \u003cth\u003eTrain Mem (GB/Img)\u003c/th\u003e\n            \u003cth\u003eTest Mem (GB/Img)\u003c/th\u003e\n            \u003cth\u003e3D AP\u003c/th\u003e\n            \u003cth\u003eBEV AP\u003c/th\u003e\n            \u003cth\u003e2D AP\u003c/th\u003e\n            \u003cth\u003eModels\u003c/th\u003e\n        \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eDSGN(Car)\u003c/td\u003e\n            \u003ctd\u003e53\u003c/td\u003e\n            \u003ctd\u003e~29\u003c/td\u003e\n            \u003ctd\u003e6.05\u003c/td\u003e\n            \u003ctd\u003e53.95\u003c/td\u003e\n            \u003ctd\u003e64.44\u003c/td\u003e\n            \u003ctd\u003e84.62\u003c/td\u003e\n            \u003ctd\u003e\u003ca href=\"https://drive.google.com/open?id=1pbvyRGOknlovmIK96MwEyvV0_z76Bfks\"\u003e GoogleDrive \u003c/a\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eDSGN(Pedestrian)\u003c/td\u003e\n            \u003ctd rowspan=2\u003e27\u003c/td\u003e\n            \u003ctd rowspan=2\u003e ~27 \u003c/td\u003e\n            \u003ctd rowspan=2\u003e 5.47 \u003c/td\u003e\n            \u003ctd\u003e31.42\u003c/td\u003e\n            \u003ctd\u003e39.35\u003c/td\u003e\n            \u003ctd\u003e55.68\u003c/td\u003e\n            \u003ctd rowspan=2\u003e\u003ca href=\"https://drive.google.com/open?id=1L14QisrQMyIbowhSSOf_FjaOx9CVe0oF\"\u003e GoogleDrive \u003c/a\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eDSGN(Cyclist)\u003c/td\u003e\n            
\u003ctd\u003e23.16\u003c/td\u003e\n            \u003ctd\u003e24.81\u003c/td\u003e\n            \u003ctd\u003e32.86\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eDSGN_24g(Car)\u003c/td\u003e\n            \u003ctd\u003e53\u003c/td\u003e\n            \u003ctd\u003e~24\u003c/td\u003e\n            \u003ctd\u003e~6\u003c/td\u003e\n            \u003ctd\u003e51.05\u003c/td\u003e\n            \u003ctd\u003e61.04\u003c/td\u003e\n            \u003ctd\u003e83.46\u003c/td\u003e\n            \u003ctd\u003e TODO \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eDSGN_12g(Car)\u003c/td\u003e\n            \u003ctd\u003e48\u003c/td\u003e\n            \u003ctd\u003e10.0\u003c/td\u003e\n            \u003ctd\u003e3.0\u003c/td\u003e\n            \u003ctd\u003e44.61\u003c/td\u003e\n            \u003ctd\u003e55.70\u003c/td\u003e\n            \u003ctd\u003e78.25\u003c/td\u003e\n            \u003ctd\u003e\u003ca href=\"https://drive.google.com/file/d/1Vt7mcSV-_xw-HrgKhrza4KmLAOhrzeA0/view?usp=sharing\"\u003e GoogleDrive \u003c/a\u003e\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\n### Video Demo\n\nWe provide a video demo showing the results of DSGN. It shows the predicted depth maps and 3D detection results in both the front view (the left camera view) and the bird's eye view (on the ground-truth point cloud).\n\n\u003cp align=\"center\"\u003e \u003ca href=\"https://www.youtube.com/watch?v=u6mQW89wBbo\"\u003e\u003cimg src=\"./doc/demo_cover.jpg\" width=\"50%\"\u003e\u003c/a\u003e \u003c/p\u003e\n\n### TODO List\n- [x] Multiprocessing GPU training\n- [x] TensorboardX\n- [x] Reduce training GPU memory usage\n- [ ] Result visualization\n- [ ] Still in progress\n\n### Troubleshooting\n\nIf you have issues running or compiling this code, we have compiled a list of common issues in [TROUBLESHOOTING.md](TROUBLESHOOTING.md). 
If your issue is not listed there, please feel free to open a new issue.\n\n### Citations\nIf you find our work useful in your research, please consider citing:\n```\n@inproceedings{chen2020dsgn,\n  title={DSGN: Deep Stereo Geometry Network for 3D Object Detection},\n  author={Chen, Yilun and Liu, Shu and Shen, Xiaoyong and Jia, Jiaya},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  year={2020}\n}\n```\n\n### Acknowledgment\nThis repo borrows code from several repositories, such as [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark), [PSMNet](https://github.com/JiaRenChang/PSMNet), [FCOS](https://github.com/tianzhi0549/FCOS) and [kitti-object-eval-python](https://github.com/traveller59/kitti-object-eval-python).\n\n### Contact\nIf you have any questions or suggestions about this repo, please feel free to contact me (chenyilun95@gmail.com).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fdsgn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdvlab-research%2Fdsgn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvlab-research%2Fdsgn/lists"}