{"id":13443154,"url":"https://github.com/TRI-ML/dd3d","last_synced_at":"2025-03-20T16:30:34.964Z","repository":{"id":41381649,"uuid":"390157170","full_name":"TRI-ML/dd3d","owner":"TRI-ML","description":"Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.","archived":false,"fork":false,"pushed_at":"2022-11-29T21:17:39.000Z","size":4035,"stargazers_count":473,"open_issues_count":42,"forks_count":75,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-03-14T11:05:29.736Z","etag":null,"topics":["computer-vision","deep-learning","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TRI-ML.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-27T23:39:29.000Z","updated_at":"2025-03-03T18:39:46.000Z","dependencies_parsed_at":"2022-07-19T02:04:24.276Z","dependency_job_id":null,"html_url":"https://github.com/TRI-ML/dd3d","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TRI-ML%2Fdd3d","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TRI-ML%2Fdd3d/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TRI-ML%2Fdd3d/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TRI-ML%2Fdd3d/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TRI-ML","download_url":"https://codeload.github.com/TRI-ML/d
d3d/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244649681,"owners_count":20487467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","pytorch"],"created_at":"2024-07-31T03:01:56.825Z","updated_at":"2025-03-20T16:30:33.414Z","avatar_url":"https://github.com/TRI-ML.png","language":"Python","funding_links":[],"categories":["Python","二、Camera-based BEV"],"sub_categories":["1. List of camera-based BEV sensing methods"],"readme":"\u003ca href=\"https://www.tri.global/\" target=\"_blank\"\u003e\n \u003cimg align=\"right\" src=\"/media/figs/tri-logo.png\" width=\"25%\"/\u003e\n\u003c/a\u003e\n\n## DD3D: \"Is Pseudo-Lidar needed for Monocular 3D Object detection?\"\n\n[Install](#installation) // [Datasets](#datasets) // [Experiments](#experiments) //  [Models](#models) // [License](#license) // [Reference](#reference)\n\n\n\u003ca href=\"https://youtu.be/rXBoUpq9CVQ\" target=\"_blank\"\u003e\n\u003cimg width=\"100%\" src=\"/media/figs/demo_dd3d_kitti_val_short.gif\"/\u003e\n\u003c/a\u003e\n\n[Full video](https://youtu.be/rXBoUpq9CVQ)\n\nOfficial [PyTorch](https://pytorch.org/) implementation of _DD3D_: [**Is Pseudo-Lidar needed for Monocular 3D Object detection? 
(ICCV 2021)**](https://arxiv.org/abs/2108.06417),\n*Dennis Park\u003csup\u003e\\*\u003c/sup\u003e, Rares Ambrus\u003csup\u003e\\*\u003c/sup\u003e, Vitor Guizilini, Jie Li, and Adrien Gaidon*.\n\n## Installation\nWe recommend using docker (see [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker) instructions) to have a reproducible environment. To set up your environment, type in a terminal (tested only on Ubuntu 18.04):\n\n```bash\ngit clone https://github.com/TRI-ML/dd3d.git\ncd dd3d\n# If you want to use docker (recommended)\nmake docker-build # CUDA 10.2\n# Alternative docker image for cuda 11.1\n# make docker-build DOCKERFILE=Dockerfile-cu111\n```\nPlease check the version of your nvidia driver and [cuda compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/) to determine which Dockerfile to use.\n\nAll commands below are listed as if run directly inside our container. To run any of them in a container, you can either start the container in interactive mode with `make docker-dev` to land in a shell where you can type those commands, or you can do it in one step:\n\n```bash\n# single GPU\nmake docker-run COMMAND=\"\u003csome-command\u003e\"\n# multi GPU\nmake docker-run-mpi COMMAND=\"\u003csome-command\u003e\"\n```\n\nIf you want to use features related to [AWS](https://aws.amazon.com/) (for caching the output directory)\nand [Weights \u0026 Biases](https://www.wandb.com/) (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables **before** building the docker image:\n\n```bash\nexport AWS_SECRET_ACCESS_KEY=\"\u003csomething\u003e\"\nexport AWS_ACCESS_KEY_ID=\"\u003csomething\u003e\"\nexport AWS_DEFAULT_REGION=\"\u003csomething\u003e\"\nexport WANDB_ENTITY=\"\u003csomething\u003e\"\nexport WANDB_API_KEY=\"\u003csomething\u003e\"\n```\nYou should also enable these features in configuration, such as 
[`WANDB.ENABLED`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L14) and [`SYNC_OUTPUT_DIR_S3.ENABLED`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L29).\n\n### Datasets\nBy default, datasets are assumed to be downloaded in `/data/datasets/\u003cdataset-name\u003e` (can be a symbolic link). The dataset root is configurable by [`DATASET_ROOT`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L35).\n\n#### KITTI\n\nThe KITTI 3D dataset used in our experiments can be downloaded from the [KITTI website](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d).\nFor convenience, we provide the standard splits used in [3DOP](https://xiaozhichen.github.io/papers/nips15chen.pdf) for training and evaluation:\n```\n# download a standard splits subset of KITTI\ncurl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D\n```\n\nThe dataset must be organized as follows:\n\n```\n\u003cDATASET_ROOT\u003e\n    └── KITTI3D\n        ├── mv3d_kitti_splits\n        │   ├── test.txt\n        │   ├── train.txt\n        │   ├── trainval.txt\n        │   └── val.txt\n        ├── testing\n        │   ├── calib\n        |   │   ├── 000000.txt\n        |   │   ├── 000001.txt\n        |   │   └── ...\n        │   └── image_2\n        │       ├── 000000.png\n        │       ├── 000001.png\n        │       └── ...\n        └── training\n            ├── calib\n            │   ├── 000000.txt\n            │   ├── 000001.txt\n            │   └── ...\n            ├── image_2\n            │   ├── 000000.png\n            │   ├── 000001.png\n            │   └── ...\n            └── label_2\n                ├── 000000.txt\n                ├── 000001.txt\n                └── ..\n```\n\n#### nuScenes\nThe nuScenes dataset (v1.0) can be downloaded from the [nuScenes website](https://www.nuscenes.org/download). 
The dataset must be organized as follows:\n```\n\u003cDATASET_ROOT\u003e\n    └── nuScenes\n        ├── samples\n        │   ├── CAM_FRONT\n        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg\n        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg\n        │   │   ├── ...\n        │   │  \n        │   ├── CAM_FRONT_LEFT\n        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg\n        │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg\n        │   │   ├── ...\n        │   │  \n        │   ├── ...\n        │  \n        ├── v1.0-trainval\n        │   ├── attribute.json\n        │   ├── calibrated_sensor.json\n        │   ├── category.json\n        │   ├── ...\n        │  \n        ├── v1.0-test\n        │   ├── attribute.json\n        │   ├── calibrated_sensor.json\n        │   ├── category.json\n        │   ├── ...\n        │  \n        ├── v1.0-mini\n        │   ├── attribute.json\n        │   ├── calibrated_sensor.json\n        │   ├── category.json\n        │   ├── ...\n```\n\n### Pre-trained DD3D models\nThe DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:\n| backbone | download |\n| :---: | :---: |\n| DLA34 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_dla34-2lnfuzr1.pth) |\n| V2-99 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_v99-3jlw0p36-20210423_010520-model_final-remapped.pth) |\n| OmniML | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_omninet-small-3nxjur71.pth) |\n\nThe `OmniML` model is optimized by [OmniML](https://www.omniml.ai/) for highly efficient deployment on target hardware with better accuracy. 
The `OmniML` model achieves a 1.75x speedup (measured with NVIDIA Xavier, int8, batch_size=1) and 60% fewer GFLOPs (measured with input size 512x896), with better performance compared to standard DLA-34. Please see the Models section for configs.\n\n#### (Optional) Eigen-clean subset of KITTI raw.\nTo train our Pseudo-Lidar detector, we curated a new subset of the KITTI (raw) dataset and used it to fine-tune its depth network. This subset can be downloaded [here](https://tri-ml-public.s3.amazonaws.com/github/dd3d/eigen_clean.txt). Each row contains left and right image pairs. The KITTI raw dataset can be downloaded [here](http://www.cvlibs.net/datasets/kitti/raw_data.php).\n\n### Validating installation\nTo validate and visualize the dataloader (including [data augmentation](./configs/defaults/augmentation.yaml)), run the following:\n\n```bash\n./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4\n```\n\nTo validate the entire training loop (including [evaluation](./configs/evaluators) and [visualization](./configs/visualizers)), run the [overfit experiment](configs/experiments/dd3d_kitti_dla34_overfit.yaml) (trained on test set):\n\n```bash\n./scripts/train.py +experiments=dd3d_kitti_dla34_overfit\n```\n| experiment | backbone | train mem. (GB) | train time (hr) | train log | Box AP (%) | BEV AP (%) | download |\n| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| [config](configs/experiments/dd3d_kitti_dla34_overfit.yaml) | DLA-34 | 6 | 0.25 | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/dla34-kitti-overfit/logs/log.txt) | 84.54 |  88.83 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/dla34-kitti-overfit/model_final.pth) |\n\n\n## Experiments\n### Configuration\nWe use [hydra](https://hydra.cc/) to configure experiments, specifically following [this pattern](https://hydra.cc/docs/patterns/configuring_experiments) to organize and compose configurations. 
The experiments under [configs/experiments](./configs/experiments) describe the delta from the [default configuration](./configs/defaults.yaml), and can be run as follows:\n```bash\n# omit the '.yaml' extension from the experiment file.\n./scripts/train.py +experiments=\u003cexperiment-file\u003e \u003cconfig-override\u003e\n```\nThe configuration is modularized by various components such as [datasets](./configs/train_datasets/), [backbones](./configs/backbones/), [evaluators](./configs/evaluators/), and [visualizers](./configs/visualizers), etc.\n\n\n### Using multiple GPUs\nThe [training script](./scripts/train.py) supports (single-node) multi-GPU for training and evaluation via [mpirun](https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php). This is most conveniently executed by the `make docker-run-mpi` command (see [above](#installation)).\nInternally, `IMS_PER_BATCH` parameters of the [optimizer](https://github.com/TRI-ML/dd3d/blob/main/configs/common/optimizer.yaml#L5) and the [evaluator](https://github.com/TRI-ML/dd3d/blob/main/configs/common/test.yaml#L9) denote the **total** size of batch that is sharded across available GPUs while training or evaluating. 
They must be set to a multiple of the number of available GPUs.\n\n### Evaluation\nTo run evaluation only, using the pretrained models:\n```bash\n./scripts/train.py +experiments=\u003csome-experiment\u003e EVAL_ONLY=True MODEL.CKPT=\u003cpath-to-pretrained-model\u003e\n# use smaller batch size for single-gpu\n./scripts/train.py +experiments=\u003csome-experiment\u003e EVAL_ONLY=True MODEL.CKPT=\u003cpath-to-pretrained-model\u003e TEST.IMS_PER_BATCH=4\n```\n\n### Gradient accumulation\nIf you have insufficient GPU memory for any experiment, you can use [gradient accumulation](https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa) by configuring [`ACCUMULATE_GRAD_BATCHES`](https://github.com/TRI-ML/dd3d/blob/main/configs/common/optimizer.yaml#L63), at the cost of longer training time. For instance, if the experiment requires at least 400G of GPU memory (e.g. [V2-99, KITTI](./configs/experiments/dd3d_kitti_v99.yaml)) and you have only 128G (e.g., 8 x 16G GPUs), then you can update parameters at every 4th step:\n```bash\n# The original batch size is 64.\n./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4\n```\n\n## Models\nAll DLA-34 and V2-99 experiments here use 8 A100 40G GPUs, and use gradient accumulation when more GPU memory is needed. We subsample the nuScenes validation set by a factor of 8 (2Hz ⟶ 0.25Hz) to save training time.\n\n(*): Trained using 8 A5000 GPUs.\n(**): Benchmarked on NVIDIA Xavier.\n\n### KITTI\n| experiment | backbone | train mem. 
(GB) | train time (hr) | GFLOPs | latency (ms) | train log |  Box AP (%) | BEV AP (%) | download |\n| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |  :---: | \n| [config](configs/experiments/dd3d_kitti_dla34.yaml) | DLA-34 | 256 | 4.5 | 103 | 19.9** | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/26675chm-20210826_083148/logs/log.txt) | 16.92 |  24.77 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/26675chm-20210826_083148/model_final.pth) |\n| [config](configs/experiments/dd3d_kitti_v99.yaml) | V2-99 | 400 | 9.0 | 453 | - | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/4elbgev2-20210825_201852/logs/log.txt) | 23.90 |  32.01 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/4elbgev2-20210825_201852/model_final.pth) |\n| [config](configs/experiments/dd3d_kitti_omninets.yaml) | OmniML | 70* | 3.0* | 41 | 11.4** | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/DD3D-OmniML-kitti-log.txt) | 20.58 |  28.73 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/DD3D-OmniML-kitti.pth) |\n\n### nuScenes\n| experiment | backbone | train mem. (GB) | train time (hr) | train log | mAP (%) | NDS | download |\n| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: |\n| [config](configs/experiments/dd3d_nusc_dla34.yaml) | DLA-34 | TBD | TBD | TBD | TBD |  TBD | TBD |\n| [config](configs/experiments/dd3d_nusc_v99.yaml) | V2-99 | TBD | TBD | TBD | TBD |  TBD | TBD |\n\n\n## License\nThe source code is released under the [MIT license](LICENSE.md). 
We note that some code in this repository is adapted from the following repositories:\n- [detectron2](https://github.com/facebookresearch/detectron2)\n- [AdelaiDet](https://github.com/aim-uofa/AdelaiDet)\n\n## Reference\n```\n@inproceedings{park2021dd3d,\n  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},\n  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},\n  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},\n  primaryClass = {cs.CV},\n  year = {2021},\n}\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTRI-ML%2Fdd3d","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTRI-ML%2Fdd3d","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTRI-ML%2Fdd3d/lists"}