## DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?"

[Install](#installation) // [Datasets](#datasets) // [Experiments](#experiments) // [Models](#models) // [License](#license) // [Reference](#reference)



[Full video](https://youtu.be/rXBoUpq9CVQ)

Official [PyTorch](https://pytorch.org/) implementation of _DD3D_: [**Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021)**](https://arxiv.org/abs/2108.06417),
*Dennis Park\*, Rares Ambrus\*, Vitor Guizilini, Jie Li, and Adrien Gaidon*.

## Installation
We recommend using docker (see [nvidia-docker2](https://github.com/NVIDIA/nvidia-docker) instructions) to have a reproducible environment. To set up your environment, run the following in a terminal (only tested on Ubuntu 18.04):

```bash
git clone https://github.com/TRI-ML/dd3d.git
cd dd3d
# If you want to use docker (recommended)
make docker-build # CUDA 10.2
# Alternative docker image for cuda 11.1
# make docker-build DOCKERFILE=Dockerfile-cu111
```
Please check your NVIDIA driver version and [CUDA compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/) to determine which Dockerfile to use.
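
A quick way to check the installed driver (assuming the NVIDIA driver is already set up on the host; the `nvidia-smi` header also reports the highest CUDA version the driver supports):

```bash
# Prints the installed NVIDIA driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```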

All commands below are written as if run directly inside our container. To run any of them in a container, you can either start the container in interactive mode with `make docker-dev` to land in a shell where you can type those commands, or you can do it in one step:

```bash
# single GPU
make docker-run COMMAND="<command>"
# multi GPU
make docker-run-mpi COMMAND="<command>"
```
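
For example, the [overfit experiment](#validating-installation) described below can be launched this way without entering an interactive shell (a one-liner composed from the commands shown in this README):

```bash
# Runs training inside the container in a single step
make docker-run COMMAND="./scripts/train.py +experiments=dd3d_kitti_dla34_overfit"
```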

If you want to use features related to [AWS](https://aws.amazon.com/) (for caching the output directory)
and [Weights & Biases](https://www.wandb.com/) (for experiment management/visualization), then you should create associated accounts and configure your shell with the following environment variables **before** building the docker image:

```bash
export AWS_SECRET_ACCESS_KEY="<your-aws-secret-access-key>"
export AWS_ACCESS_KEY_ID="<your-aws-access-key-id>"
export AWS_DEFAULT_REGION="<your-aws-default-region>"
export WANDB_ENTITY="<your-wandb-entity>"
export WANDB_API_KEY="<your-wandb-api-key>"
```
You should also enable these features in configuration, such as [`WANDB.ENABLED`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L14) and [`SYNC_OUTPUT_DIR_S3.ENABLED`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L29).
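
Both flags can also be overridden on the command line in the usual hydra style used throughout this README (the experiment name below is only illustrative):

```bash
./scripts/train.py +experiments=dd3d_kitti_dla34 WANDB.ENABLED=True SYNC_OUTPUT_DIR_S3.ENABLED=True
```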

### Datasets
By default, datasets are assumed to be downloaded in `/data/datasets/` (can be a symbolic link). The dataset root is configurable by [`DATASET_ROOT`](https://github.com/TRI-ML/dd3d/blob/main/configs/defaults.yaml#L35).
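
For example, if your datasets live elsewhere on disk, a symbolic link keeps the default configuration working (the source paths below are placeholders):

```bash
# Point the default dataset root at your actual storage location
sudo mkdir -p /data/datasets
sudo ln -s /mnt/storage/KITTI3D /data/datasets/KITTI3D
sudo ln -s /mnt/storage/nuScenes /data/datasets/nuScenes
```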

#### KITTI

The KITTI 3D dataset used in our experiments can be downloaded from the [KITTI website](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d).
For convenience, we provide the standard splits used in [3DOP](https://xiaozhichen.github.io/papers/nips15chen.pdf) for training and evaluation:
```bash
# download a standard splits subset of KITTI
curl -s https://tri-ml-public.s3.amazonaws.com/github/dd3d/mv3d_kitti_splits.tar | sudo tar xv -C /data/datasets/KITTI3D
```

The dataset must be organized as follows:

```
└── KITTI3D
    ├── mv3d_kitti_splits
    │   ├── test.txt
    │   ├── train.txt
    │   ├── trainval.txt
    │   └── val.txt
    ├── testing
    │   ├── calib
    │   │   ├── 000000.txt
    │   │   ├── 000001.txt
    │   │   └── ...
    │   └── image_2
    │       ├── 000000.png
    │       ├── 000001.png
    │       └── ...
    └── training
        ├── calib
        │   ├── 000000.txt
        │   ├── 000001.txt
        │   └── ...
        ├── image_2
        │   ├── 000000.png
        │   ├── 000001.png
        │   └── ...
        └── label_2
            ├── 000000.txt
            ├── 000001.txt
            └── ...
```
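
As a quick sanity check, the image folders of the standard KITTI 3D object benchmark should contain 7,481 training and 7,518 testing frames:

```bash
ls /data/datasets/KITTI3D/training/image_2 | wc -l   # expect 7481
ls /data/datasets/KITTI3D/testing/image_2 | wc -l    # expect 7518
```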

#### nuScenes
The nuScenes dataset (v1.0) can be downloaded from the [nuScenes website](https://www.nuscenes.org/download). The dataset must be organized as follows:
```
└── nuScenes
    ├── samples
    │   ├── CAM_FRONT
    │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243012465.jpg
    │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT__1526915243512465.jpg
    │   │   └── ...
    │   ├── CAM_FRONT_LEFT
    │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243004917.jpg
    │   │   ├── n008-2018-05-21-11-06-59-0400__CAM_FRONT_LEFT__1526915243504917.jpg
    │   │   └── ...
    │   └── ...
    ├── v1.0-trainval
    │   ├── attribute.json
    │   ├── calibrated_sensor.json
    │   ├── category.json
    │   └── ...
    ├── v1.0-test
    │   ├── attribute.json
    │   ├── calibrated_sensor.json
    │   ├── category.json
    │   └── ...
    └── v1.0-mini
        ├── attribute.json
        ├── calibrated_sensor.json
        ├── category.json
        └── ...
```

### Pre-trained DD3D models
The DD3D models pre-trained on dense depth estimation using DDAD15M can be downloaded here:
| backbone | download |
| :---: | :---: |
| DLA34 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_dla34-2lnfuzr1.pth) |
| V2-99 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_v99-3jlw0p36-20210423_010520-model_final-remapped.pth) |
| OmniML | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_omninet-small-3nxjur71.pth) |

The `OmniML` model is optimized by [OmniML](https://www.omniml.ai/) for highly efficient deployment on target hardware. Compared to the standard DLA-34 backbone, it achieves a 1.75x speedup (measured on NVIDIA Xavier, int8, batch_size=1) and 60% fewer GFLOPs (measured with input size 512x896), with better accuracy. Please see the [Models](#models) section for the corresponding configs.
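
For example, the depth-pretrained DLA34 checkpoint can be fetched directly from the link above (the destination directory is a placeholder):

```bash
mkdir -p /data/checkpoints
curl -L -o /data/checkpoints/depth_pretrained_dla34.pth \
  https://tri-ml-public.s3.amazonaws.com/github/dd3d/pretrained/depth_pretrained_dla34-2lnfuzr1.pth
```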

#### (Optional) Eigen-clean subset of KITTI raw.
To train our Pseudo-Lidar detector, we curated a new subset of the KITTI (raw) dataset and used it to fine-tune its depth network. This subset can be downloaded [here](https://tri-ml-public.s3.amazonaws.com/github/dd3d/eigen_clean.txt). Each row contains a left and right image pair. The KITTI raw dataset can be downloaded [here](http://www.cvlibs.net/datasets/kitti/raw_data.php).
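
For example, the split file can be fetched and inspected as follows (the destination path is a placeholder):

```bash
curl -s -o /tmp/eigen_clean.txt https://tri-ml-public.s3.amazonaws.com/github/dd3d/eigen_clean.txt
head -n 3 /tmp/eigen_clean.txt   # each row lists a left/right image pair
```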

### Validating installation
To validate and visualize the dataloader (including [data augmentation](./configs/defaults/augmentation.yaml)), run the following:

```bash
./scripts/visualize_dataloader.py +experiments=dd3d_kitti_dla34 SOLVER.IMS_PER_BATCH=4
```

To validate the entire training loop (including [evaluation](./configs/evaluators) and [visualization](./configs/visualizers)), run the [overfit experiment](configs/experiments/dd3d_kitti_dla34_overfit.yaml) (trained on test set):

```bash
./scripts/train.py +experiments=dd3d_kitti_dla34_overfit
```
| experiment | backbone | train mem. (GB) | train time (hr) | train log | Box AP (%) | BEV AP (%) | download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [config](configs/experiments/dd3d_kitti_dla34_overfit.yaml) | DLA-34 | 6 | 0.25 | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/dla34-kitti-overfit/logs/log.txt) | 84.54 | 88.83 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/dla34-kitti-overfit/model_final.pth) |

## Experiments
### Configuration
We use [hydra](https://hydra.cc/) to configure experiments, specifically following [this pattern](https://hydra.cc/docs/patterns/configuring_experiments) to organize and compose configurations. The experiments under [configs/experiments](./configs/experiments) describe the delta from the [default configuration](./configs/defaults.yaml), and can be run as follows:
```bash
# Omit the '.yaml' extension from the experiment file name.
./scripts/train.py +experiments=<experiment-name>
```
The configuration is modularized into components such as [datasets](./configs/train_datasets/), [backbones](./configs/backbones/), [evaluators](./configs/evaluators/), and [visualizers](./configs/visualizers).

### Using multiple GPUs
The [training script](./scripts/train.py) supports (single-node) multi-GPU for training and evaluation via [mpirun](https://www.open-mpi.org/doc/v4.1/man1/mpirun.1.php). This is most conveniently executed by the `make docker-run-mpi` command (see [above](#installation)).
Internally, the `IMS_PER_BATCH` parameters of the [optimizer](https://github.com/TRI-ML/dd3d/blob/main/configs/common/optimizer.yaml#L5) and the [evaluator](https://github.com/TRI-ML/dd3d/blob/main/configs/common/test.yaml#L9) denote the **total** batch size, which is sharded across the available GPUs during training or evaluation. They must be set to a multiple of the number of available GPUs.
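
For example, on a machine with 8 GPUs, a total batch size of 64 (i.e. 8 images per GPU) can be launched via MPI as follows (a sketch composed from the commands above; adjust the experiment name to your setup):

```bash
make docker-run-mpi COMMAND="./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=64"
```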

### Evaluation
One can run evaluation only, using the pretrained models:
```bash
./scripts/train.py +experiments=<experiment-name> EVAL_ONLY=True MODEL.CKPT=<path-to-checkpoint>
# use a smaller batch size for single-GPU evaluation
./scripts/train.py +experiments=<experiment-name> EVAL_ONLY=True MODEL.CKPT=<path-to-checkpoint> TEST.IMS_PER_BATCH=4
```

### Gradient accumulation
If you have insufficient GPU memory for any experiment, you can use [gradient accumulation](https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa) by configuring [`ACCUMULATE_GRAD_BATCHES`](https://github.com/TRI-ML/dd3d/blob/main/configs/common/optimizer.yaml#L63), at the cost of longer training time. For instance, if an experiment requires at least 400G of GPU memory (e.g. [V2-99, KITTI](./configs/experiments/dd3d_kitti_v99.yaml)) and you have only 128G (e.g., 8 x 16G GPUs), then you can update parameters every 4th step:
```bash
# The original batch size is 64.
./scripts/train.py +experiments=dd3d_kitti_v99 SOLVER.IMS_PER_BATCH=16 SOLVER.ACCUMULATE_GRAD_BATCHES=4
```

## Models
All DLA-34 and V2-99 experiments here use 8 A100 40G GPUs, with gradient accumulation when more GPU memory is needed. We subsample the nuScenes validation set by a factor of 8 (2Hz ⟶ 0.25Hz) to save training time.

- (\*): Trained using 8 A5000 GPUs.
- (\*\*): Benchmarked on NVIDIA Xavier.

### KITTI
| experiment | backbone | train mem. (GB) | train time (hr) | GFLOPs | latency (ms) | train log | Box AP (%) | BEV AP (%) | download |
| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [config](configs/experiments/dd3d_kitti_dla34.yaml) | DLA-34 | 256 | 4.5 | 103 | 19.9** | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/26675chm-20210826_083148/logs/log.txt) | 16.92 | 24.77 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/26675chm-20210826_083148/model_final.pth) |
| [config](configs/experiments/dd3d_kitti_v99.yaml) | V2-99 | 400 | 9.0 | 453 | - | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/4elbgev2-20210825_201852/logs/log.txt) | 23.90 | 32.01 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/4elbgev2-20210825_201852/model_final.pth) |
| [config](configs/experiments/dd3d_kitti_omninets.yaml) | OmniML | 70* | 3.0* | 41 | 11.4** | [log](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/DD3D-OmniML-kitti-log.txt) | 20.58 | 28.73 | [model](https://tri-ml-public.s3.amazonaws.com/github/dd3d/experiments/DD3D-OmniML-kitti.pth) |

### nuScenes
| experiment | backbone | train mem. (GB) | train time (hr) | train log | mAP (%) | NDS | download |
| :---: | :--: | :---: | :---: | :---: | :---: | :---: | :---: |
| [config](configs/experiments/dd3d_nusc_dla34.yaml) | DLA-34 | TBD | TBD | TBD | TBD | TBD | TBD |
| [config](configs/experiments/dd3d_nusc_v99.yaml) | V2-99 | TBD | TBD | TBD | TBD | TBD | TBD |

## License
The source code is released under the [MIT license](LICENSE.md). We note that some code in this repository is adapted from the following repositories:
- [detectron2](https://github.com/facebookresearch/detectron2)
- [AdelaiDet](https://github.com/aim-uofa/AdelaiDet)

## Reference
```
@inproceedings{park2021dd3d,
  author = {Dennis Park and Rares Ambrus and Vitor Guizilini and Jie Li and Adrien Gaidon},
  title = {Is Pseudo-Lidar needed for Monocular 3D Object detection?},
  booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
  primaryClass = {cs.CV},
  year = {2021},
}
```