# 3D Object Detection for Autonomous Driving: A Comprehensive Survey (IJCV 2023)

[![arXiv](https://img.shields.io/badge/arXiv-2206.09474-b31b1b.svg)](https://arxiv.org/abs/2206.09474)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/PointsCoder/Awesome-3D-Object-Detection-for-Autonomous-Driving/graphs/commit-activity)
[![GitHub issues](https://img.shields.io/github/issues/PointsCoder/Awesome-3D-Object-Detection-for-Autonomous-Driving)](https://github.com/PointsCoder/Awesome-3D-Object-Detection-for-Autonomous-Driving/issues/)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)

![overview](Figs/overview.JPG)

This repository accompanies our [survey paper](https://arxiv.org/abs/2206.09474):

> **Title:** 3D Object Detection for Autonomous Driving: A Comprehensive Survey

> **Authors:** Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

> **Publication:** International Journal of Computer Vision (IJCV)

a.k.a. its earlier arXiv version:

> **Title:** 3D Object Detection for Autonomous Driving: A Review and New Outlooks

> **Authors:** Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

> arXiv preprint arXiv:2206.09474

We also provide a paper collection on 3D object detection for autonomous driving at [Awesome 3D Object Detection for Autonomous Driving](Papers.md).

## Content

![taxonomy](Figs/tax.JPG)

- [1. LiDAR-based 3D Object Detection](#1)
  - [1.1 Data representations for LiDAR 3D object detection](#1.1)
    - [1.1.1 Point-based 3D object detection](#1.1)
    - [1.1.2 Grid-based 3D object detection](#1.2)
    - [1.1.3 Point-voxel based 3D object detection](#1.3)
    - [1.1.4 Range-based 3D object detection](#1.4)
  - [1.2 Learning objectives for LiDAR 3D object detection](#1.5)
    - [1.2.1 Anchor-based 3D object detection](#1.5)
    - [1.2.2 Anchor-free 3D object detection](#1.6)
    - [1.2.3 3D object detection with auxiliary tasks](#1.6)
- [2. Camera-based 3D Object Detection](#2)
  - [2.1 Monocular 3D object detection](#2)
    - [2.1.1 Image-only monocular 3D object detection](#2.1)
    - [2.1.2 Depth-assisted monocular 3D object detection](#2.2)
    - [2.1.3 Prior-guided monocular 3D object detection](#2.3)
  - [2.2 Stereo-based 3D object detection](#2.4)
  - [2.3 Multi-camera 3D object detection](#2.4)
- [3. Multi-Modal 3D Object Detection](#3)
  - [3.1 Multi-modal detection with LiDAR-camera fusion](#3.1)
    - [3.1.1 Early-fusion based 3D object detection](#3.1)
    - [3.1.2 Intermediate-fusion based 3D object detection](#3.2)
    - [3.1.3 Late-fusion based 3D object detection](#3.3)
  - [3.2 Multi-modal detection with radar signals](#3.3)
  - [3.3 Multi-modal detection with high-definition maps](#3.3)
- [4. Temporal 3D Object Detection](#4)
  - [4.1 3D object detection from LiDAR sequences](#4.1)
  - [4.2 3D object detection from streaming data](#4.2)
  - [4.3 3D object detection from videos](#4.2)
- [5. Label-Efficient 3D Object Detection](#5)
  - [5.1 Domain adaptation for 3D object detection](#5.1)
  - [5.2 Weakly-supervised 3D object detection](#5.2)
  - [5.3 Semi-supervised 3D object detection](#5.3)
  - [5.4 Self-supervised 3D object detection](#5.4)
- [6. 3D Object Detection in Driving Systems](#6)
  - [6.1 End-to-end learning for autonomous driving](#6.1)
  - [6.2 Simulation for 3D object detection](#6.1)
  - [6.3 Robustness for 3D object detection](#6.1)
  - [6.4 Collaborative 3D object detection](#6.2)

## LiDAR-based 3D Object Detection

![](Figs/lidarmap.JPG)

A chronological overview of the most influential LiDAR-based 3D object detection methods. [[Back to content]](#0)

### Point-based 3D object detection [[Papers]](Docs/Sensor/LiDAR/point_view.md)

![](Figs/point.JPG)

A general point-based detection framework contains a point-based backbone network and a prediction head. The point-based backbone consists of several blocks for point cloud
sampling and feature learning, and the prediction head directly estimates 3D bounding boxes from the candidate points. [[Back to content]](#0)
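
To make the sampling blocks concrete, here is a minimal NumPy sketch of farthest point sampling, the candidate-selection strategy popularized by PointNet++-style backbones (an illustrative sketch, not the implementation of any particular detector):

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Iteratively pick the point farthest from the already-chosen set.

    points: (N, 3) xyz coordinates; returns indices of the sampled candidates.
    """
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)  # first candidate: point 0
    dist = np.full(n, np.inf)                     # distance to nearest chosen point
    for i in range(1, n_samples):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))          # farthest from all chosen so far
    return chosen
```

The candidates returned here are the points from which a prediction head would regress 3D boxes.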

### Grid-based 3D object detection [[Papers]](Docs/Sensor/LiDAR/volumetric_view.md)

![](Figs/grid.JPG)

The grid-based approaches rasterize the point cloud into three grid representations: voxels, pillars, and bird’s-eye view (BEV) feature maps. 2D convolutional neural networks or 3D sparse neural networks are then applied to the grids for feature extraction, and 3D objects are finally predicted from the BEV grid cells. [[Back to content]](#0)
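
As a sketch of the rasterization step, the snippet below assigns each in-range point to a BEV pillar cell; the voxel size and point cloud range are typical KITTI-style values chosen only for illustration:

```python
import numpy as np

def pillarize(points, voxel_size=(0.16, 0.16), pc_range=(0.0, -39.68, 69.12, 39.68)):
    """Map each point to a bird's-eye-view (BEV) pillar index.

    points: (N, >=3) array with xyz in the first three columns.
    Returns the in-range points and their (ix, iy) pillar coordinates.
    """
    xmin, ymin, xmax, ymax = pc_range
    mask = ((points[:, 0] >= xmin) & (points[:, 0] < xmax) &
            (points[:, 1] >= ymin) & (points[:, 1] < ymax))
    pts = points[mask]
    ix = ((pts[:, 0] - xmin) / voxel_size[0]).astype(np.int64)
    iy = ((pts[:, 1] - ymin) / voxel_size[1]).astype(np.int64)
    return pts, np.stack([ix, iy], axis=1)
```

Per-pillar features (e.g. a small PointNet over the points in each cell) are then scattered back to a dense BEV map that a 2D CNN can consume.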

### Point-voxel based 3D object detection [[Papers]](Docs/Sensor/LiDAR/mixed_views.md)

![](Figs/pv.JPG)

A single-stage point-voxel detection framework fuses point and voxel features in the backbone network. A two-stage point-voxel detection framework first generates 3D object proposals with a voxel-based 3D detector, and then refines these proposals using keypoints sampled from the point cloud. [[Back to content]](#0)
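
A minimal sketch of the point-voxel fusion idea: look up, for every point, the feature of the voxel it falls into and concatenate it with the point's own feature (the dense `voxel_feats` grid and its layout are assumptions for illustration):

```python
import numpy as np

def fuse_point_voxel(points, point_feats, voxel_feats, voxel_size, pc_min):
    """Concatenate each point's feature with the feature of its enclosing voxel.

    points: (N, 3) xyz; point_feats: (N, Cp);
    voxel_feats: dense (X, Y, Z, Cv) feature grid (assumed layout);
    voxel_size / pc_min: (3,) grid resolution and lower corner of the range.
    """
    idx = ((points - pc_min) / voxel_size).astype(np.int64)   # (N, 3) voxel coords
    idx = np.clip(idx, 0, np.array(voxel_feats.shape[:3]) - 1)
    gathered = voxel_feats[idx[:, 0], idx[:, 1], idx[:, 2]]   # (N, Cv)
    return np.concatenate([point_feats, gathered], axis=1)    # (N, Cp + Cv)
```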

### Range-based 3D object detection [[Papers]](Docs/Sensor/LiDAR/range_view.md)

![](Figs/range.JPG)

The first category of range-based approaches directly predicts
3D objects from pixels in range images, with standard 2D
convolutions or specialized convolutional/graph operators
for feature extraction. The second category transforms features
from the range view into the bird’s-eye view or point view,
and then detects 3D objects from the transformed view. [[Back to content]](#0)
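
For reference, a common way to build the range image is a spherical projection of the sweep; the vertical field-of-view values below are typical for a 64-beam sensor and are assumptions for illustration:

```python
import numpy as np

def to_range_image(points, h=64, w=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project LiDAR points onto an (h, w) range image via spherical coordinates."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])            # azimuth, [-pi, pi]
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-8))   # elevation
    u = (0.5 * (yaw / np.pi + 1.0) * w).astype(np.int64)    # column
    v = ((1.0 - (pitch - fov_down) / fov) * h).astype(np.int64)  # row
    u, v = np.clip(u, 0, w - 1), np.clip(v, 0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r   # at collisions, later points simply overwrite earlier ones
    return img
```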

### Anchor-based 3D object detection

![](Figs/anchor.JPG)

3D anchor boxes are placed at each BEV grid cell. Those anchors
that have high IoUs with ground truths are selected as
positives. The sizes and centers of 3D objects are regressed
from the positive anchors, and the objects’ heading angles
are predicted by bin-based classification and regression. [[Back to content]](#0)
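
The bin-based heading encoding mentioned above splits the angle space into discrete bins (classification) plus a within-bin residual (regression); a minimal sketch, with the bin count chosen arbitrarily for illustration:

```python
import numpy as np

NUM_BINS = 12                       # illustrative choice
BIN_SIZE = 2 * np.pi / NUM_BINS

def encode_heading(angle):
    """Return (bin class, residual within the bin) for a heading angle in radians."""
    shifted = (angle % (2 * np.pi) + BIN_SIZE / 2) % (2 * np.pi)
    bin_id = int(shifted / BIN_SIZE)
    residual = shifted - bin_id * BIN_SIZE - BIN_SIZE / 2   # in [-BIN_SIZE/2, BIN_SIZE/2)
    return bin_id, residual

def decode_heading(bin_id, residual):
    """Invert encode_heading: bin centre plus regressed residual."""
    return (bin_id * BIN_SIZE + residual) % (2 * np.pi)
```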

### Anchor-free 3D object detection

![](Figs/anchorfree.JPG)

The anchor-free
learning targets can be assigned to diverse views, including
the bird’s-eye view, point view, and range view. Object
parameters are predicted directly from the positive samples. [[Back to content]](#0)
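
One widely used anchor-free target is a BEV centre heatmap, where each object splats a Gaussian around its centre cell (a CenterPoint-style sketch; the radius heuristic is left to the caller):

```python
import numpy as np

def draw_center_gaussian(heatmap, center, radius):
    """Splat a 2D Gaussian around an object centre onto a BEV heatmap in place.

    heatmap: (H, W) float array; center: (x, y) in cell units; radius: cells.
    """
    h, w = heatmap.shape
    cx, cy = int(center[0]), int(center[1])
    sigma = max(radius / 3.0, 1e-6)
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heatmap[y, x] = max(heatmap[y, x], g)  # keep the stronger of overlaps
    return heatmap
```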

## Camera-based 3D Object Detection

![](Figs/cameramap.JPG)

A chronological overview of the camera-based 3D object detection methods. [[Back to content]](#0)

### Image-only monocular 3D object detection [[Papers]](Docs/Sensor/Camera/monocular.md)

![](Figs/imageonly.JPG)

Single-stage anchor-based approaches
predict 3D object parameters leveraging both image features
and predefined 3D anchor boxes. Single-stage anchor-free
methods directly predict 3D object parameters from image
pixels. Two-stage approaches first generate 2D bounding
boxes with a 2D detector, and then lift the 2D detections into
3D space by predicting 3D object parameters from the
2D RoI features. [[Back to content]](#0)

### Depth-assisted monocular 3D object detection [[Papers]](Docs/Sensor/Camera/monocular.md)

![](Figs/depth.JPG)

Depth-image based approaches obtain
depth-aware image features by fusing information from both the RGB image and the depth image. Pseudo-LiDAR based
methods first transform the depth image into a 3D pseudo point cloud, and then apply a LiDAR-based 3D detector to the
point cloud to detect 3D objects. Patch-based approaches transform the depth image into a 2D coordinate map, and then
apply a 2D neural network to the coordinate map for detection. [[Back to content]](#0)
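
The pseudo-LiDAR conversion is simple pinhole back-projection of every pixel; a minimal NumPy sketch (the camera intrinsic matrix `K` is assumed known):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, K):
    """Back-project an (H, W) depth map into an (H*W, 3) camera-frame point cloud.

    depth: metric depth per pixel; K: (3, 3) camera intrinsic matrix.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)
```

The resulting pseudo point cloud can then be fed to any of the LiDAR-based detectors from Section 1.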

### Prior-guided monocular 3D object detection [[Papers]](Docs/Sensor/Camera/monocular.md)

![](Figs/prior.JPG)

Prior-guided approaches leverage object shape priors, geometric priors, segmentation, and temporal constraints to help detect 3D objects. [[Back to content]](#0)

### Stereo-based 3D object detection [[Papers]](Docs/Sensor/Camera/stereo.md)

![](Figs/stereo.JPG)

2D-detection based methods first generate a pair of 2D
proposals from the left and right images respectively, and then estimate 3D object parameters from the paired proposals.
Pseudo-LiDAR based approaches predict a disparity map by stereo matching, convert the disparity estimates into depth
and then a 3D point cloud, and finally apply a LiDAR-based detector for 3D detection. Volume-based methods
construct a 3D feature volume by view transform, and then apply a grid-based 3D object detector on the 3D volume
for detection. [[Back to content]](#0)
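
The disparity-to-depth step of the pseudo-LiDAR pipeline follows directly from pinhole stereo geometry; a one-line sketch:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Pinhole stereo geometry: depth = focal_length * baseline / disparity."""
    return focal_px * baseline_m / np.maximum(disparity, eps)
```

The depth map can then be back-projected to a pseudo point cloud exactly as in the monocular case above.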

## Multi-Modal 3D Object Detection

![](Figs/fusionmap.JPG)

A chronological overview of the most influential multi-modal 3D object detection methods. [[Back to content]](#0)

### Multi-modal detection with LiDAR-camera fusion (Early-Fusion) [[Papers]](Docs/Sensor/MultiModal/lidar_and_camera.md)

![](Figs/early.JPG)

Early-fusion approaches enhance point cloud
features with image information before they are passed through a LiDAR-based 3D object detector. In region-level
knowledge fusion, 2D detection is first performed on images to generate 2D bounding boxes. The 2D boxes are then
extruded into viewing frustums to select the relevant point cloud regions for the subsequent LiDAR-based 3D object detection.
In point-level knowledge fusion, semantic segmentation is first applied to images, and the segmentation results are then
transferred from image pixels to points and attached to each point as additional features. The augmented point
cloud is finally passed through a LiDAR detector for 3D object detection. [[Back to content]](#0)
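
A sketch of point-level knowledge fusion in the style of PointPainting: project each LiDAR point into the image and append the segmentation scores of the pixel it lands on (the calibration matrices are assumed inputs):

```python
import numpy as np

def paint_points(points, seg_scores, lidar_to_cam, K):
    """Append per-pixel segmentation scores to the LiDAR points they project to.

    points: (N, 3) LiDAR xyz; seg_scores: (C, H, W) class scores;
    lidar_to_cam: (4, 4) extrinsics; K: (3, 3) camera intrinsics.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])
    cam = (lidar_to_cam @ homo.T).T[:, :3]          # points in the camera frame
    in_front = cam[:, 2] > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-6)   # perspective division
    c, h, w = seg_scores.shape
    u = np.clip(uv[:, 0].astype(np.int64), 0, w - 1)
    v = np.clip(uv[:, 1].astype(np.int64), 0, h - 1)
    feats = seg_scores[:, v, u].T                   # (N, C) gathered scores
    feats[~in_front] = 0.0                          # no paint behind the camera
    return np.hstack([points, feats])               # augmented point cloud
```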

### Multi-modal detection with LiDAR-camera fusion (Intermediate-Fusion) [[Papers]](Docs/Sensor/MultiModal/lidar_and_camera.md)

![](Figs/inter.JPG)

Intermediate-fusion approaches aim to
conduct multi-modal fusion at intermediate steps of the 3D object detection pipeline. In backbone networks, pixel-to-point
correspondences are first established by a camera-to-LiDAR transform, and then, with these correspondences, LiDAR features
are fused with image features through diverse fusion operators. The fusion can be conducted either at intermediate
layers or only at the output feature maps. In the proposal generation and refinement stage, 3D object proposals are first
generated and then projected into the camera and LiDAR views to crop features of the different modalities. The multi-view
features are finally fused to refine the 3D object proposals for detection. [[Back to content]](#0)

### Multi-modal detection with LiDAR-camera fusion (Late-Fusion) [[Papers]](Docs/Sensor/MultiModal/lidar_and_camera.md)

![](Figs/late.JPG)

Late-fusion based approaches operate on the
outputs, i.e. the 3D and 2D bounding boxes produced by a LiDAR-based 3D object detector and an image-based 2D object
detector, respectively. The 3D and 2D boxes are combined and fused to obtain the final detection results. [[Back to content]](#0)
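
A simplified late-fusion heuristic: project the 3D boxes into the image, match them to 2D boxes by IoU, and boost the confidence of agreeing pairs. Real systems typically learn this combination; the sketch below only illustrates the data flow:

```python
import numpy as np

def iou_2d(a, b):
    """IoU of two axis-aligned image boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-9)

def late_fuse(boxes3d_img, scores3d, boxes2d, scores2d, iou_thr=0.5):
    """Boost a 3D detection's score when a 2D detection overlaps its image footprint."""
    fused = np.asarray(scores3d, dtype=np.float64).copy()
    for i, b3 in enumerate(boxes3d_img):     # 3D boxes already projected to the image
        for j, b2 in enumerate(boxes2d):
            if iou_2d(b3, b2) >= iou_thr:
                # geometric-mean fusion of the two confidences (illustrative rule)
                fused[i] = max(fused[i], np.sqrt(fused[i] * scores2d[j]))
    return fused
```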

## Temporal 3D Object Detection

![](Figs/temporalmap.JPG)

A chronological overview of the most influential temporal 3D object detection methods. [[Back to content]](#0)

### 3D object detection from LiDAR sequences [[Papers]](Docs/Sequential/sequential.md)

![](Figs/lidarseq.JPG)

In temporal 3D object detection from LiDAR sequences, diverse temporal aggregation modules
are employed to fuse features and object proposals from
multi-frame point clouds. [[Back to content]](#0)
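
The simplest aggregation scheme concatenates several ego-motion-compensated sweeps and tags each point with its time lag, as in common nuScenes-style pipelines; a sketch (the poses are assumed to map each sweep into the current frame):

```python
import numpy as np

def aggregate_sweeps(sweeps, poses, dt=0.05):
    """Merge multiple LiDAR sweeps into the current frame with a time-lag channel.

    sweeps: list of (N_i, 3) arrays, index 0 = current sweep;
    poses: list of (4, 4) transforms from each sweep to the current frame;
    dt: seconds between consecutive sweeps (e.g. 0.05 s at 20 Hz).
    """
    merged = []
    for t, (pts, T) in enumerate(zip(sweeps, poses)):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
        aligned = (T @ homo.T).T[:, :3]                    # ego-motion compensation
        lag = np.full((pts.shape[0], 1), t * dt)           # 4th channel: time lag
        merged.append(np.hstack([aligned, lag]))
    return np.vstack(merged)
```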

### 3D object detection from streaming data [[Papers]](Docs/Sequential/sequential.md)

![](Figs/stream.JPG)

Detection from streaming data is conducted on each LiDAR
packet before the scanner produces a complete sweep. [[Back to content]](#0)
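
To emulate packet-wise input from a recorded sweep, one can slice the sweep into azimuth sectors and feed them to the detector one at a time (a toy sketch; real streaming systems consume raw packets from the sensor driver):

```python
import numpy as np

def packet_slices(points, n_packets=10):
    """Split a sweep into contiguous azimuth sectors, mimicking streaming packets."""
    yaw = np.arctan2(points[:, 1], points[:, 0])                     # [-pi, pi]
    sector = ((yaw + np.pi) / (2 * np.pi) * n_packets).astype(np.int64)
    sector = np.clip(sector, 0, n_packets - 1)
    return [points[sector == k] for k in range(n_packets)]
```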

## Label-Efficient 3D Object Detection

### Domain adaptation for 3D object detection [[Papers]](Docs/Learning/domain_adaptation.md)

![](Figs/da.JPG)

In real-world applications, 3D object detectors suffer from severe domain gaps across different datasets, sensors, and weather conditions. [[Back to content]](#0)

### Weakly-supervised 3D object detection [[Papers]](Docs/Learning/weak_learning.md)

![](Figs/weak.JPG)

Weakly-supervised approaches learn to
detect 3D objects with weak supervisory signals. [[Back to content]](#0)

### Semi-supervised 3D object detection [[Papers]](Docs/Learning/semi_learning.md)

![](Figs/semi.JPG)

Semi-supervised approaches first pre-train
a 3D detector on the labeled data, and then use the
pre-trained detector to produce pseudo labels, or leverage
teacher-student models, for training on the unlabeled data
to further boost detection performance. [[Back to content]](#0)
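
A sketch of one pseudo-labeling round; `detector.fit` / `detector.predict` are hypothetical interfaces standing in for whatever training and inference code a real pipeline uses:

```python
def self_training_round(detector, labeled, unlabeled, score_thr=0.9):
    """One round of pseudo-label self-training for a 3D detector (sketch).

    labeled: list of (scene, boxes) pairs; unlabeled: list of scenes;
    detector.fit / detector.predict are assumed, hypothetical interfaces.
    """
    detector.fit(labeled)                            # 1. pre-train on labeled data
    pseudo = []
    for scene in unlabeled:
        boxes, scores = detector.predict(scene)      # 2. label the unlabeled scenes
        confident = [b for b, s in zip(boxes, scores) if s >= score_thr]
        if confident:                                # 3. keep confident boxes only
            pseudo.append((scene, confident))
    detector.fit(labeled + pseudo)                   # 4. retrain on the union
    return detector
```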

### Self-supervised 3D object detection [[Papers]](Docs/Learning/self_learning.md)

![](Figs/self.JPG)

Self-supervised approaches first pre-train
a 3D detector on the unlabeled data in a self-supervised
manner, and then fine-tune the detector on the labeled data. [[Back to content]](#0)

## 3D Object Detection in Driving Systems

### End-to-end learning for autonomous driving [[Papers]](Docs/Applications/system.md)

![](Figs/end.JPG)

End-to-end autonomous driving aims to integrate all
tasks in autonomous driving, e.g. perception, prediction, planning, control, mapping, and localization, into a unified framework
and to learn these tasks in an end-to-end manner. [[Back to content]](#0)

### Collaborative 3D object detection [[Papers]](Docs/Applications/cooperative_perception.md)

![](Figs/coopr.JPG)

In collaborative 3D object detection, different
vehicles communicate with each other to obtain more
reliable detection results. [[Back to content]](#0)