https://github.com/opendrivelab/driveagi

[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System

autonomous-driving embodied-ai foundation-model general-artificial-intelligence large-dataset policy-learning video-dataset video-generation world-models


Host: GitHub
URL: https://github.com/opendrivelab/driveagi
Owner: OpenDriveLab
License: apache-2.0
Created: 2023-04-24T17:59:42.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2025-01-24T03:44:15.000Z (4 months ago)
Last Synced: 2025-04-14T12:18:13.177Z (about 1 month ago)
Topics: autonomous-driving, embodied-ai, foundation-model, general-artificial-intelligence, large-dataset, policy-learning, video-dataset, video-generation, world-models
Language: Python
Homepage: https://arxiv.org/abs/2403.09630
Size: 13.4 MB
Stars: 708
Watchers: 32
Forks: 32
Open Issues: 10
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE


# DriveAGI
This is **"The One"** project that [**`OpenDriveLab`**](https://opendrivelab.com/) is committed to contributing to the community, offering our thoughts and a general picture of how to bring `foundation models` into autonomous driving.

## Table of Contents
- [NEWS](#news)
- [At A Glance](#at-a-glance)
- 🚀 [Vista](#vista) (NeurIPS 2024)
- ⭐ [GenAD: OpenDV Dataset](#opendv) (CVPR 2024 Highlight)
- ⭐ [DriveLM](#drivelm) (ECCV 2024 Oral)
- [DriveData Survey](#drivedata-survey)

- [OpenScene](#openscene)
- [OpenLane-V2 Update](#openlane-v2-update)

## NEWS

**[ NEW❗️] `2024/09/08`** We released a mini version of `OpenDV-YouTube`, containing **25 hours** of driving videos. Feel free to try the mini subset by following instructions at [OpenDV-mini](https://github.com/OpenDriveLab/DriveAGI/blob/main/opendv/README.md)!

**`2024/05/28`** We released our latest research, [Vista](#vista), a generalizable driving world model. It's capable of predicting high-fidelity and long-horizon futures, executing multi-modal actions, and serving as a generalizable reward function to assess driving behaviors.

**`2024/03/24`** `OpenDV-YouTube Update:` **Full suite of toolkits for OpenDV-YouTube** is now available, including data downloading and processing scripts, as well as language annotations. Please refer to [OpenDV-YouTube](https://github.com/OpenDriveLab/DriveAGI/tree/main/opendv).

**`2024/03/15`** We released the complete video list of `OpenDV-YouTube`, a large-scale driving video dataset, for the [GenAD](https://arxiv.org/abs/2403.09630) project. Data downloading and processing scripts, as well as language annotations, will be released next week. Stay tuned.

**`2024/01/24`** We are excited to announce some updates to [our survey](#drivedata-survey) and would like to thank John Lambert and Klemens Esterle from the public community for their advice on improving the manuscript.

## At A Glance

Here are some key components to construct a large foundation model curated for an autonomous system.

![overview](assets/overview.png "overview")

Below we share the latest updates from our team on the **`DriveData`** side. We will release the details of **`DriveEngine`** and **`DriveAGI`** in the future.

## Vista

> Simulated futures in a wide range of driving scenarios by [Vista](https://arxiv.org/abs/2405.17398). Best viewed on [demo page](https://vista-demo.github.io/).

### [🌏 **A Generalizable Driving World Model with High Fidelity and Versatile Controllability**](https://arxiv.org/abs/2405.17398) (NeurIPS 2024)

**Quick facts:**
- Introducing the world's first **generalizable driving world model**.
- Task: High-fidelity, action-conditioned, and long-horizon future prediction for driving scenes in the wild.
- Dataset: [`OpenDV-YouTube`](https://github.com/OpenDriveLab/DriveAGI/tree/main/opendv), `nuScenes`
- Code and model: https://github.com/OpenDriveLab/Vista
- Video Demo: https://vista-demo.github.io
- Related work: [Vista](https://arxiv.org/abs/2405.17398), [GenAD](https://arxiv.org/abs/2403.09630)

```bibtex
@inproceedings{gao2024vista,
title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability},
author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}

@inproceedings{yang2024genad,
title={{Generalized Predictive Model for Autonomous Driving}},
author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
```

## GenAD: OpenDV Dataset
![opendv](assets/opendv_examples.png)
> Examples of **real-world** driving scenarios in the OpenDV dataset, including urban, highway, rural scenes, etc.

### [⭐ **Generalized Predictive Model for Autonomous Driving**](https://arxiv.org/abs/2403.09630) (**CVPR 2024, Highlight**)

### [Paper](https://arxiv.org/abs/2403.09630) | [Video](https://www.youtube.com/watch?v=a4H6Jj-7IC0) | [Poster](assets/cvpr24_genad_poster.png) | [Slides](https://opendrivelab.github.io/content/GenAD_slides_with_vista.pdf)

🎦 The **Largest Driving Video dataset** to date, containing more than **1700 hours** of real-world driving videos, about 300 times larger than the widely used nuScenes dataset.

- **Complete video list** (under YouTube license): [OpenDV Videos](https://docs.google.com/spreadsheets/d/1bHWWP_VXeEe5UzIG-QgKFBdH7mNlSC4GFSJkEhFnt2I).
  - The downloaded raw videos (`mostly 1080P`) consume about `3 TB` of storage space. However, these hour-long videos cannot be used directly for model training because they are extremely memory-consuming.
  - Therefore, we preprocess them into consecutive images, which are more flexible and efficient to load during training. The processed images consume about `24 TB` of storage space in total.
  - We recommend setting up your experiments on a small subset, say **1/20** of the whole dataset. An official mini subset is also provided; refer to [**OpenDV-mini**](https://github.com/OpenDriveLab/DriveAGI/tree/main/opendv#about-opendv-youtube-and-opendv-mini) for details. After stabilizing the training, you can then apply your method to the whole dataset and hope for the best 🤞.
- **[ New❗️]** **Mini subset**: [OpenDV-mini](https://github.com/OpenDriveLab/DriveAGI/tree/main/opendv).
  - A mini version of `OpenDV-YouTube`. The raw videos consume about `44 GB` of storage space and the processed images will consume about `390 GB` of storage space.
- **Step-by-step instruction for data preparation**: [OpenDV-YouTube](https://github.com/OpenDriveLab/DriveAGI/tree/main/opendv/README.md).
- **Language annotation for OpenDV-YouTube**: [OpenDV-YouTube-Language](https://huggingface.co/datasets/OpenDriveLab/OpenDV-YouTube-Language).
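
The advice above to start from roughly **1/20** of the data can be sketched as follows. This is a minimal illustration and not part of the official toolkit: the function names and the `video_XXXX` IDs are hypothetical, and the storage estimate assumes that size scales linearly with the number of sampled videos, which is only approximate because video lengths differ.

```python
import hashlib
from typing import List, Tuple

RAW_TB = 3.0         # raw videos, figure quoted in the list above
PROCESSED_TB = 24.0  # processed images, figure quoted in the list above


def in_subset(video_id: str, fraction: float = 1 / 20) -> bool:
    """Return True if the video falls in the subset (stable across runs)."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    # Map the first 8 hash bytes to a roughly uniform number between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return bucket < fraction


def estimate_storage_tb(fraction: float) -> Tuple[float, float]:
    """Scale the full-dataset sizes linearly by the sampled fraction."""
    return RAW_TB * fraction, PROCESSED_TB * fraction


if __name__ == "__main__":
    # Hypothetical IDs; in practice these would come from the video list.
    video_ids: List[str] = [f"video_{i:04d}" for i in range(2000)]
    subset = [v for v in video_ids if in_subset(v)]
    raw_tb, processed_tb = estimate_storage_tb(1 / 20)
    print(f"kept {len(subset)} of {len(video_ids)} videos")
    print(f"approx. {raw_tb:.2f} TB raw, {processed_tb:.2f} TB processed")
```

Hashing the ID, rather than calling a random number generator, keeps the subset identical across machines and reruns, so experiments on the subset stay comparable.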

**Quick facts:**
- Task: large-scale video prediction for driving scenes.
- Data source: `YouTube`, with careful collection and filtering process.
- Diversity Highlights: 1700 hours of driving videos, covering more than 244 cities in 40 countries.
- Related work: [GenAD](https://arxiv.org/abs/2403.09630) **`Accepted at CVPR 2024, Highlight`**
- `Note`: Annotations for other public datasets in OpenDV-2K will not be released because we randomly sampled subsets of them for training; these samples are incomplete and hard to trace back to their origins (i.e., file names). Nevertheless, it is easy to reproduce the collection and annotation process on your own by following [our paper](https://arxiv.org/abs/2403.09630).
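
As a quick sanity check on the scale figures quoted in this README, the arithmetic can be reproduced in a few lines. The inputs are the numbers stated in the text (1700 hours, 25 hours for the mini subset, 3 TB and 44 GB of raw video) plus the 5.5 hours listed for nuScenes in the survey's perception dataset listing. Treating 1 TB as 1000 GB is an assumption.

```python
OPENDV_HOURS = 1700.0   # "more than 1700 hours", from the text above
NUSCENES_HOURS = 5.5    # nuScenes duration as listed in the survey table
MINI_HOURS = 25.0       # OpenDV-mini, from the NEWS entry
FULL_RAW_GB = 3000.0    # "about 3 TB", assuming 1 TB = 1000 GB
MINI_RAW_GB = 44.0      # OpenDV-mini raw videos


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, kept as a function so each comparison reads clearly."""
    return numerator / denominator


if __name__ == "__main__":
    print(f"OpenDV vs. nuScenes (hours): {ratio(OPENDV_HOURS, NUSCENES_HOURS):.0f}x")
    print(f"mini share of hours:    {ratio(MINI_HOURS, OPENDV_HOURS):.2%}")
    print(f"mini share of raw size: {ratio(MINI_RAW_GB, FULL_RAW_GB):.2%}")
```

The first ratio comes out at roughly 309, in line with the "300 times larger" claim, and the mini subset accounts for about 1.5% of both the hours and the raw storage.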

```bibtex
@inproceedings{yang2024genad,
title={Generalized Predictive Model for Autonomous Driving},
author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
```

## DriveLM
Introducing the first benchmark on **Language Prompt for Driving**.

**Quick facts:**
- Task: given the language prompts as input, predict the trajectory in the scene
- Origin dataset: `nuScenes`, `CARLA (To be released)`
- Repo: https://github.com/OpenDriveLab/DriveLM, https://github.com/OpenDriveLab/ELM
- Related work: [DriveLM](https://arxiv.org/abs/2312.14150), [ELM](https://arxiv.org/abs/2403.04593)
- Related challenge: [Driving with Language AGC Challenge 2024](https://opendrivelab.com/challenge2024/#driving_with_language)

## DriveData Survey

### Abstract
With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. In this survey, we provide a comprehensive analysis of more than 70 papers on the timeline, impact, challenges, and future trends of autonomous driving datasets.

> **Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future**
> - [English Version](https://arxiv.org/abs/2312.03408)
> - [Chinese Version](https://www.sciengine.com/SSI/doi/10.1360/SSI-2023-0313) **`Accepted at SCIENTIA SINICA Informationis (Chinese edition)`**

```bib
@article{li2024_driving_dataset_survey,
title = {Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future},
author = {Hongyang Li and Yang Li and Huijie Wang and Jia Zeng and Huilin Xu and Pinlong Cai and Li Chen and Junchi Yan and Feng Xu and Lu Xiong and Jingdong Wang and Futang Zhu and Chunjing Xu and Tiancai Wang and Fei Xia and Beipeng Mu and Zhihui Peng and Dahua Lin and Yu Qiao},
journal = {SCIENTIA SINICA Informationis},
year = {2024},
doi = {10.1360/SSI-2023-0313}
}
```

![overview](assets/Drivedata_overview.jpg "Drivedata_overview")
> Current autonomous driving datasets can broadly be categorized into two generations since the 2010s. We define the Impact (y-axis) of a dataset based on sensor configuration, input modality, task category, data scale, ecosystem, etc.

![overview](assets/Drivedata_timeline.jpg "Drivedata_timeline")

### Related Work Collection

We present comprehensive paper collections, leaderboards, and challenges.

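#### Challenges and Leaderboards

| Title | Host | Year | Task | Entry |
| :--- | :--- | :--- | :--- | :--- |
| Autonomous Driving Challenge | OpenDriveLab | CVPR2023 | Perception / OpenLane Topology<br>Perception / Online HD Map Construction<br>Perception / 3D Occupancy Prediction<br>Prediction & Planning / nuPlan Planning | 111 |
| Waymo Open Dataset Challenges | Waymo | CVPR2023 | Perception / 2D Video Panoptic Segmentation<br>Perception / Pose Estimation<br>Prediction / Motion Prediction<br>Prediction / Sim Agents | 35 |
| Waymo Open Dataset Challenges | Waymo | CVPR2022 | Prediction / Motion Prediction<br>Prediction / Occupancy and Flow Prediction<br>Perception / 3D Semantic Segmentation<br>Perception / 3D Camera-only Detection | 128 |
| Waymo Open Dataset Challenges | Waymo | CVPR2021 | Prediction / Motion Prediction<br>Prediction / Interaction Prediction<br>Perception / Real-time 3D Detection<br>Perception / Real-time 2D Detection | 115 |
| Argoverse Challenges | Argoverse | CVPR2023 | Prediction / Multi-agent Forecasting<br>Perception & Prediction / Unified Sensor-based Detection, Tracking, and Forecasting<br>Perception / LiDAR Scene Flow<br>Prediction / 3D Occupancy Forecasting | 81 |
| Argoverse Challenges | Argoverse | CVPR2022 | Perception / 3D Object Detection<br>Prediction / Motion Forecasting<br>Perception / Stereo Depth Estimation | 81 |
| Argoverse Challenges | Argoverse | CVPR2021 | Perception / Stereo Depth Estimation<br>Prediction / Motion Forecasting<br>Perception / Streaming 2D Detection | 368 |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | 2023 | Planning / CARLA AD Challenge 2.0 | - |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | NeurIPS2022 | Planning / CARLA AD Challenge 1.0 | 19 |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | NeurIPS2021 | Planning / CARLA AD Challenge 1.0 | - |
| Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition | Pazhou Lab | 2023 | Perception / Cross-scene Monocular Depth Estimation<br>Perception / Roadside Millimeter-wave Radar Calibration and Object Tracking | - |
| Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition | Pazhou Lab | 2022 | Perception / Roadside 3D Perception Algorithm<br>Perception / Storefront Sign Text Recognition in Street-view Images | - |
| AI Driving Olympics | ETH Zurich, University of Montreal, Motional | NeurIPS2021 | Perception / nuScenes Panoptic | 11 |
| AI Driving Olympics | ETH Zurich, University of Montreal, Motional | ICRA2021 | Perception / nuScenes Detection<br>Perception / nuScenes Tracking<br>Prediction / nuScenes Prediction<br>Perception / nuScenes LiDAR Segmentation | 456 |
| Jittor Artificial Intelligence Algorithm Challenge | Department of Information Sciences, National Natural Science Foundation of China | 2021 | Perception / Traffic Sign Detection | 37 |
| KITTI Vision Benchmark Suite | University of Tübingen | 2012 | Perception / Stereo, Flow, Scene Flow, Depth, Odometry, Object, Tracking, Road, Semantics | 5,610 |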


#### Perception Datasets

Scenes, Hours, and Region describe diversity; Camera, Lidar, and Other describe the sensor suite.

| Dataset | Year | Scenes | Hours | Region | Camera | Lidar | Other | Annotation | Paper |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| KITTI | 2012 | 50 | 6 | EU | Front-view | ✗ | GPS & IMU | 2D BBox & 3D BBox | Link |
| Cityscapes | 2016 | - | - | EU | Front-view | ✗ | | 2D Seg | Link |
| Lost and Found | 2016 | 112 | - | - | Front-view | ✗ | | 2D Seg | Link |
| Mapillary | 2016 | - | - | Global | Street-view | ✗ | | 2D Seg | Link |
| DDD17 | 2017 | 36 | 12 | EU | Front-view | ✗ | GPS & CAN-bus & Event Camera | - | Link |
| Apolloscape | 2016 | 103 | 2.5 | AS | Front-view | ✗ | GPS & IMU | 3D BBox & 2D Seg | Link |
| BDD-X | 2018 | 6984 | 77 | NA | Front-view | ✗ | | Language | Link |
| HDD | 2018 | - | 104 | NA | Front-view | ✓ | GPS & IMU & CAN-bus | 2D BBox | Link |
| IDD | 2018 | 182 | - | AS | Front-view | ✗ | | 2D Seg | Link |
| SemanticKITTI | 2019 | 50 | 6 | EU | ✗ | ✓ | | 3D Seg | Link |
| Woodscape | 2019 | - | - | Global | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| DrivingStereo | 2019 | 42 | - | AS | Front-view | ✓ | | - | Link |
| Brno-Urban | 2019 | 67 | 10 | EU | Front-view | ✓ | GPS & IMU & Infrared Camera | - | Link |
| A*3D | 2019 | - | 55 | AS | Front-view | ✓ | | 3D BBox | Link |
| Talk2Car | 2019 | 850 | 283.3 | NA | Front-view | ✓ | | Language & 3D BBox | Link |
| Talk2Nav | 2019 | 10714 | - | Sim | 360° | ✗ | | Language | Link |
| PIE | 2019 | - | 6 | NA | Front-view | ✗ | | 2D BBox | Link |
| UrbanLoco | 2019 | 13 | - | AS & NA | 360° | ✓ | IMU | - | Link |
| TITAN | 2019 | 700 | - | AS | Front-view | ✗ | | 2D BBox | Link |
| H3D | 2019 | 160 | 0.77 | NA | Front-view | ✓ | GPS & IMU | - | Link |
| A2D2 | 2020 | - | 5.6 | EU | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| CARRADA | 2020 | 30 | 0.3 | NA | Front-view | ✗ | Radar | 3D BBox | Link |
| DAWN | 2019 | - | - | Global | Front-view | ✗ | | 2D BBox | Link |
| 4Seasons | 2019 | - | - | - | Front-view | ✗ | GPS & IMU | - | Link |
| UNDD | 2019 | - | - | - | Front-view | ✗ | | 2D Seg | Link |
| SemanticPOSS | 2020 | - | - | AS | ✗ | ✓ | GPS & IMU | 3D Seg | Link |
| Toronto-3D | 2020 | 4 | - | NA | ✗ | ✓ | | 3D Seg | Link |
| ROAD | 2021 | 22 | - | EU | Front-view | ✗ | | 2D BBox & Topology | Link |
| Reasonable Crowd | 2021 | - | - | Sim | Front-view | ✗ | | Language | Link |
| METEOR | 2021 | 1250 | 20.9 | AS | Front-view | ✗ | GPS | Language | Link |
| PandaSet | 2021 | 179 | - | NA | 360° | ✓ | GPS & IMU | 3D BBox | Link |
| MUAD | 2022 | - | - | Sim | 360° | ✓ | | 2D Seg & 2D BBox | Link |
| TAS-NIR | 2022 | - | - | - | Front-view | ✗ | Infrared Camera | 2D Seg | Link |
| LiDAR-CS | 2022 | 6 | - | Sim | ✗ | ✓ | | 3D BBox | Link |
| WildDash | 2022 | - | - | - | Front-view | ✗ | | 2D Seg | Link |
| OpenScene | 2023 | 1000 | 5.5 | AS & NA | 360° | ✗ | | 3D Occ | Link |
| ZOD | 2023 | 1473 | 8.2 | EU | 360° | ✓ | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| nuScenes | 2019 | 1000 | 5.5 | AS & NA | 360° | ✓ | GPS & CAN-bus & Radar & HDMap | 3D BBox & 3D Seg | Link |
| Argoverse V1 | 2019 | 324k | 320 | NA | 360° | ✓ | HDMap | 3D BBox & 3D Seg | Link |
| Waymo | 2019 | 1000 | 6.4 | NA | 360° | ✓ | | 2D BBox & 3D BBox | Link |
| KITTI-360 | 2020 | 366 | 2.5 | EU | 360° | ✓ | | 3D BBox & 3D Seg | Link |
| ONCE | 2021 | - | 144 | AS | 360° | ✓ | | 3D BBox | Link |
| nuPlan | 2021 | - | 120 | AS & NA | 360° | ✓ | | 3D BBox | Link |
| Argoverse V2 | 2022 | 1000 | 4 | NA | 360° | ✓ | HDMap | 3D BBox | Link |
| DriveLM | 2023 | 1000 | 5.5 | AS & NA | 360° | ✗ | | Language | Link |


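#### Mapping Datasets

Scenes and Frames describe diversity; Camera and Lidar describe sensors; Type, Space, Inst., and Track describe annotations.

| Dataset | Year | Scenes | Frames | Camera | Lidar | Type | Space | Inst. | Track | Paper |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Caltech Lanes | 2008 | 4 | 1224/1224 | | ✗ | | PV | ✓ | ✗ | Link |
| VPG | 2017 | - | 20K/20K | | ✗ | | PV | ✗ | - | Link |
| TuSimple | 2017 | 6.4K | 6.4K/128K | | ✗ | | PV | ✓ | ✗ | Link |
| CULane | 2018 | - | 133K/133K | | ✗ | | PV | ✓ | - | Link |
| ApolloScape | 2018 | 235 | 115K/115K | | ✓ | | PV | ✗ | ✗ | Link |
| LLAMAS | 2019 | 14 | 79K/100K | Front-view Image | ✗ | Laneline | PV | ✓ | ✗ | Link |
| 3D Synthetic | 2020 | - | 10K/10K | | ✗ | | PV | ✓ | - | Link |
| CurveLanes | 2020 | - | 150K/150K | | ✗ | | PV | ✓ | - | Link |
| VIL-100 | 2021 | 100 | 10K/10K | | ✗ | | PV | ✓ | ✗ | Link |
| OpenLane-V1 | 2022 | 1K | 200K/200K | | ✗ | | 3D | ✓ | ✓ | Link |
| ONCE-3DLane | 2022 | - | 211K/211K | | ✗ | | 3D | ✓ | - | Link |
| OpenLane-V2 | 2023 | 2K | 72K/72K | Multi-view Image | ✗ | Lane Centerline, Lane Segment | 3D | ✓ | ✓ | Link |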

#### Prediction and Planning Datasets

| Subtask | Input | Output | Evaluation | Dataset |
| :--- | :--- | :--- | :--- | :--- |
| Motion Prediction | Surrounding Traffic States | Spatiotemporal Trajectories of Single/Multiple Vehicle(s) | Displacement Error | Argoverse |
| Trajectory Planning | Motion States for Ego Vehicles, Scenario Cognition and Prediction | Trajectories for Ego Vehicles | Displacement Error, Safety, Compliance, Comfort | nuPlan<br>CARLA<br>MetaDrive<br>Apollo |
| Path Planning | Maps for Road Network | Routes Connecting to Nodes and Links | Efficiency, Energy Conservation | OpenStreetMap<br>Transportation Networks<br>DTAlite<br>PeMS<br>New York City Taxi Data |
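
To make the "Displacement Error" evaluation listed above concrete, here is a minimal sketch of two commonly used variants: average displacement error (ADE) and final displacement error (FDE). The listing itself only names "Displacement Error", so whether a given benchmark uses exactly these definitions is an assumption. The function name and the sample trajectories are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position at one timestep


def displacement_errors(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against ground truth."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each timestep.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # mean error over all timesteps
    fde = dists[-1]                # error at the final timestep
    return ade, fde


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    ade, fde = displacement_errors(predicted, ground_truth)
    print(f"ADE = {ade:.2f}, FDE = {fde:.2f}")  # ADE = 1.00, FDE = 2.00
```

In the example the per-step errors are 0, 1, and 2, so the mean is 1.0 and the final-step error is 2.0.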

## OpenScene

The largest **3D Occupancy Forecasting** dataset to date for visual pre-training.

**Quick facts:**
- Task: given the large amount of data, predict the 3D occupancy in the environment.
- Origin dataset: `nuPlan`
- Repo: https://github.com/OpenDriveLab/OpenScene
- Related work: [OccNet](https://github.com/OpenDriveLab/OccNet)
- Related challenge: [3D Occupancy Prediction Challenge 2023](https://opendrivelab.com/AD23Challenge.html#Track3), [Occupancy and Flow AGC Challenge 2024](https://opendrivelab.com/challenge2024/#occupancy_and_flow), [Predictive World Model AGC Challenge 2024](https://opendrivelab.com/challenge2024/#predictive_world_model)
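
For readers new to the task, the sketch below illustrates what a 3D occupancy representation and a simple overlap score can look like. It is a toy illustration under stated assumptions, not OpenScene's data format or official metric: the sparse voxel-set representation, the voxel size, and both function names are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]  # integer grid index (i, j, k)


def voxelize(points: Iterable[Tuple[float, float, float]], voxel_size: float) -> Set[Voxel]:
    """Map 3D points (in meters) to the set of voxels they occupy."""
    return {
        (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        for x, y, z in points
    }


def occupancy_iou(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxels."""
    union = pred | gt
    if not union:
        return 1.0  # both grids empty: treat as perfect agreement
    return len(pred & gt) / len(union)


if __name__ == "__main__":
    points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, 0.0)]
    ground_truth = voxelize(points, voxel_size=0.5)  # two occupied voxels
    prediction = {(0, 0, 0), (1, 0, 0)}              # one correct, one wrong
    print(f"IoU = {occupancy_iou(prediction, ground_truth):.3f}")
```

A sparse set of occupied indices keeps the example dependency-free; a dense array would be the more usual choice when the grid extent is fixed.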

## OpenLane-V2 Update

Enriching [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) with **Standard Definition (SD) Map and Map Elements**.

**Quick facts:**
- Task: given multi-view images and SD-map (also known as ADAS map) as input, build the driving scene on the fly _without_ the aid of HD-map.
- Repo: https://github.com/OpenDriveLab/OpenLane-V2
- Related work: [OpenLane-V2](https://openreview.net/forum?id=OMOOO3ls6g), [TopoNet](https://github.com/OpenDriveLab/TopoNet), [LaneSegNet](https://github.com/OpenDriveLab/LaneSegNet)
- Related challenge: [Lane Topology Challenge 2023](https://opendrivelab.com/AD23Challenge.html#openlane_topology), [Mapless Driving AGC Challenge 2024](https://opendrivelab.com/challenge2024/#mapless_driving)
