**DriveLM:** *Driving with **G**raph **V**isual **Q**uestion **A**nswering*

`Autonomous Driving Challenge 2024` **Driving-with-Language** [Leaderboard](https://opendrivelab.com/challenge2024/#driving_with_language).

[![](https://img.shields.io/badge/Project%20Page-8A2BE2)](https://opendrivelab.com/DriveLM/)
[![License: Apache2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](#licenseandcitation)
[![arXiv](https://img.shields.io/badge/arXiv-2312.14150-b31b1b.svg)](https://arxiv.org/abs/2312.14150)
[![](https://img.shields.io/badge/Latest%20release-v1.1-yellow)](#gettingstarted)
[![Hugging Face](https://img.shields.io/badge/Test%20Server-%F0%9F%A4%97-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/spaces/AGC2024/driving-with-language-official)

https://github.com/OpenDriveLab/DriveLM/assets/54334254/cddea8d6-9f6e-4e7e-b926-5afb59f8dce2

## Highlights

🔥 We instantiate datasets (**DriveLM-Data**) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (**DriveLM-Agent**) for jointly performing **Graph VQA** and end-to-end driving.

🏁 **DriveLM** serves as a main track in the [**`CVPR 2024 Autonomous Driving Challenge`**](https://opendrivelab.com/challenge2024/#driving_with_language). Everything you need for the challenge is [HERE](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge), including the baseline, test data, submission format, and evaluation pipeline!



## News

- **`[2025/01/08]`** [Drive-Bench](https://drive-bench.github.io/) released! An in-depth analysis of what DriveLM is really benchmarking. Take a look at the [arXiv paper](https://arxiv.org/pdf/2501.04003).
- **`[2024/07/16]`** DriveLM [official leaderboard](https://huggingface.co/spaces/AGC2024/driving-with-language-official) reopened!
- **`[2024/07/01]`** DriveLM got accepted to ECCV 2024! Congrats to the team!
- **`[2024/06/01]`** Challenge ended! [See the final leaderboard](https://opendrivelab.com/challenge2024/#driving_with_language).
- **`[2024/03/25]`** Challenge test server is online and the test questions are released. [Check it out!](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge)
- **`[2024/02/29]`** Challenge repo released, including the baseline, data and submission format, and the evaluation pipeline. [Have a look!](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge)
- **`[2023/12/22]`** DriveLM-nuScenes full `v1.0` and [paper](https://arxiv.org/abs/2312.14150) released.
- **`[2023/08/25]`** DriveLM-nuScenes demo released.

## Table of Contents
1. [Highlights](#highlight)
2. [Getting Started](#gettingstarted)
   - [Prepare DriveLM-nuScenes](docs/data_prep_nus.md)
3. [Current Endeavors and Future Directions](#timeline)
4. [TODO List](#newsandtodolist)
5. [DriveLM-Data](#drivelmdata)
   - [Comparison and Stats](#comparison)
   - [GVQA Details](docs/gvqa.md)
   - [Annotation and Features](docs/data_details.md)
6. [License and Citation](#licenseandcitation)
7. [Other Resources](#otherresources)

## Getting Started
To get started with DriveLM:
- [Prepare DriveLM-nuScenes](/docs/data_prep_nus.md)
- [Challenge devkit](/challenge/)
- [More content coming soon](#todolist)
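
Once DriveLM-nuScenes is prepared, the QA annotations can be iterated over with a few lines of Python. The sketch below is only an illustration: the file name and the field names (`key_frames`, `QA`, and the per-stage lists) are assumptions based on the dataset description, so please consult [Prepare DriveLM-nuScenes](/docs/data_prep_nus.md) for the authoritative schema.

```python
import json

# Minimal sketch: iterate over DriveLM-nuScenes QA pairs.
# NOTE: the file name and the keys ("key_frames", "QA", stage names) are
# assumptions for illustration only; see docs/data_prep_nus.md for the
# released schema.
with open("v1_0_train_nus.json") as f:  # hypothetical file name
    data = json.load(f)

for scene_token, scene in data.items():
    for frame_token, frame in scene.get("key_frames", {}).items():
        qa = frame.get("QA", {})
        for stage in ("perception", "prediction", "planning", "behavior"):
            for pair in qa.get(stage, []):
                question, answer = pair.get("Q"), pair.get("A")
                # e.g. feed (question, answer) into a VLM fine-tuning pipeline
                print(stage, question, "->", answer)
```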

(back to top)

## Current Endeavors and Future Directions
> - The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
> - Dates below reflect the arXiv submission dates.
> - If there is any missing work, please reach out to us!



DriveLM attempts to address some of the challenges faced by the community.

- **Lack of data**: DriveLM-Data serves as a comprehensive benchmark for driving with language.
- **Embodiment**: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
- **Closed-loop**: DriveLM-CARLA attempts to explore closed-loop planning with language.

(back to top)

## TODO List

- [x] DriveLM-Data
  - [x] DriveLM-nuScenes
  - [x] DriveLM-CARLA
- [x] DriveLM-Metrics
  - [x] GPT-score
- [ ] DriveLM-Agent
  - [x] Inference code on DriveLM-nuScenes
  - [ ] Inference code on DriveLM-CARLA

(back to top)

## DriveLM-Data

We facilitate the `Perception, Prediction, Planning, Behavior, Motion` tasks with human-written reasoning logic as a connection between them, and propose the task of [GVQA](docs/gvqa.md) on top of DriveLM-Data.
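
To make the graph structure concrete, here is a toy sketch (an invented example, not actual dataset content) of how QA pairs across stages could be chained so that upstream answers become context for downstream questions; see [GVQA](docs/gvqa.md) for the real task definition.

```python
from dataclasses import dataclass, field

# Toy illustration of graph-structured VQA: each node is a QA pair tied to a
# driving stage, and edges encode "the parent's answer is context for the child".
@dataclass
class QANode:
    stage: str        # perception / prediction / planning / behavior / motion
    question: str
    answer: str
    parents: list = field(default_factory=list)  # upstream QA nodes

perception = QANode("perception", "What objects are ahead?", "A pedestrian near the crosswalk.")
prediction = QANode("prediction", "Will the pedestrian cross?", "Likely yes.", parents=[perception])
planning   = QANode("planning", "What should the ego vehicle do?", "Slow down and yield.", parents=[prediction])

def context_for(node: QANode) -> str:
    """Collect upstream answers in dependency order, e.g. as prompt context."""
    seen, ordered = set(), []
    def visit(n):
        for p in n.parents:
            if id(p) not in seen:
                seen.add(id(p))
                visit(p)
                ordered.append(p)
    visit(node)
    return " ".join(p.answer for p in ordered)

print(context_for(planning))  # -> "A pedestrian near the crosswalk. Likely yes."
```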

### 📊 Comparison and Stats
**DriveLM-Data** is the *first* language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.



See the [GVQA task](docs/gvqa.md), [dataset features](docs/data_details.md#features), and [annotation](docs/data_details.md#annotation) pages for details.

(back to top)

## License and Citation
All assets and code in this repository are under the [Apache 2.0 license](./LICENSE) unless specified otherwise. The language data is under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

```BibTeX
@article{sima2023drivelm,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
  journal={arXiv preprint arXiv:2312.14150},
  year={2023}
}
```

```BibTeX
@misc{contributors2023drivelmrepo,
  title={DriveLM: Driving with Graph Visual Question Answering},
  author={DriveLM contributors},
  howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
  year={2023}
}
```

(back to top)

## Other Resources


**OpenDriveLab**
- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [OccNet](https://github.com/OpenDriveLab/OccNet)



**Autonomous Vision Group**
- [tuPlan garage](https://github.com/autonomousvision/tuplan_garage) | [CARLA garage](https://github.com/autonomousvision/carla_garage) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [PlanT](https://github.com/autonomousvision/plant) | [KING](https://github.com/autonomousvision/king) | [TransFuser](https://github.com/autonomousvision/transfuser) | [NEAT](https://github.com/autonomousvision/neat)

(back to top)