[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
- Host: GitHub
- URL: https://github.com/OpenDriveLab/DriveLM
- Owner: OpenDriveLab
- License: apache-2.0
- Created: 2023-08-08T12:07:33.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-04T16:56:25.000Z (about 1 month ago)
- Last Synced: 2025-03-10T14:11:26.934Z (about 1 month ago)
- Topics: autonomous-driving, chain-of-thought, graph-of-thoughts, large-language-models, llm, prompt-engineering, prompting, tree-of-thoughts, vision-language
- Language: HTML
- Homepage: https://opendrivelab.com/DriveLM/
- Size: 274 MB
- Stars: 998
- Watchers: 22
- Forks: 65
- Open Issues: 21
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
Awesome Lists containing this project
- Awesome-Reasoning-Foundation-Models
- Awesome-Multimodal-LLM-Autonomous-Driving
- awesome-knowledge-driven-AD
- Awesome-LLM4AD
README
**DriveLM:** *Driving with **G**raph **V**isual **Q**uestion **A**nswering*

`Autonomous Driving Challenge 2024` **Driving-with-Language** [Leaderboard](https://opendrivelab.com/challenge2024/#driving_with_language)
[Project Page](https://opendrivelab.com/DriveLM/) | [License](#licenseandcitation) | [Paper](https://arxiv.org/abs/2312.14150) | [Getting Started](#gettingstarted) | [Leaderboard](https://huggingface.co/spaces/AGC2024/driving-with-language-official)

https://github.com/OpenDriveLab/DriveLM/assets/54334254/cddea8d6-9f6e-4e7e-b926-5afb59f8dce2
🔥 We instantiate datasets (**DriveLM-Data**) built upon nuScenes and CARLA, and propose a VLM-based baseline approach (**DriveLM-Agent**) for jointly performing **Graph VQA** and end-to-end driving.
🏁 **DriveLM** serves as a main track in the [**`CVPR 2024 Autonomous Driving Challenge`**](https://opendrivelab.com/challenge2024/#driving_with_language). Everything you need for the challenge is [HERE](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge), including the baseline, test data, submission format, and evaluation pipeline!
- **`[2025/01/08]`** [Drive-Bench](https://drive-bench.github.io/) released! An in-depth analysis of what DriveLM-style benchmarks really evaluate. Take a look at the [arXiv paper](https://arxiv.org/pdf/2501.04003).
- **`[2024/07/16]`** The DriveLM [official leaderboard](https://huggingface.co/spaces/AGC2024/driving-with-language-official) has reopened!
- **`[2024/07/01]`** DriveLM got accepted to ECCV 2024! Congrats to the team!
- **`[2024/06/01]`** The challenge has ended! [See the final leaderboard](https://opendrivelab.com/challenge2024/#driving_with_language).
- **`[2024/03/25]`** The challenge test server is online and the test questions are released. [Check it out!](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge)
- **`[2024/02/29]`** Challenge repo released, including the baseline, data and submission format, and evaluation pipeline. [Have a look!](https://github.com/OpenDriveLab/DriveLM/tree/main/challenge)
- **`[2023/12/22]`** DriveLM-nuScenes full `v1.0` and the [paper](https://arxiv.org/abs/2312.14150) released.
- **`[2023/08/25]`** DriveLM-nuScenes demo released.

## Table of Contents
1. [Highlights](#highlight)
2. [Getting Started](#gettingstarted)
- [Prepare DriveLM-nuScenes](docs/data_prep_nus.md)
3. [Current Endeavors and Future Horizons](#timeline)
4. [TODO List](#newsandtodolist)
5. [DriveLM-Data](#drivelmdata)
- [Comparison and Stats](#comparison)
- [GVQA Details](docs/gvqa.md)
- [Annotation and Features](docs/data_details.md)
6. [License and Citation](#licenseandcitation)
7. [Other Resources](#otherresources)

## Getting Started
To get started with DriveLM:
- [Prepare DriveLM-nuScenes](/docs/data_prep_nus.md)
- [Challenge devkit](/challenge/)
- [More content coming soon](#todolist)

## Current Endeavors and Future Directions
> - The advent of GPT-style multimodal models in real-world applications motivates the study of the role of language in driving.
> - Dates below reflect the arXiv submission dates.
> - If there is any missing work, please reach out to us!
DriveLM attempts to address some of the challenges faced by the community.
- **Lack of data**: DriveLM-Data serves as a comprehensive benchmark for driving with language.
- **Embodiment**: GVQA provides a potential direction for embodied applications of LLMs / VLMs.
- **Closed-loop**: DriveLM-CARLA attempts to explore closed-loop planning with language.

## TODO List

- [x] DriveLM-Data
- [x] DriveLM-nuScenes
- [x] DriveLM-CARLA
- [x] DriveLM-Metrics
- [x] GPT-score
- [ ] DriveLM-Agent
- [x] Inference code on DriveLM-nuScenes
- [ ] Inference code on DriveLM-CARLA

## DriveLM-Data

We facilitate the `Perception, Prediction, Planning, Behavior, Motion` tasks with human-written reasoning logic connecting them, and propose the task of [GVQA](docs/gvqa.md) on DriveLM-Data.
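For intuition only, here is a minimal, self-contained sketch of what a QA graph over these stages could look like; it is not code from this repository, and `QANode`, `GVQAGraph`, and the example questions are hypothetical illustrations of the idea that answers at later stages condition on QA pairs from earlier stages.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of Graph VQA: QA nodes per driving stage,
# with directed edges encoding logical dependencies between stages.
STAGES = ["perception", "prediction", "planning", "behavior", "motion"]

@dataclass
class QANode:
    node_id: str
    stage: str                                    # one of STAGES
    question: str
    answer: str
    parents: list = field(default_factory=list)  # ids of QA nodes this one depends on

@dataclass
class GVQAGraph:
    nodes: dict = field(default_factory=dict)

    def add(self, node: QANode) -> None:
        assert node.stage in STAGES, f"unknown stage: {node.stage}"
        self.nodes[node.node_id] = node

    def context_for(self, node_id: str) -> list:
        """Collect upstream QA pairs a model could condition on when answering."""
        return [(self.nodes[p].question, self.nodes[p].answer)
                for p in self.nodes[node_id].parents]

# Toy chain: a perception QA feeds a prediction QA, which feeds a planning QA.
g = GVQAGraph()
g.add(QANode("q0", "perception", "What objects are ahead of the ego vehicle?",
             "A pedestrian near the crosswalk."))
g.add(QANode("q1", "prediction", "What will the pedestrian do next?",
             "Likely cross the road.", parents=["q0"]))
g.add(QANode("q2", "planning", "What should the ego vehicle do?",
             "Slow down and yield.", parents=["q1"]))
print(g.context_for("q2"))
```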
### 📊 Comparison and Stats
**DriveLM-Data** is the *first* language-driving dataset facilitating the full stack of driving tasks with graph-structured logical dependencies.
![]()
Links to details about [GVQA task](docs/gvqa.md), [Dataset Features](docs/data_details.md/#features), and [Annotation](docs/data_details.md/#annotation).
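For a concrete feel of how such annotations might be consumed, below is a minimal sketch that walks a DriveLM-nuScenes-style JSON file and counts QA pairs per stage. The file path and the `scene -> key_frames -> QA -> stage` layout are assumptions made for illustration; refer to [Annotation](docs/data_details.md/#annotation) for the authoritative format.

```python
import json
from collections import Counter

# Assumed (hypothetical) layout, see docs/data_details.md for the real schema:
# {scene_token: {"key_frames": {frame_token: {"QA": {stage: [{"Q": ..., "A": ...}, ...]}}}}}
ANNOTATION_FILE = "data/DriveLM_nuScenes_train.json"  # placeholder path

with open(ANNOTATION_FILE) as f:
    scenes = json.load(f)

qa_per_stage = Counter()
for scene_token, scene in scenes.items():
    for frame_token, frame in scene.get("key_frames", {}).items():
        for stage, qa_list in frame.get("QA", {}).items():
            qa_per_stage[stage] += len(qa_list)

print(dict(qa_per_stage))  # e.g. QA counts keyed by perception/prediction/planning/behavior
```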
## License and Citation
All assets and code in this repository are under the [Apache 2.0 license](./LICENSE) unless specified otherwise. The language data is under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

```BibTeX
@article{sima2023drivelm,
title={DriveLM: Driving with Graph Visual Question Answering},
author={Sima, Chonghao and Renz, Katrin and Chitta, Kashyap and Chen, Li and Zhang, Hanxue and Xie, Chengen and Luo, Ping and Geiger, Andreas and Li, Hongyang},
journal={arXiv preprint arXiv:2312.14150},
year={2023}
}
```

```BibTeX
@misc{contributors2023drivelmrepo,
title={DriveLM: Driving with Graph Visual Question Answering},
author={DriveLM contributors},
howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
year={2023}
}
```

## Other Resources

**OpenDriveLab**
- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [UniAD](https://github.com/OpenDriveLab/UniAD) | [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [BEVFormer](https://github.com/fundamentalvision/BEVFormer) | [OccNet](https://github.com/OpenDriveLab/OccNet)

**Autonomous Vision Group**
- [tuPlan garage](https://github.com/autonomousvision/tuplan_garage) | [CARLA garage](https://github.com/autonomousvision/carla_garage) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [PlanT](https://github.com/autonomousvision/plant) | [KING](https://github.com/autonomousvision/king) | [TransFuser](https://github.com/autonomousvision/transfuser) | [NEAT](https://github.com/autonomousvision/neat)