[ECCV 2024] Embodied Understanding of Driving Scenarios
- Host: GitHub
- URL: https://github.com/opendrivelab/elm
- Owner: OpenDriveLab
- Created: 2024-02-22T21:50:05.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-04T07:21:28.000Z (about 2 months ago)
- Last Synced: 2024-12-21T14:23:24.831Z (1 day ago)
- Topics: autonomous-driving, end-to-end-driving, vision-language-model
- Language: Python
- Homepage:
- Size: 5.35 MB
- Stars: 161
- Watchers: 12
- Forks: 13
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
# ELM: Embodied Understanding of Driving Scenarios
**Revive driving scene understanding by delving into the embodiment philosophy**
![](./assets/teaser.png "Embodied Understanding of Driving Scenarios")
>
> [Yunsong Zhou](https://zhouyunsong.github.io/), [Linyan Huang](https://github.com/DevLinyan), [Qingwen Bu](https://github.com/retsuh-bqw), Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and [Hongyang Li](https://lihongyang.info/)
>
> - Presented by [OpenDriveLab](https://opendrivelab.com/) and Shanghai AI Lab
> - :mailbox_with_mail: Primary contact: [Yunsong Zhou](https://zhouyunsong-sjtu.github.io/) ([email protected])
> - [arXiv paper](https://arxiv.org/abs/2403.04593) | [Blog TODO]() | [Slides](https://drive.google.com/file/d/1hJ_cElQvGhqCq2GOlx_BnJaK5qumMmvh/view?usp=sharing)
> - [CVPR 2024 Autonomous Driving Challenge - Driving with Language](https://opendrivelab.com/challenge2024/)

## Highlights

:fire: The first **embodied language model** for understanding long-horizon driving scenarios in `space` and `time`.
:star2: **ELM** expands a wide spectrum of new tasks to fully leverage the capability of large language models in an embodied setting, and achieves significant improvements in various applications.
![method](./assets/elm.png "Architecture of ELM")
:trophy: An interpretable driving model, built on language prompting, will be a main track in the `CVPR 2024 Autonomous Driving Challenge`. Please [stay tuned](https://opendrivelab.com/challenge2024/) for further details!
## News

- :fire: The interpretable driving model track is launched. Please refer to the [link](https://opendrivelab.com/challenge2024/) for more details.
- `[2024/03]` ELM [paper](https://arxiv.org/abs/2403.04593) released.
- `[2024/03]` ELM code and data initially released.

## Table of Contents
1. [Highlights](#highlights)
2. [News](#news)
3. [TODO List](#todo)
4. [Installation](#installation)
5. [Dataset](#dataset)
6. [Training and Inference](#training)
7. [License and Citation](#license-and-citation)
8. [Related Resources](#resources)

## TODO List

- [x] Release fine-tuning code and data
- [x] Release reference checkpoints
- [x] Toolkit for label generation

## Installation

1. (Optional) Create a conda environment:
```bash
conda create -n elm python=3.8
conda activate elm
```

2. Install from [PyPI](https://pypi.org/project/salesforce-lavis/):
```bash
pip install salesforce-lavis
```
3. Or, for development, you may build from source:

```bash
git clone https://github.com/OpenDriveLab/ELM.git
cd ELM
pip install -e .
```
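If the environment is set up correctly, LAVIS should be importable and able to instantiate a BLIP-2/T5 backbone. The following minimal sanity check is not part of this repository; `blip2_t5` / `pretrain_flant5xl` are standard LAVIS model identifiers (not ELM checkpoints), and the first run downloads their weights:

```python
# Minimal sanity check that LAVIS is importable and can build a BLIP-2 (FlanT5) model.
# Note: this downloads standard LAVIS weights on first use and needs several GB of memory.
import torch
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_t5",                 # generic LAVIS model family, not an ELM checkpoint
    model_type="pretrain_flant5xl",
    is_eval=True,
    device=device,
)
print(f"Loaded {type(model).__name__} on {device}")
```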
## Dataset

**Pre-training data.** We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D.
Here we provide a sample of the 🔗 [YouTube video list](https://docs.google.com/spreadsheets/d/1HV-zOO6bh1sKjimhM1ZBcxWqPxgbalE3FDGyh2UHwPw/edit?usp=sharing) we used.
For privacy considerations, we are temporarily keeping the full-set data labels private. Part of the pre-training data and reference checkpoints can be found on :floppy_disk: [Google Drive](https://drive.google.com/drive/folders/1n4S0A4k8_9yDFIPIPWH_JLTUQ6yFc8ME?usp=sharing).

**Fine-tuning data.**
The full set of question and answer pairs for the benchmark can be obtained through this 🔗[data link](https://drive.google.com/drive/folders/1QFBIrKqxjn9lfv31XMC3wVIdaAbpMwDL?usp=sharing). You may need to download the corresponding image data from the official [nuScenes](https://www.nuscenes.org/download) and [Ego4D](https://ego4d-data.org/#download) channels.
For a `quick verification` of the pipeline, we recommend downloading the [DriveLM](https://github.com/OpenDriveLab/DriveLM/blob/main/docs/data_prep_nus.md) subset and organizing the data in line with that format.

Please make sure to soft link the `nuScenes` and `ego4d` datasets under the `data/xx` folder.
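These soft links can also be created with a short script. The sketch below is hypothetical: it assumes the loaders expect `data/nuscenes` and `data/ego4d`, so check the dataset configs under `lavis/` for the exact folder names and adjust the target paths to your local downloads.

```python
# Hypothetical helper for soft-linking the raw datasets under data/.
# The sub-folder names (data/nuscenes, data/ego4d) and target paths are assumptions;
# match them to the paths the dataset configs actually expect.
import os

LINKS = {
    "data/nuscenes": "/path/to/nuscenes",  # official nuScenes download
    "data/ego4d": "/path/to/ego4d",        # official Ego4D download
}

os.makedirs("data", exist_ok=True)
for link, target in LINKS.items():
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(target), link)
        print(f"{link} -> {target}")
```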
You may need to run `tools/video_clip_processor.py` to pre-process data first.
In addition, we provide some scripts used during auto-labeling; you may use them as a reference if you want to customize the data.
## Training

```bash
# Optionally modify lavis/projects/blip2/train/advqa_t5_elm.yaml before training
bash scripts/train.sh
```
## Inference

Modify [advqa_t5_elm.yaml](lavis/projects/blip2/train/advqa_t5_elm.yaml#L71) to set `evaluate` to `True`.
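If you prefer not to edit the file by hand, the flag can also be flipped programmatically. The sketch below assumes the standard LAVIS config layout (a top-level `run` group with an `evaluate` field) and writes a separate evaluation config; treat the key names as assumptions and verify them against the YAML linked above.

```python
# Hypothetical helper: toggle evaluation mode in the ELM config without hand-editing it.
# Assumes the usual LAVIS layout with a top-level `run` group containing `evaluate`.
from omegaconf import OmegaConf

CFG_PATH = "lavis/projects/blip2/train/advqa_t5_elm.yaml"

cfg = OmegaConf.load(CFG_PATH)
cfg.run.evaluate = True                        # switch the runner to evaluation-only mode
OmegaConf.save(cfg, "advqa_t5_elm_eval.yaml")  # keep the original training config untouched
print(OmegaConf.to_yaml(cfg.run))
```

If you go this route, make sure the launch script picks up the new file rather than the original config.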
```bash
bash scripts/train.sh
```
To evaluate the generated answers, please use the script `scripts/qa_eval.py`:
```bash
python scripts/qa_eval.py
```
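The metrics and the expected JSON schema are defined in `scripts/qa_eval.py` itself. Purely as an illustration of the kind of comparison involved, the hypothetical sketch below scores predictions against ground truth with exact match; the field names and file names are assumptions, not the script's real interface.

```python
# Illustration only: exact-match scoring of predicted answers against ground truth.
# The real evaluation lives in scripts/qa_eval.py; the schema used here
# ({"question_id": ..., "answer": ...}) and the file names are assumptions.
import json

def exact_match_accuracy(pred_path: str, gt_path: str) -> float:
    with open(pred_path) as f:
        preds = {p["question_id"]: p["answer"].strip().lower() for p in json.load(f)}
    with open(gt_path) as f:
        gts = {g["question_id"]: g["answer"].strip().lower() for g in json.load(f)}
    hits = sum(1 for qid, ans in gts.items() if preds.get(qid) == ans)
    return hits / max(len(gts), 1)

if __name__ == "__main__":
    print(f"Exact match: {exact_match_accuracy('predictions.json', 'ground_truth.json'):.3f}")
```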
## License and Citation

All assets and code in this repository are under the [Apache 2.0 license](./LICENSE) unless specified otherwise. The language data is under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.
```BibTeX
@article{zhou2024embodied,
  title={Embodied Understanding of Driving Scenarios},
  author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
  journal={arXiv preprint arXiv:2403.04593},
  year={2024}
}
```

## Related Resources

We acknowledge all the open-source contributors to the following projects for making this work possible:
- [Lavis](https://github.com/salesforce/LAVIS) | [DriveLM](https://github.com/OpenDriveLab/DriveLM)
- [DriveAGI](https://github.com/OpenDriveLab/DriveAGI) | [Survey on BEV Perception](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe) | [Survey on E2EAD](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- [UniAD](https://github.com/OpenDriveLab/UniAD) | [OpenLane-V2](https://github.com/OpenDriveLab/OpenLane-V2) | [OccNet](https://github.com/OpenDriveLab/OccNet) | [OpenScene](https://github.com/OpenDriveLab/OpenScene)