Awesome-Multimodal-LLM-Autonomous-Driving

[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving

Awesome Papers
- Datasets
  - BDD-X 2018 - data.berkeley.edu/) | Description | Planning Description & Justification | 8M frames, 20k text strings | **:heavy_check_mark:** |
  - HAD HRI Advice 2019 - ri.com/hdd) | Advice | Goal-oriented & stimulus-driven advice | 5,675 video clips, 45k text strings | **:heavy_check_mark:** |
  - Talk2Car 2019
  - SUTD-TrafficQA 2021 - Collected | QA | QA | 10k frames, 62k text strings | **:heavy_check_mark:** |
  - DriveLM 2023
  - MAPLM 2023 - AD/MAPLM) |
  - LingoQA 2023
  - DRAMA 2022 - Collected | Description | QA + Captions | 18k frames, 100k text strings | **:heavy_check_mark:** |
  - nuScenes-QA 2023 - [nuScenes-QA](https://github.com/qiantianwen/NuScenes-QA)
  - Reason2Drive 2023 - text pairs | [Reason2Drive](https://github.com/fudan-zvg/reason2drive) |
  - Rank2Tell 2023 - Collected | QA | Risk Localization and Ranking | 116 video clips (20s each) | [Rank2Tell](https://usa.honda-ri.com/rank2tell) |
- Papers Accepted by WACV 2024 LLVM-AD
- MLLM for Perception & Planning & Control for Autonomous Driving
  - GPT-Driver - 3.5 | Planning | Vision, Language | In-context learning | Text | Trajectory |
  - SurrealDriver - 4 | Planning Control | Language | In-context learning | Text | Text / Action |
  - DriveMLM - Former | Perception Planning | Vision, Language | Training | RGB Image LiDAR Text | Decision State |
  - DriveLM
  - LangProp
  - ChatSim - 4 | Perception (Image Editing) | Image, Language | In-context learning | Vision, Language | Image |
  - VLP
  - Driving with LLMs
  - Talk2BEV - 13b | Perception Planning | Vision, Language | In-context learning | Image Query | Response |
  - GAIA-1 - | Planning | Vision, Language | Pretraining | Video Prompt | Video |
  - LMaZP - 3 Codex | Planning | Language | In-context learning | Text | Plan |
  - DiLu - 3.5 GPT-4 | Planning Control | Language | In-context learning | Text | Action |
  - Receive, Reason, and React - 4 | Planning Control | Language | In-context learning | Text | Action |
  - LanguageMPC - 3.5 | Planning | Language | In-context learning | Text | Action |
  - LaMPilot - 4 / LLaMA-2 / PaLM2 | Planning (Code Generation) | Language | In-context learning | Text | Code as action |
  - LMDrive
  - Drive as You Speak - 4 | Planning | Language | In-context learning | Text | Code |
  - DriveVLM - VL | Planning | Sequence of Images, Language | Training | Vision, Language | Text / Action |
  - Language Agent - 3.5 | Planning | Language | Training | Text | Action |
  - On the Road with GPT-4V(ision) - 4Vision | Perception | Vision, Language | In-context learning | RGB Image Text | Text Description |
  - DriveLLM - 4 | Planning Control | Language | In-context learning | Text | Action |
  - LimSim++ - 4 | Planning | Simulator BEV, Language | In-context learning | Simulator Vision, Language | Text / Action |
  - RAG-Driver - 7B | Planning Control | Video, Language | Training | Vision, Language | Text / Action |
  - Domain Knowledge Distillation from LLMs - 3.5 | Text Generation | Language | In-context learning | Text | Concept |
- Other Survey Papers

Programming Languages

Python: 3, HTML: 1, JavaScript: 1

