Awesome-Multimodal-LLM-Autonomous-Driving
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
Awesome Papers
Datasets
- BDD-X 2018 - data.berkeley.edu/ | Description | Planning Description & Justification | 8M frames, 20k text strings | **:heavy_check_mark:**
- HAD HRI Advice 2019 - ri.com/hdd | Advice | Goal-oriented & stimulus-driven advice | 5,675 video clips, 45k text strings | **:heavy_check_mark:**
- Talk2Car 2019
- SUTD-TrafficQA 2021 - Collected | QA | QA | 10k frames, 62k text strings | **:heavy_check_mark:**
- DriveLM 2023
- MAPLM 2023 - AD/MAPLM
- LingoQA 2023
- DRAMA 2022 - Collected | Description | QA + Captions | 18k frames, 100k text strings | **:heavy_check_mark:**
- nuScenes-QA 2023 - https://github.com/qiantianwen/NuScenes-QA
- Reason2Drive 2023 - text pairs | [Reason2Drive](https://github.com/fudan-zvg/reason2drive)
- Rank2Tell 2023 - Collected | QA | Risk Localization and Ranking | 116 video clips (20s each) | [Rank2Tell](https://usa.honda-ri.com/rank2tell)
Papers Accepted by WACV 2024 LLVM-AD
- A Survey on Multimodal Large Language Models for Autonomous Driving
- Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
- VLAAD: Vision and Language Assistant for Autonomous Driving
- A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs
- Human-Centric Autonomous Systems With LLMs for User Command Reasoning
- NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations
- Latency Driven Spatially Sparse Optimization for Multi-Branch CNNs for Semantic Segmentation
- LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization
- A Game of Bundle Adjustment - Learning Efficient Convergence
- Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
MLLM for Perception & Planning & Control for Autonomous Driving
- GPT-Driver - 3.5 | Planning | Vision, Language | In-context learning | Text | Trajectory
- SurrealDriver - 4 | Planning, Control | Language | In-context learning | Text | Text / Action
- DriveMLM - Former | Perception, Planning | Vision, Language | Training | RGB Image, LiDAR, Text | Decision State
- DriveLM
- LangProp
- ChatSim - 4 | Perception (Image Editing) | Image, Language | In-context learning | Vision, Language | Image
- VLP
- Driving with LLMs
- Talk2BEV - 13b | Perception, Planning | Vision, Language | In-context learning | Image, Query | Response
- GAIA-1 - | Planning | Vision, Language | Pretraining | Video, Prompt | Video
- LMaZP - 3 Codex | Planning | Language | In-context learning | Text | Plan
- DiLu - 3.5, GPT-4 | Planning, Control | Language | In-context learning | Text | Action
- Receive, Reason, and React - 4 | Planning, Control | Language | In-context learning | Text | Action
- LanguageMPC - 3.5 | Planning | Language | In-context learning | Text | Action
- LaMPilot - 4 / LLaMA-2 / PaLM2 | Planning (Code Generation) | Language | In-context learning | Text | Code as action
- LMDrive
- Drive as You Speak - 4 | Planning | Language | In-context learning | Text | Code
- DriveVLM - VL | Planning | Sequence of Images, Language | Training | Vision, Language | Text / Action
- Language Agent - 3.5 | Planning | Language | Training | Text | Action
- On the Road with GPT-4V(ision) - 4Vision | Perception | Vision, Language | In-context learning | RGB Image, Text | Text Description
- DriveLLM - 4 | Planning, Control | Language | In-context learning | Text | Action
- LimSim++ - 4 | Planning | Simulator BEV, Language | In-context learning | Simulator Vision, Language | Text / Action
- RAG-Driver - 7B | Planning, Control | Video, Language | Training | Vision, Language | Text / Action
- Domain Knowledge Distillation from LLMs - 3.5 | Text Generation | Language | In-context learning | Text | Concept
Other Survey Papers
- A Survey on Autonomous Driving Datasets: Data Statistic, Annotation, and Outlook
- LLM4Drive: A Survey of Large Language Models for Autonomous Driving
- Towards Knowledge-driven Autonomous Driving
- Applications of Large Scale Foundation Models for Autonomous Driving
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems - Language Models for Transportation Systems
- Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies - Loop Autonomous Driving
- A Survey for Foundation Models in Autonomous Driving