# **Learning to Drive via Real-World Simulation at Scale**

[![Paper](https://img.shields.io/badge/ArXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.23369)
[![Home](https://img.shields.io/badge/project_page-5F259F?style=for-the-badge&logo=homepage&logoColor=white)](https://opendrivelab.com/SimScale/)
[![Hugging Face](https://img.shields.io/badge/hugging_face-ffc107?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/datasets/OpenDriveLab/SimScale)
[![ModelScope](https://img.shields.io/badge/modelscope-624AFF?style=for-the-badge&logo=modelscope&logoColor=white)](https://modelscope.cn/datasets/OpenDriveLab/SimScale)
[![License](https://img.shields.io/badge/Apache--2.0-019B8F?style=for-the-badge&logo=apache)](https://github.com/OpenDriveLab/SimScale/blob/main/LICENSE)





> [Haochen Tian](https://github.com/hctian713),
> [Tianyu Li](https://github.com/sephyli),
> [Haochen Liu](https://georgeliu233.github.io/),
> [Jiazhi Yang](https://github.com/YTEP-ZHI),
> [Yihang Qiu](https://github.com/gihharwtw),
> [Guang Li](https://scholar.google.com/citations?user=McEfO8UAAAAJ&hl=en),
> [Junli Wang](https://openreview.net/profile?id=%7EJunli_Wang4),
> [Yinfeng Gao](https://scholar.google.com/citations?user=VTn0hqIAAAAJ&hl=en),
> [Zhang Zhang](https://scholar.google.com/citations?user=rnRNwEMAAAAJ&hl=en),
> [Liang Wang](https://scholar.google.com/citations?user=8kzzUboAAAAJ&hl=en),
> [Hangjun Ye](https://scholar.google.com/citations?user=68tXhe8AAAAJ&hl=en),
> [Tieniu Tan](https://scholar.google.com/citations?user=W-FGd_UAAAAJ&hl=en),
> [Long Chen](https://long.ooo/),
> [Hongyang Li](https://lihongyang.info/)
>
>
> - πŸ“§ Primary Contact: Haochen Tian (tianhaochen2023@ia.ac.cn)
> - πŸ“œ Materials: 🌐 [𝕏](https://x.com/OpenDriveLab/status/1999507869633527845) | πŸ“° [Media](https://mp.weixin.qq.com/s/OGV3Xlb0bHSSSloG11qFJA) | πŸ—‚οΈ [Slides](https://docs.google.com/presentation/d/17qbsKZU9jdw7MfiPk7hZelaLb3leR2M76gPcMkuf1MI/edit?usp=sharing) | 🎬 [Talk (in Chinese)](https://www.bilibili.com/video/BV1tqrEBNECQ)
> - πŸ–ŠοΈ Joint effort by CASIA, OpenDriveLab at HKU, and Xiaomi EV.

---

## πŸ”₯ Highlights

- πŸ—οΈ A scalable simulation pipepline that synthesizes diverse and high-fidelity reactive driving scenarios with pseudo-expert demonstrations.
- πŸš€ An effective sim-real co-training strategy that improves robustness and generalization synergistically across various end-to-end planners.
- πŸ”¬ A comprehensive recipe that reveals crucial insights into the underlying scaling properties of sim-real learning systems for end-to-end autonomy.

## πŸ“’ News
- **`[2026/1/16]`** We released the data and models on πŸ‘Ύ ModelScope to better serve users in China.
- **`[2026/1/6]`** We released the code **v1.0**.
- **`[2025/12/31]`** We released the data and models **v1.0** on πŸ€— Hugging Face. Happy New Year! πŸŽ„
- **`[2025/12/1]`** We released our [paper](https://arxiv.org/abs/2511.23369) on arXiv.

## πŸ“‹ TODO List
- [x] More Visualization Results.
- [x] Future Sensors Data.
- [x] Sim-Real Co-training Code release (Jan. 2026).
- [x] Simulation Data release (Dec. 2025).
- [x] Checkpoints release (Dec. 2025).

---

## πŸ“Œ Table of Contents

- πŸ›οΈ [Model Zoo](#%EF%B8%8F-model-zoo)
- 🎯 [Getting Started](#-getting-started)
- πŸ“¦ [Data Preparation](#-data-preparation)
- [Download Dataset](#1-download-dataset)
- [Set Up Configuration](#2-set-up-configuration)
- βš™οΈ [Sim-Real Co-Training](#%EF%B8%8F-sim-real-co-training-recipe)
- [Co-Training with Pseudo-Expert](#co-training-with-pseudo-expert)
- [Co-Training with Rewards Only](#co-training-with-rewards-only)
- πŸ” [Inference](#-inference)
- [NAVSIM v2 navhard](#navsim-v2-navhard)
- [NAVSIM v2 navtest](#navsim-v2-navtest)
- ⭐ [License and Citation](#-license-and-citation)

## πŸ›οΈ Model Zoo


EPDMS is reported on the NAVSIM v2 `navhard` and `navtest` benchmarks.

| Model | Backbone | Sim-Real Config | navhard EPDMS | navhard CKPT | navtest EPDMS | navtest CKPT |
|:---|:---|:---|:---|:---|:---|:---|
| LTF | ResNet34 | w/ pseudo-expert | 30.3 (+6.9) | HF / MS | 84.4 (+2.9) | HF / MS |
| DiffusionDrive | ResNet34 | w/ pseudo-expert | 32.6 (+5.1) | HF / MS | 85.9 (+1.7) | HF / MS |
| GTRS-Dense | ResNet34 | w/ pseudo-expert | 46.1 (+7.8) | HF / MS | 84.0 (+1.7) | HF / MS |
| GTRS-Dense | ResNet34 | rewards only | 46.9 (+8.6) | HF / MS | 84.6 (+2.3) | HF / MS |
| GTRS-Dense | V2-99 | w/ pseudo-expert | 47.7 (+5.8) | HF / MS | 84.5 (+0.5) | HF / MS |
| GTRS-Dense | V2-99 | rewards only | 48.0 (+6.1) | HF / MS | 84.8 (+0.8) | HF / MS |

> [!NOTE]
> We fixed a minor error in the simulation process without changing the method, resulting in better performance than the numbers reported in the arXiv version. We will update the arXiv paper soon.

## 🎯 Getting Started

### 1. Clone SimScale Repo

```bash
git clone https://github.com/OpenDriveLab/SimScale.git
cd SimScale
```

### 2. Create Environment

```bash
conda env create --name simscale -f environment.yml
conda activate simscale
pip install -e .
```
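
A quick sanity check that the editable install worked (assuming the package imports as `navsim`, as the repo layout suggests):

```bash
# Verify the editable install is importable from the new environment.
python -c "import navsim; print('navsim import OK')"
```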

## πŸ“¦ Data Preparation

Our released simulation data is based on [nuPlan](https://www.nuscenes.org/nuplan) and [NAVSIM](https://github.com/autonomousvision/navsim). **We recommend first preparing the real-world data by following the instructions in [Download NAVSIM](https://github.com/autonomousvision/navsim/blob/main/docs/install.md#2-download-the-dataset). If you plan to use GTRS, please refer directly to [Download NAVSIM](./docs/install.md#2-download-the-dataset).**

### 1. Download Dataset

We provide a πŸ€— [Script (Hugging Face)](./tools/download_hf.sh) and a πŸ‘Ύ [Script (ModelScope)](./tools/download_ms) (for users in China) for downloading the simulation data.

Our simulation data format follows that of [OpenScene](https://github.com/OpenDriveLab/OpenScene/blob/main/docs/getting_started.md#download-data): each clip/log has a fixed temporal horizon of 6 seconds at 2 Hz (2 s history + 4 s future), with the history and future frames stored separately in `sensor_blobs_hist` and `sensor_blobs_fut`, respectively.
**For policy training, `sensor_blobs_hist` alone is sufficient.**
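
As an illustrative alternative to the provided scripts, a single split can be fetched with `huggingface-cli`; the `--include` pattern below is an assumption about the repository layout, so verify it on the dataset page first:

```bash
# Illustrative only: download one simulation split from the SimScale dataset
# repo; adjust the include pattern to the actual file layout.
pip install -U "huggingface_hub[cli]"
huggingface-cli download OpenDriveLab/SimScale \
  --repo-type dataset \
  --include "*reaction_pdm_v1.0-0*" \
  --local-dir ./simscale_data
```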

#### πŸ“Š Overview Table of Simulated Synthetic Data


| Split / Sim. Round | # Tokens | Logs | Sensors_Hist | Sensors_Fut | Link |
|:---|:---|:---|:---|:---|:---|
| **Planner-based Pseudo-Expert** | | | | | |
| reaction_pdm_v1.0-0 | 65K | 9.9GB | 569GB | 1.2TB | HF + HF_Fut / MS |
| reaction_pdm_v1.0-1 | 55K | 8.5GB | 448GB | 964GB | HF + HF_Fut / MS |
| reaction_pdm_v1.0-2 | 46K | 6.9GB | 402GB | 801GB | HF + HF_Fut / MS |
| reaction_pdm_v1.0-3 | 38K | 5.6GB | 333GB | 663GB | HF + HF_Fut / MS |
| reaction_pdm_v1.0-4 | 32K | 4.7GB | 279GB | 554GB | HF + HF_Fut / MS |
| **Recovery-based Pseudo-Expert** | | | | | |
| reaction_recovery_v1.0-0 | 45K | 6.8GB | 395GB | 789GB | HF + HF_Fut / MS |
| reaction_recovery_v1.0-1 | 36K | 5.5GB | 316GB | 631GB | HF + HF_Fut / MS |
| reaction_recovery_v1.0-2 | 28K | 4.3GB | 244GB | 488GB | HF + HF_Fut / MS |
| reaction_recovery_v1.0-3 | 22K | 3.3GB | 189GB | 378GB | HF + HF_Fut / MS |
| reaction_recovery_v1.0-4 | 17K | 2.7GB | 148GB | 296GB | HF + HF_Fut / MS |

> [!TIP]
> Before downloading, we recommend checking the table above to select the appropriate split and `sensor_blobs`.

#### 🏭 Simulation Data Pipeline

*(Figure: overview of the simulation data pipeline.)*

#### 🧩 Examples of Simulated Synthetic Data

*(Image grid: three simulated variants, Sim. 1 through Sim. 3, for each of the original tokens `5c9694f15f9c5537`, `367cfa28901257ee`, and `d37c49db3dcd59fa`.)*

### 2. Set Up Configuration
We provide a [Script](./tools/move.sh) for moving the downloaded simulation data into the following structure.

```
navsim_workspace/
β”œβ”€β”€ simscale/
β”œβ”€β”€ exp/
└── dataset/
    β”œβ”€β”€ maps/
    β”œβ”€β”€ navsim_logs/
    β”‚   β”œβ”€β”€ test/
    β”‚   β”œβ”€β”€ trainval/
    β”‚   β”œβ”€β”€ synthetic_reaction_pdm_v1.0-*/
    β”‚   β”‚   β”œβ”€β”€ [log]-00*.pkl
    β”‚   β”‚   └── ...
    β”‚   └── synthetic_reaction_recovery_v1.0-*/
    β”œβ”€β”€ sensor_blobs/
    β”‚   β”œβ”€β”€ test/
    β”‚   β”œβ”€β”€ trainval/
    β”‚   β”œβ”€β”€ synthetic_reaction_pdm_v1.0-*/
    β”‚   β”‚   └── [token]-00*/
    β”‚   β”‚       β”œβ”€β”€ CAM_B0/
    β”‚   β”‚       └── ...
    β”‚   └── synthetic_reaction_recovery_v1.0-*/
    └── navhard_two_stage/
```
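
NAVSIM-based code typically locates this workspace through environment variables; a minimal sketch, assuming the standard NAVSIM devkit variable names (verify them against the scripts in this repo):

```bash
# Assumed variable names from the NAVSIM devkit; adjust paths to your setup.
export NAVSIM_WORKSPACE=/path/to/navsim_workspace
export NAVSIM_DEVKIT_ROOT=$NAVSIM_WORKSPACE/simscale
export NAVSIM_EXP_ROOT=$NAVSIM_WORKSPACE/exp
export OPENSCENE_DATA_ROOT=$NAVSIM_WORKSPACE/dataset
export NUPLAN_MAPS_ROOT=$NAVSIM_WORKSPACE/dataset/maps
```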

## βš™οΈ Sim-Real Co-Training Recipe

### Preparation
1. Refer to the [Script](./scripts/training/run_dataset_cache.sh) to cache the real-world and simulation data.
2. Download the pretrained image backbone weights, [ResNet34](https://huggingface.co/timm/resnet34.a1_in1k) or [V2_99](https://drive.google.com/file/d/1gQkhWERCzAosBwG5bh2BKkt1k0TJZt-A/view); see the sketch below.
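
A minimal sketch for pre-fetching the ResNet34 weights into the local Hugging Face cache via `timm` (the V2-99 checkpoint must be downloaded manually from the Google Drive link):

```bash
# Pre-fetch the timm ResNet34 checkpoint; assumes timm is installed
# in the simscale environment created from environment.yml.
python -c "import timm; timm.create_model('resnet34.a1_in1k', pretrained=True)"
```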

### Co-Training with Pseudo-Expert
We provide [Scripts](./scripts/training) for sim–real co-training,
*e.g.*, [run_diffusiondrive_training_syn.sh](./scripts/training/run_diffusiondrive_training_syn.sh).

The main configuration options are as follows:
```bash
export SYN_IDX=0 # 0, 1, 2, 3, 4
export SYN_GT=pdm # pdm, recovery
```
- `SYN_IDX` specifies which rounds of simulation data are included; *e.g.*, `SYN_IDX=2` means that rounds 0, 1, and 2 will be used.
- `SYN_GT` specifies the type of pseudo-expert used for supervision.

In addition, the cache path for simulation data is hard-coded in [dataset.py#136](./navsim/planning/training/dataset.py#L136). Please make sure the path is correctly set to your local simulation data directory before training.
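
For example, a hypothetical launch that co-trains DiffusionDrive on real data plus simulation rounds 0 through 2 with the planner-based pseudo-expert:

```bash
# Hypothetical invocation; how the script consumes these variables may differ.
export SYN_IDX=2   # include simulation rounds 0, 1, and 2
export SYN_GT=pdm  # supervise with the planner-based (PDM) pseudo-expert
bash scripts/training/run_diffusiondrive_training_syn.sh
```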
- **Regression-based Policy | *LTF***

We provide a [Script](./scripts/training/run_transfuser_training_syn.sh) to train LTF with 8 GPUs for 100 epochs.

- **Diffusion-based Policy | *DiffusionDrive***

We provide a [Script](./scripts/training/run_diffusiondrive_training_syn.sh) to train DiffusionDrive with 8 GPUs for 100 epochs.

- **Scoring-based Policy | *GTRS-Dense***

We provide a [Script](./scripts/training/run_gtrs_dense_training_multi_syn.sh) to train GTRS_Dense on 4 nodes, each with 8 GPUs, for 50 epochs.

We also provide πŸ€— [Reward Files (Hugging Face)](https://huggingface.co/datasets/OpenDriveLab/SimScale/tree/main/SimScale_rewards) and πŸ‘Ύ [Reward Files (ModelScope)](https://www.modelscope.cn/datasets/OpenDriveLab/SimScale/tree/master/SimScale_rewards) (for users in China) containing the rewards for the simulation data. Please download the corresponding files first and move them to `NAVSIM_TRAJPDM_ROOT/sim`. The reward file path is hard-coded in [gtrs_agent.py#223](./navsim/agents/gtrs_dense/gtrs_agent.py#223); check it before training.
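
A minimal sketch for fetching the reward files with `huggingface-cli`, assuming `NAVSIM_TRAJPDM_ROOT` is already set (the links above remain the canonical source):

```bash
# Illustrative only: download the reward files and move them into place.
huggingface-cli download OpenDriveLab/SimScale \
  --repo-type dataset \
  --include "SimScale_rewards/*" \
  --local-dir ./simscale_rewards
mkdir -p "$NAVSIM_TRAJPDM_ROOT/sim"
mv ./simscale_rewards/SimScale_rewards/* "$NAVSIM_TRAJPDM_ROOT/sim/"
```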

### Co-Training with Rewards Only

- **Scoring-based Policy | *GTRS-Dense***

Rewards-only co-training uses the same
[Script](./scripts/training/run_gtrs_dense_training_multi_syn.sh)
to train GTRS_Dense on 4 nodes, each with 8 GPUs, for 50 epochs.

The main configuration option is as follows:
```bash
syn_imi=false # true, false
```
- `syn_imi`: When set to `false`, the imitation learning loss is disabled for simulation data while remaining enabled for real-world data; see the sketch below.
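
A hypothetical rewards-only launch; `syn_imi` is set inside the script, so you may need to edit it in place if the environment override below is not read:

```bash
# Hypothetical: disable the imitation loss on simulation data, then launch.
syn_imi=false bash scripts/training/run_gtrs_dense_training_multi_syn.sh
```
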
## πŸ” Inference

### Preparation

Refer to the [Script](./scripts/evaluation/run_metric_caching.sh) to cache the metrics first, as shown below.
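
A minimal invocation, assuming the script's default paths match your workspace:

```bash
# Cache the evaluation metrics once before running any benchmark below.
bash scripts/evaluation/run_metric_caching.sh
```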

### NAVSIM v2 navhard

We provide [Scripts](./scripts/evaluation_navhard) to evaluate three policies on [navhard](./navsim/planning/script/config/common/train_test_split/scene_filter/navhard_two_stage.yaml) using GPU inference.

### NAVSIM v2 navtest

We provide [Scripts](./scripts/evaluation_navtest) to evaluate three policies on [navtest](./navsim/planning/script/config/common/train_test_split/scene_filter/navtest.yaml) using GPU inference.

## ❀️ Acknowledgements

We thank the open-source contributors of the following projects, which made this work possible:

- [NAVSIM](https://github.com/autonomousvision/navsim) | [MTGS](https://github.com/OpenDriveLab/MTGS) | [GTRS](https://github.com/NVlabs/GTRS) | [DiffusionDrive](https://github.com/hustvl/DiffusionDrive)

## ⭐ License and Citation

All content in this repository is under the [Apache-2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
The released data is based on [nuPlan](https://www.nuscenes.org/nuplan) and is under the [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.

If any part of our paper or code helps your research, please consider citing us and starring the repository.

```bibtex
@article{tian2025simscale,
  title={SimScale: Learning to Drive via Real-World Simulation at Scale},
  author={Haochen Tian and Tianyu Li and Haochen Liu and Jiazhi Yang and Yihang Qiu and Guang Li and Junli Wang and Yinfeng Gao and Zhang Zhang and Liang Wang and Hangjun Ye and Tieniu Tan and Long Chen and Hongyang Li},
  journal={arXiv preprint arXiv:2511.23369},
  year={2025}
}
```