https://github.com/willguimont/torch_waymo
PyTorch dataloader for Waymo Open Dataset
https://github.com/willguimont/torch_waymo
dataloader dataset point-cloud pytorch torch waymo
Last synced: 2 months ago
JSON representation
PyTorch dataloader for Waymo Open Dataset
- Host: GitHub
- URL: https://github.com/willguimont/torch_waymo
- Owner: willGuimont
- License: mit
- Created: 2023-01-30T21:12:04.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-12-20T00:11:51.000Z (over 2 years ago)
- Last Synced: 2025-08-16T22:25:04.411Z (10 months ago)
- Topics: dataloader, dataset, point-cloud, pytorch, torch, waymo
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 14
- Watchers: 1
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
README
# torch_waymo
Load Waymo Open Dataset in PyTorch
Cite this repository:
```
@software{Guimont-Martin_A_PyTorch_dataloader_2023,
author = {Guimont-Martin, William},
month = {1},
title = {{A PyTorch dataloader for Waymo Open Dataset}},
version = {0.1.1},
year = {2023}
}
```
## Usage
Requires:
- Python < 3.10
### Download the dataset
```shell
# Login to gcloud
gcloud auth login
# Download the full dataset
cd
gsutil -m cp -r \
"gs://waymo_open_dataset_v_1_4_1/individual_files/training" \
"gs://waymo_open_dataset_v_1_4_1/individual_files/validation" \
.
```
### Convert it
```shell
# Make a tf venv with Python < 3.10
python3.9 -m venv venv_tf
source venv_tf/bin/activate
# We recommend using uv for faster installs
pip install uv
uv pip install "torch_waymo[waymo]"
# Convert all splits (FULL frames with images & lasers) -> writes to /converted
torch-waymo-convert --dataset
# Convert only training split (FULL frames)
torch-waymo-convert --dataset --splits training
# Convert multiple splits (FULL frames)
torch-waymo-convert --dataset --splits training validation
# (NEW) Convert to SIMPLIFIED frames (no camera images stored, point cloud + labels only)
# Writes to /converted_simplified
torch-waymo-convert --dataset --simplified
# Simplified + specific splits
torch-waymo-convert --dataset --simplified --splits training validation
```
### Load it in your project
Now that the dataset is converted, you don't have to depend on `waymo-open-dataset-tf-2-11-0` in your downstream project.
You can simply install `torch_waymo` in your *runtime* environment.
```shell
pip install torch_waymo
```
Example usage:
train_dataset = WaymoDataset('~/Datasets/Waymo/converted_simplified', 'training')
Example usage (Full conversion):
```python
from torch_waymo import WaymoDataset
# Simplified frames (no images, only point clouds + labels)
train_dataset = WaymoDataset('~/Datasets/Waymo/converted_simplified', 'training')
for i in range(10):
# frame is of type SimplifiedFrame
frame = train_dataset[i]
print(frame.timestamp_micros)
print(frame.timestamp_micros, len(frame.lasers))
# Full frames (with images)
train_dataset = WaymoDataset('~/Datasets/Waymo/converted', 'training')
for i in range(10):
# frame is of type Frame
frame = train_dataset[i]
print(frame.timestamp_micros)
print(frame.timestamp_micros, len(frame.images))
```
Notes:
- Paths with `~` are supported; they will expand to your home directory.
- `len.pkl` inside each split directory stores cumulative frame counts for indexing.
- If you re-run conversion, existing frames are skipped (idempotent per frame file).