https://github.com/commaai/commavq

commaVQ is a dataset of compressed driving video

Host: GitHub
URL: https://github.com/commaai/commavq
Owner: commaai
License: mit
Created: 2023-06-27T02:16:50.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2025-06-11T00:12:24.000Z (4 months ago)
Last Synced: 2025-06-11T01:25:39.997Z (4 months ago)
Language: Jupyter Notebook
Size: 76.6 MB
Stars: 307
Watchers: 19
Forks: 54
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

Awesome Lists containing this project

awesome-ChatGPT-repositories - commavq - commaVQ is a dataset of compressed driving video (Others)

README

commaVQ challenge

Leaderboard · comma.ai/jobs · Discord · X

| Source Video | Compressed Video | Future Prediction |
| --------------- | ---------------- |------------------ |
| | | |

A world model is a model that can predict the next state of the world given the observed previous states and actions.

World models are essential to training all kinds of intelligent agents, especially self-driving models.

commaVQ contains:
- encoder/decoder models used to heavily compress driving scenes
- a world model trained on 3,000,000 minutes of driving videos
- a dataset of 100,000 minutes of compressed driving videos

# Task

## Lossless compression challenge: make me smaller! $500 challenge
Losslessly compress 5,000 minutes of driving video "tokens". Go to [./compression/](./compression/) to start.

**Prize: highest compression rate on 5,000 minutes of driving video (~915MB) - Challenge ended July 1st, 2024, 11:59pm AoE**

Submit a single zip file containing the compressed data and a Python script that decompresses it into its original form using [this form](https://forms.gle/US88Hg7UR6bBuW3BA). Top solutions are listed on [comma's official leaderboard](https://comma.ai/leaderboard).

| score | name | method |
| ----- | ---- | ------ |
| 3.4 | szabolcs-cs | self-compressing neural network |
| 2.9 | BradyWynn | arithmetic coding with GPT |
| 2.6 | pkourouklidis 👑 | arithmetic coding with GPT |
| 2.3 | anonymous | zpaq |
| 2.3 | rostislav | zpaq |
| 2.2 | anonymous | zpaq |
| 2.2 | anonymous | zpaq |
| 2.2 | 0x41head | zpaq |
| 2.2 | tillinf | zpaq |
| 2.2 | nuniesmith | zpaq |
| 1.6 | baseline | lzma |

## Overview
A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. 1 min of frames at 20 FPS. Each file has shape 1200x8x16 (1,200 frames, each an 8x16 grid of tokens) and is saved as int16.
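The sizes implied by these numbers can be checked with a little arithmetic. The sketch below uses a randomly generated array as a stand-in for a real segment (an assumption; no dataset file is read) and compares the int16 storage size with the size the same tokens would occupy if packed at 10 bits each.

```python
import numpy as np

FRAMES_PER_SEGMENT = 1200   # 1 min at 20 FPS
TOKEN_GRID = (8, 16)        # 8 x 16 = 128 tokens per frame
BITS_PER_TOKEN = 10         # token ids fall in [0, 1024)

# Stand-in for a real segment: random token ids with the documented shape and dtype.
rng = np.random.default_rng(0)
segment = rng.integers(
    0, 2**BITS_PER_TOKEN, size=(FRAMES_PER_SEGMENT, *TOKEN_GRID), dtype=np.int16
)

tokens_per_frame = TOKEN_GRID[0] * TOKEN_GRID[1]
stored_bytes = segment.nbytes                       # int16 spends 16 bits per token
packed_bytes = segment.size * BITS_PER_TOKEN // 8   # ideal 10-bit packing

print(tokens_per_frame)  # 128
print(stored_bytes)      # 307200
print(packed_bytes)      # 192000
```

At 10 bits per token, 5,000 such segments come to 960,000,000 bytes, or about 915 MiB, which appears consistent with the ~915MB figure quoted for the challenge.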

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
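Next-token prediction can be turned into frame prediction by sampling one token at a time and feeding each sample back into the context. The sketch below illustrates that loop only: `toy_next_token_probs` is a hypothetical stand-in, not the released GPT, and the real model's context length, handling of frame boundaries, and sampling strategy are not assumed here (see the notebooks for actual usage).

```python
import numpy as np

VOCAB_SIZE = 1024        # 10-bit tokens
TOKENS_PER_FRAME = 128   # 8 x 16 grid, flattened

def toy_next_token_probs(context: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the world model.

    Returns a distribution over the next token that favors repeating the
    token found at the same position in the previous frame.
    """
    probs = np.full(VOCAB_SIZE, 0.5 / (VOCAB_SIZE - 1))
    probs[context[-TOKENS_PER_FRAME]] = 0.5
    return probs

def imagine_next_frame(context: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample one frame (128 tokens) autoregressively, one token at a time."""
    context = context.copy()
    for _ in range(TOKENS_PER_FRAME):
        probs = toy_next_token_probs(context)
        next_token = rng.choice(VOCAB_SIZE, p=probs)
        context = np.append(context, next_token)  # feed the sample back in
    return context[-TOKENS_PER_FRAME:].reshape(8, 16)

rng = np.random.default_rng(0)
past_frames = rng.integers(0, VOCAB_SIZE, size=(2, 8, 16))  # two made-up context frames
frame = imagine_next_frame(past_frames.reshape(-1), rng)
print(frame.shape)  # (8, 16)
```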

## Examples
See [./notebooks/encode.ipynb](./notebooks/encode.ipynb) and [./notebooks/decode.ipynb](./notebooks/decode.ipynb) for an example of how to visualize the dataset using a segment of driving video from [comma's drive to Taco Bell](https://blog.comma.ai/taco-bell/).

See [./notebooks/gpt.ipynb](./notebooks/gpt.ipynb) for an example of how to use the world model to imagine future frames.

See [./compression/compress.py](./compression/compress.py) for an example of how to compress the tokens using lzma.
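As a rough illustration of an lzma round trip on token data, here is a minimal sketch using Python's standard `lzma` module. It is not the repository's `compress.py`: the tokens are random stand-ins, and the ratio it prints is one possible definition of compression rate, not necessarily the official scoring.

```python
import lzma
import numpy as np

# Stand-in for one segment of tokens. Random data compresses poorly;
# real driving tokens are more predictable.
rng = np.random.default_rng(0)
tokens = rng.integers(0, 1024, size=(1200, 8, 16), dtype=np.int16)

raw = tokens.tobytes()
compressed = lzma.compress(raw)

# Lossless check: decompressing must give back exactly the same array.
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16).reshape(tokens.shape)
assert np.array_equal(tokens, restored)

# Assumed definition of compression rate: int16 bytes over compressed bytes.
rate = len(raw) / len(compressed)
print(f"compression rate: {rate:.2f}")
```

The same pattern (compress, decompress, compare) applies to any candidate method, since a submission must reproduce the original data exactly.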

## Download the dataset
- Using huggingface datasets
```python
import numpy as np
from datasets import load_dataset
# load the first shard
data_files = {'train': ['data-0000.tar.gz']}
ds = load_dataset('commaai/commavq', data_files=data_files)
tokens = np.array(ds['train'][0]['token.npy'])
poses = np.array(ds['train'][0]['pose.npy'])
```
- Manually download from huggingface datasets repository: https://huggingface.co/datasets/commaai/commavq

## References
[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2022.
