https://github.com/commaai/commavq
commaVQ is a dataset of compressed driving video
- Host: GitHub
- URL: https://github.com/commaai/commavq
- Owner: commaai
- License: MIT
- Created: 2023-06-27T02:16:50.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-06-11T00:12:24.000Z (4 months ago)
- Last Synced: 2025-06-11T01:25:39.997Z (4 months ago)
- Language: Jupyter Notebook
- Size: 76.6 MB
- Stars: 307
- Watchers: 19
- Forks: 54
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome-ChatGPT-repositories - commavq - commaVQ is a dataset of compressed driving video (Others)
README
| Source Video | Compressed Video | Future Prediction |
| ------------ | ---------------- | ----------------- |
| *(video)* | *(video)* | *(video)* |

A world model is a model that can predict the next state of the world given the observed previous states and actions.
World models are essential to training all kinds of intelligent agents, especially self-driving models.
commaVQ contains:
- encoder/decoder models used to heavily compress driving scenes (a decoding sketch follows this list)
- a world model trained on 3,000,000 minutes of driving videos
- a dataset of 100,000 minutes of compressed driving videos
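For intuition, here is a minimal sketch of what decoding one frame of tokens back into an image could look like with onnxruntime. The model path, tensor names, dtype, and shapes are assumptions, not the repo's documented API; [./notebooks/decode.ipynb](./notebooks/decode.ipynb) shows the actual usage.

```python
# Minimal sketch, NOT the repo's exact API: run the decoder over one frame
# of tokens with onnxruntime. Model path and input dtype are assumptions;
# see ./notebooks/decode.ipynb for the real pipeline.
import numpy as np
import onnxruntime as ort

tokens = np.load("segment.npy")                      # (1200, 8, 16) int16 tokens
sess = ort.InferenceSession("models/decoder.onnx")   # hypothetical model path
input_name = sess.get_inputs()[0].name
frame_tokens = tokens[:1].astype(np.int64)           # first frame, batch of 1
(image,) = sess.run(None, {input_name: frame_tokens})
print(image.shape)                                   # decoded frame
```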
# Task

## Lossless compression challenge: make me smaller! $500 challenge
Losslessly compress 5,000 minutes of driving video "tokens". Go to [./compression/](./compression/) to start.

**Prize: highest compression rate on 5,000 minutes of driving video (~915MB). Challenge ended July 1st, 2024, 11:59pm AoE.**

Submit a single zip file containing the compressed data and a Python script to decompress it into its original form using [this form](https://forms.gle/US88Hg7UR6bBuW3BA). Top solutions are listed on [comma's official leaderboard](https://comma.ai/leaderboard) and in the table below; a scoring sketch follows the table.
| score | name | method |
| ----- | ---- | ------ |
| 3.4 | szabolcs-cs | self-compressing neural network |
| 2.9 | BradyWynn | arithmetic coding with GPT |
| 2.6 | pkourouklidis 👑 | arithmetic coding with GPT |
| 2.3 | anonymous | zpaq |
| 2.3 | rostislav | zpaq |
| 2.2 | anonymous | zpaq |
| 2.2 | anonymous | zpaq |
| 2.2 | 0x41head | zpaq |
| 2.2 | tillinf | zpaq |
| 2.2 | nuniesmith | zpaq |
| 1.6 | baseline | lzma |
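As a hedged sketch of how a score might be computed, assuming "score" means the plain compression rate (uncompressed bytes divided by submission bytes); the official scoring lives with the challenge tooling in [./compression/](./compression/):

```python
# Sketch: score a submission as original_bytes / compressed_bytes,
# assuming that is the challenge's definition of compression rate.
import os

def compression_rate(original_dir: str, submission_zip: str) -> float:
    original_bytes = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(original_dir)
        for name in names
    )
    return original_bytes / os.path.getsize(submission_zip)

# e.g. compression_rate("tokens/", "submission.zip")
# the lzma baseline scores 1.6 under this reading
```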
## Overview
A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. 1 minute of frames at 20 FPS. Each file is of shape 1200x8x16 and saved as int16.

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos, following a recipe similar to [5].
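A quick sanity check on those numbers (`segment.npy` is just a placeholder path for one downloaded segment file):

```python
# Sanity-check one segment against the layout described above.
import numpy as np

tokens = np.load("segment.npy")            # placeholder path to one segment
assert tokens.shape == (1200, 8, 16)       # 60 s * 20 FPS frames, 8*16 = 128 tokens each
assert tokens.dtype == np.int16
assert 0 <= tokens.min() and tokens.max() < 2**10  # 10-bit token vocabulary

# Effective bitrate of the compressed stream:
print(128 * 10 * 20 / 1000, "kbit/s")      # 25.6 kbit/s of tokens per video
```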
## Examples
- [./notebooks/encode.ipynb](./notebooks/encode.ipynb) and [./notebooks/decode.ipynb](./notebooks/decode.ipynb) for an example of how to visualize the dataset, using a segment of driving video from [comma's drive to Taco Bell](https://blog.comma.ai/taco-bell/)
- [./notebooks/gpt.ipynb](./notebooks/gpt.ipynb) for an example of how to use the world model to imagine future frames
- [./compression/compress.py](./compression/compress.py) for an example of how to compress the tokens using lzma (a minimal sketch of this baseline follows the list)
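A rough illustration of the lzma baseline; [./compression/compress.py](./compression/compress.py) is the reference implementation, and the filename below is a placeholder:

```python
# Minimal lzma round-trip over one segment of tokens; this is only a sketch,
# not the reference implementation from ./compression/compress.py.
import lzma
import numpy as np

tokens = np.load("segment.npy")                       # (1200, 8, 16) int16
raw = tokens.tobytes()
compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
print(f"rate: {len(raw) / len(compressed):.2f}")      # baseline scores ~1.6 overall

# lossless round-trip check
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16)
assert (restored.reshape(tokens.shape) == tokens).all()
```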
## Download the dataset
- Using huggingface datasets
```python
import numpy as np
from datasets import load_dataset
# load the first shard
data_files = {'train': ['data-0000.tar.gz']}
ds = load_dataset('commaai/commavq', data_files=data_files)
tokens = np.array(ds['train'][0]['token.npy'])  # (1200, 8, 16) int16 token array for one segment
poses = np.array(ds['train'][0]['pose.npy'])    # pose data for the same segment
```
- Manually download from the huggingface datasets repository: https://huggingface.co/datasets/commaai/commavq
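For scripted downloads, something along these lines should work with `huggingface_hub` (the `local_dir` value is just an example):

```python
# Sketch: mirror the dataset repository locally with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="commaai/commavq",
    repo_type="dataset",
    local_dir="commavq",   # example destination directory
)
```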
## References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Björn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
[3] https://worldmodels.github.io/
[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2022.