{"id":13646090,"url":"https://github.com/commaai/commavq","last_synced_at":"2025-10-05T14:38:56.030Z","repository":{"id":177021454,"uuid":"659027463","full_name":"commaai/commavq","owner":"commaai","description":"commaVQ is a dataset of compressed driving video","archived":false,"fork":false,"pushed_at":"2025-06-11T00:12:24.000Z","size":80321,"stargazers_count":307,"open_issues_count":0,"forks_count":54,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-06-11T01:25:39.997Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/commaai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-27T02:16:50.000Z","updated_at":"2025-06-11T00:12:28.000Z","dependencies_parsed_at":"2023-10-05T01:10:39.231Z","dependency_job_id":"b78a9ac2-2512-4ad1-9903-be6f464283f6","html_url":"https://github.com/commaai/commavq","commit_stats":null,"previous_names":["commaai/commavq"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/commaai/commavq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commaai%2Fcommavq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commaai%2Fcommavq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commaai%2Fcommavq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commaai%2Fcommavq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/commaai","download_url":"https://codeload.github.com/commaai/commavq/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commaai%2Fcommavq/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260838663,"owners_count":23070614,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T01:02:48.275Z","updated_at":"2025-10-05T14:38:56.024Z","avatar_url":"https://github.com/commaai.png","language":"Jupyter Notebook","funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003ecommaVQ challenge\u003c/h1\u003e\n\n\u003ch3\u003e\n  \u003ca href=\"https://comma.ai/leaderboard\"\u003eLeaderboard\u003c/a\u003e\n  \u003cspan\u003e · \u003c/span\u003e\n  \u003ca href=\"https://comma.ai/jobs\"\u003ecomma.ai/jobs\u003c/a\u003e\n  \u003cspan\u003e · \u003c/span\u003e\n  \u003ca href=\"https://discord.comma.ai\"\u003eDiscord\u003c/a\u003e\n  \u003cspan\u003e · \u003c/span\u003e\n  \u003ca href=\"https://x.com/comma_ai\"\u003eX\u003c/a\u003e\n\u003c/h3\u003e\n\n\u003c/div\u003e\n\n\n| Source Video    | Compressed Video | Future Prediction |\n| --------------- | ---------------- |------------------ |\n| \u003cvideo src=\"https://github.com/commaai/commavq/assets/29985433/91894bf7-592b-4204-b3f2-3e805984045c\"\u003e  |  \u003cvideo src=\"https://github.com/commaai/commavq/assets/29985433/3a799ac8-781e-461c-bf14-c15cea42b985\"\u003e    |  \u003cvideo 
src=\"https://github.com/commaai/commavq/assets/29985433/f6f7699b-b6cb-4f9c-80c9-8e00d75fbfae\"\u003e |\n\nA world model is a model that can predict the next state of the world given the observed previous states and actions.\n\nWorld models are essential to training all kinds of intelligent agents, especially self-driving models.\n\ncommaVQ contains:\n- encoder/decoder models used to heavily compress driving scenes\n- a world model trained on 3,000,000 minutes of driving videos\n- a dataset of 100,000 minutes of compressed driving videos\n\n# Task\n\n## Lossless compression challenge: make me smaller! $500 challenge\nLosslessly compress 5,000 minutes of driving video \"tokens\". Go to [./compression/](./compression/) to get started.\n\n**Prize: highest compression rate on 5,000 minutes of driving video (~915MB) - Challenge ended July 1st, 2024, 11:59pm AoE**\n\nSubmit a single zip file containing the compressed data and a Python script to decompress it into its original form using [this form](https://forms.gle/US88Hg7UR6bBuW3BA). 
Top solutions are listed on [comma's official leaderboard](https://comma.ai/leaderboard).\n\n\u003c!-- TABLE-START --\u003e\n\u003ctable class=\"ranked\"\u003e\n \u003cthead\u003e\n  \u003ctr\u003e\n   \u003cth\u003e\n   \u003c/th\u003e\n   \u003cth\u003e\n    score\n   \u003c/th\u003e\n   \u003cth\u003e\n    name\n   \u003c/th\u003e\n   \u003cth\u003e\n    method\n   \u003c/th\u003e\n  \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    3.4\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/szabolcs-cs\"\u003e\n     szabolcs-cs\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    self-compressing neural network\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.9\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/BradyWynn\"\u003e\n     BradyWynn\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    arithmetic coding with GPT\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.6\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/pkourouklidis\"\u003e\n     pkourouklidis\n    \u003c/a\u003e\n    👑\n   \u003c/td\u003e\n   \u003ctd\u003e\n    arithmetic coding with GPT\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.3\n   \u003c/td\u003e\n   \u003ctd\u003e\n    anonymous\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.3\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/rostislav\"\u003e\n     rostislav\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  
\u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.2\n   \u003c/td\u003e\n   \u003ctd\u003e\n    anonymous\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.2\n   \u003c/td\u003e\n   \u003ctd\u003e\n    anonymous\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.2\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/0x41head\"\u003e\n     0x41head\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.2\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/tillinf\"\u003e\n     tillinf\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    2.2\n   \u003c/td\u003e\n   \u003ctd\u003e\n    \u003ca href=\"https://github.com/nuniesmith\"\u003e\n     nuniesmith\n    \u003c/a\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    zpaq\n   \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n   \u003ctd\u003e\n   \u003c/td\u003e\n   \u003ctd\u003e\n    1.6\n   \u003c/td\u003e\n   \u003ctd\u003e\n    baseline\n   \u003c/td\u003e\n   \u003ctd\u003e\n    lzma\n   \u003c/td\u003e\n  \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c!-- TABLE-END --\u003e\n\n## Overview\nA VQ-VAE [1,2] was used to heavily compress each video frame into 128 \"tokens\" of 10 bits each. Each entry of the dataset is a \"segment\" of compressed driving video, i.e. 1min of frames at 20 FPS. 
Each file is of shape 1200x8x16 (1200 frames, each an 8x16 grid of tokens) and saved as int16.\n\nA world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].\n\n## Examples\nSee [./notebooks/encode.ipynb](./notebooks/encode.ipynb) and [./notebooks/decode.ipynb](./notebooks/decode.ipynb) for an example of how to visualize the dataset using a segment of driving video from [comma's drive to Taco Bell](https://blog.comma.ai/taco-bell/).\n\nSee [./notebooks/gpt.ipynb](./notebooks/gpt.ipynb) for an example of how to use the world model to imagine future frames.\n\nSee [./compression/compress.py](./compression/compress.py) for an example of how to compress the tokens using lzma.\n\n## Download the dataset\n- Using Hugging Face datasets\n```python\nimport numpy as np\nfrom datasets import load_dataset\n# load the first shard\ndata_files = {'train': ['data-0000.tar.gz']}\nds = load_dataset('commaai/commavq', data_files=data_files)\ntokens = np.array(ds['train'][0]['token.npy'])\nposes = np.array(ds['train'][0]['pose.npy'])\n```\n- Manually download from the Hugging Face datasets repository: https://huggingface.co/datasets/commaai/commavq\n\n## References\n[1] Van Den Oord, Aaron, and Oriol Vinyals. \"Neural discrete representation learning.\" Advances in neural information processing systems 30 (2017).\n\n[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. \"Taming transformers for high-resolution image synthesis.\" Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.\n\n[3] https://worldmodels.github.io/\n\n[4] Vaswani, Ashish, et al. \"Attention is all you need.\" Advances in neural information processing systems 30 (2017).\n\n[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. \"Transformers are Sample-Efficient World Models.\" The Eleventh International Conference on Learning Representations. 
2022.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommaai%2Fcommavq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcommaai%2Fcommavq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommaai%2Fcommavq/lists"}