{"id":28047877,"url":"https://github.com/vortex-data/vortex","last_synced_at":"2026-04-10T18:29:02.743Z","repository":{"id":229846970,"uuid":"764020969","full_name":"vortex-data/vortex","owner":"vortex-data","description":"An extensible, state of the art columnar file format","archived":false,"fork":false,"pushed_at":"2025-05-09T10:16:59.000Z","size":15275,"stargazers_count":1234,"open_issues_count":127,"forks_count":41,"subscribers_count":16,"default_branch":"develop","last_synced_at":"2025-05-09T10:24:01.777Z","etag":null,"topics":["array","arrow","compression","python","rust"],"latest_commit_sha":null,"homepage":"https://vortex.dev","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vortex-data.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-02-27T10:40:00.000Z","updated_at":"2025-05-09T08:18:56.000Z","dependencies_parsed_at":"2024-05-02T04:30:45.665Z","dependency_job_id":"dc4a1720-ea0f-4a41-8e3a-a130ffd8c0d2","html_url":"https://github.com/vortex-data/vortex","commit_stats":{"total_commits":1301,"total_committers":14,"mean_commits":92.92857142857143,"dds":0.727901614142967,"last_synced_commit":"9532cbff9b6a5640807921a94f0e5899986e4e8d"},"previous_names":["fulcrum-so/vortex","spiraldb/vortex","vortex-data/vortex"],"tags_count":147,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vortex-data%2Fvortex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vortex-data%2Fvortex/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vortex-data%2Fvortex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vortex-data%2Fvortex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vortex-data","download_url":"https://codeload.github.com/vortex-data/vortex/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253633122,"owners_count":21939389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["array","arrow","compression","python","rust"],"created_at":"2025-05-11T21:06:22.656Z","updated_at":"2026-02-18T13:03:53.805Z","avatar_url":"https://github.com/vortex-data.png","language":"Rust","readme":"# 🌪️ Vortex\n\n[![Build Status](https://github.com/vortex-data/vortex/actions/workflows/ci.yml/badge.svg)](https://github.com/vortex-data/vortex/actions)\n[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/10567/badge)](https://www.bestpractices.dev/projects/10567)\n[![Documentation](https://docs.rs/vortex/badge.svg)](https://docs.vortex.dev)\n[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/vortex-data/vortex)\n[![Crates.io](https://img.shields.io/crates/v/vortex.svg)](https://crates.io/crates/vortex)\n[![PyPI - Version](https://img.shields.io/pypi/v/vortex-data)](https://pypi.org/project/vortex-data/)\n[![Maven - 
Version](https://img.shields.io/maven-central/v/dev.vortex/vortex-spark)](https://central.sonatype.com/artifact/dev.vortex/vortex-spark)\n[![codecov](https://codecov.io/github/vortex-data/vortex/graph/badge.svg)](https://codecov.io/github/vortex-data/vortex)\n\n[Join the community on Slack!](https://vortex.dev/slack) | [Documentation](https://docs.vortex.dev/) | [Performance Benchmarks](https://bench.vortex.dev)\n\n## Overview\n\nVortex is a next-generation columnar file format and toolkit designed for high-performance data processing.\nIt is the fastest and most extensible format for building data systems backed by object storage. It provides:\n\n- **Blazing Fast Performance**\n  - 100x faster random access reads (vs. modern Apache Parquet)\n  - 10-20x faster scans\n  - 5x faster writes\n  - Similar compression ratios\n  - Efficient support for wide tables with zero-copy/zero-parse metadata\n\n- **Extensible Architecture**\n  - Modeled after Apache DataFusion's extensible approach\n  - Pluggable encoding system, type system, compression strategy, \u0026 layout strategy\n  - Zero-copy compatibility with Apache Arrow\n\n- **Open Source, Neutral Governance**\n  - A Linux Foundation (LF AI \u0026 Data) Project\n  - Apache-2.0 Licensed\n\n- **Integrations**\n  - Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, \u0026 more\n  - Apache Iceberg (coming soon)\n\n\u003e 🟢 **Development Status**: Library APIs may change from version to version, but we now consider\n\u003e the file format \u003cins\u003e_stable_\u003c/ins\u003e. 
From release 0.36.0, all future releases of Vortex should\n\u003e maintain backwards compatibility of the file format (i.e., be able to read files written by\n\u003e any earlier version \u003e= 0.36.0).\n\n## Key Features\n\n### Core Capabilities\n\n- **Logical Types** - Clean separation between logical schema and physical layout\n- **Zero-Copy Arrow Integration** - Seamless conversion to/from Apache Arrow arrays\n- **Extensible Encodings** - Pluggable physical layouts with built-in optimizations\n- **Cascading Compression** - Support for nested encoding schemes\n- **High-Performance Computing** - Optimized compute kernels for encoded data\n- **Rich Statistics** - Lazy-loaded summary statistics for optimization\n\n### Technical Architecture\n\n#### Logical vs Physical Design\n\nVortex strictly separates logical and physical concerns:\n\n- **Logical Layer**: Defines data types and schema\n- **Physical Layer**: Handles encoding and storage implementation\n- **Built-in Encodings**: Compatible with Apache Arrow's memory format\n- **Extension Encodings**: Optimized compression schemes (RLE, dictionary, etc.)\n\n## Quick Start\n\n### Installation\n\n#### Rust Crate\n\nAll features are exported through the main `vortex` crate.\n\n```bash\ncargo add vortex\n```\n\n#### Python Package\n\n```bash\nuv add vortex-data\n```\n\n#### Command Line UI (vx)\n\nFor browsing the structure of Vortex files, you can use the `vx` command-line tool.\n\n```bash\n# Install latest release\ncargo install vortex-tui --locked\n\n# Or build from source\ncargo install --path vortex-tui --locked\n\n# Usage\nvx browse \u003cfile\u003e\n```\n\n### Development Setup\n\n#### Prerequisites (macOS)\n\n```bash\n# Optional but recommended dependencies\nbrew install flatbuffers protobuf  # For .fbs and .proto files\nbrew install duckdb               # For benchmarks\n\n# Install Rust toolchain\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\n# or\nbrew install rustup\n\n# Initialize 
submodules\ngit submodule update --init --recursive\n\n# Set up dependencies with uv\nuv sync --all-packages\n```\n\n### Benchmarking\n\nUse `vx-bench` to run benchmarks comparing engines (DataFusion, DuckDB) and formats (Parquet, Vortex):\n\n```bash\n# Install the benchmark orchestrator\nuv tool install \"bench_orchestrator @ ./bench-orchestrator/\"\n\n# Run TPC-H benchmarks\nvx-bench run tpch --engine datafusion,duckdb --format parquet,vortex\n\n# Compare results\nvx-bench compare --run latest\n```\n\nSee [bench-orchestrator/README.md](bench-orchestrator/README.md) for full documentation.\n\n### Performance Optimization\n\nFor optimal performance, we suggest using [MiMalloc](https://github.com/microsoft/mimalloc):\n\n```rust,ignore\nuse mimalloc::MiMalloc; // from the `mimalloc` crate\n\n#[global_allocator]\nstatic GLOBAL_ALLOC: MiMalloc = MiMalloc;\n```\n\n## Project Information\n\n### License\n\nLicensed under the Apache License, Version 2.0.\n\n### Governance\n\nVortex is an independent open-source project and not controlled by any single company. The Vortex Project is a\nsub-project of the Linux Foundation Projects. 
The governance model is documented in\n[CONTRIBUTING.md](CONTRIBUTING.md) and is subject to the terms of\nthe [Technical Charter](https://vortex.dev/charter.pdf).\n\n### Contributing\n\nPlease **do** read [CONTRIBUTING.md](CONTRIBUTING.md) before you contribute.\n\n### Reporting Vulnerabilities\n\nIf you discover a security vulnerability, please email \u003cvuln-report@vortex.dev\u003e.\n\n### Trademarks\n\nCopyright © Vortex a Series of LF Projects, LLC.\nFor terms of use, trademark policy, and other project policies please see \u003chttps://lfprojects.org\u003e\n\n## Acknowledgments\n\nThe Vortex project benefits enormously from groundbreaking work from the academic \u0026 open-source communities.\n\n### Research in Vortex\n\n- [BtrBlocks](https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/papers/btrblocks.pdf) - Efficient columnar compression\n- [FastLanes](https://www.vldb.org/pvldb/vol16/p2132-afroozeh.pdf) \u0026 [FastLanes on GPU](https://dbdbd2023.ugent.be/abstracts/felius_fastlanes.pdf) - High-performance integer compression\n- [FSST](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf) - Fast random access string compression\n- [ALP](https://ir.cwi.nl/pub/33334/33334.pdf) \u0026 [G-ALP](https://dl.acm.org/doi/pdf/10.1145/3736227.3736242) - Adaptive lossless floating-point compression\n- [Procella](https://dl.acm.org/citation.cfm?id=3360438) - YouTube's unified data system\n- [Anyblob](https://www.durner.dev/app/media/papers/anyblob-vldb23.pdf) - High-performance access to object storage\n- [ClickHouse](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) - Fast analytics for everyone\n- [MonetDB/X100](https://www.cidrdb.org/cidr2005/papers/P19.pdf) - Hyper-Pipelining Query Execution\n- [Morsel-Driven Parallelism](https://db.in.tum.de/~leis/papers/morsels.pdf) - A NUMA-Aware Query Evaluation Framework for the Many-Core Age\n- [The FastLanes File Format](https://github.com/cwida/FastLanes/blob/dev/docs/specification.pdf) - Expression Operators\n\n### Vortex in 
Research\n\n- [Anyblox](https://gienieczko.com/anyblox-paper) - A Framework for Self-Decoding Datasets\n- [F3](https://dl.acm.org/doi/pdf/10.1145/3749163) - Open-Source Data File Format for the Future\n\n### Open Source Inspiration\n\n- [Apache Arrow](https://arrow.apache.org)\n- [Apache DataFusion](https://github.com/apache/datafusion)\n- [parquet2](https://github.com/jorgecarleitao/parquet2) by Jorge Leitao\n- [DuckDB](https://github.com/duckdb/duckdb)\n- [Velox](https://github.com/facebookincubator/velox) \u0026 [Nimble](https://github.com/facebookincubator/nimble)\n\n#### Thanks to all contributors who have shared their knowledge and code with the community! 🚀\n","funding_links":[],"categories":["HarmonyOS","Rust","Related formats"],"sub_categories":["Windows Manager","Tests"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvortex-data%2Fvortex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvortex-data%2Fvortex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvortex-data%2Fvortex/lists"}