{"id":13441834,"url":"https://github.com/facebookincubator/velox","last_synced_at":"2025-05-16T01:02:57.025Z","repository":{"id":37042752,"uuid":"388946490","full_name":"facebookincubator/velox","owner":"facebookincubator","description":"A composable and fully extensible C++ execution engine library for data management systems.","archived":false,"fork":false,"pushed_at":"2025-05-09T00:24:49.000Z","size":160298,"stargazers_count":3727,"open_issues_count":1130,"forks_count":1252,"subscribers_count":119,"default_branch":"main","last_synced_at":"2025-05-09T00:54:11.663Z","etag":null,"topics":["data-management","query-processing"],"latest_commit_sha":null,"homepage":"https://velox-lib.io/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookincubator.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-23T23:00:23.000Z","updated_at":"2025-05-08T20:36:21.000Z","dependencies_parsed_at":"2024-04-12T03:27:46.812Z","dependency_job_id":"fc0cf73b-7ee8-430a-a6c5-e9dc1d1ade73","html_url":"https://github.com/facebookincubator/velox","commit_stats":{"total_commits":9204,"total_committers":494,"mean_commits":18.63157894736842,"dds":0.884180790960452,"last_synced_commit":"f93eae6534623e1dd5842cb246d2bad72820a69f"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fvelox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/facebookincubator%2Fvelox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fvelox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookincubator%2Fvelox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookincubator","download_url":"https://codeload.github.com/facebookincubator/velox/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254448578,"owners_count":22072764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-management","query-processing"],"created_at":"2024-07-31T03:01:38.628Z","updated_at":"2025-05-16T01:02:51.989Z","avatar_url":"https://github.com/facebookincubator.png","language":"C++","funding_links":[],"categories":["Database","HarmonyOS","C++","数据库管理系统","others","Apps"],"sub_categories":["Windows Manager","网络服务_其他","Development"],"readme":"\u003cimg src=\"static/logo.svg\" alt=\"Velox logo\" width=\"50%\" align=\"center\" /\u003e\n\nVelox is a composable execution engine distributed as an open source C++\nlibrary. It provides reusable, extensible, and high-performance data processing\ncomponents that can be (re-)used to build data management systems focused on\ndifferent analytical workloads, including batch, interactive, stream\nprocessing, and AI/ML. 
Velox was created by Meta and is currently developed\nin partnership with IBM/Ahana, Intel, Voltron Data, Microsoft, ByteDance and\nmany other companies.\n\nIn common usage scenarios, Velox takes a fully optimized query plan as input\nand performs the described computation. Because Velox does not provide a\nSQL parser, a dataframe layer, or a query optimizer, it is usually not meant\nto be used directly by end-users; rather, it is mostly used by developers\nintegrating and optimizing their compute engines.\n\nVelox provides the following high-level components:\n\n* **Type**: a generic typing system that supports scalar, complex, and nested\n  types, such as structs, maps, and arrays.\n* **Vector**: an [Arrow-compatible columnar memory layout\n  module](https://facebookincubator.github.io/velox/develop/vectors.html),\n  providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in\n  addition to a lazy materialization pattern and support for out-of-order\n  writes.\n* **Expression Eval**: a [fully vectorized expression evaluation\n  engine](https://facebookincubator.github.io/velox/develop/expression-evaluation.html)\n  that allows expressions to be efficiently executed on top of Vector/Arrow\n  encoded data.\n* **Functions**: sets of vectorized scalar, aggregate, and window function\n  implementations following Presto and Spark semantics.\n* **Operators**: implementations of relational operators such as scans, writes,\n  projections, filtering, grouping, ordering, shuffle/exchange, [hash, merge,\n  and nested loop joins](https://facebookincubator.github.io/velox/develop/joins.html),\n  unnest, and more.\n* **I/O**: a connector interface for extensible data sources and sinks,\n  supporting different file formats (ORC/DWRF, Parquet, Nimble) and storage\n  adapters (S3, HDFS, GCS, ABFS, local files).\n* **Network Serializers**: an interface where different wire protocols can be\n  implemented, used for network communication, 
supporting\n  [PrestoPage](https://prestodb.io/docs/current/develop/serialized-page.html)\n  and Spark's UnsafeRow.\n* **Resource Management**: a collection of primitives for handling\n  computational resources, such as [memory\n  arenas](https://facebookincubator.github.io/velox/develop/arena.html) and\n  buffer management, tasks, drivers, and thread pools for CPU and thread\n  execution, spilling, and caching.\n\nVelox is extensible and allows developers to define their own engine-specific\nspecializations, including:\n\n1. Custom types\n2. [Simple and vectorized functions](https://facebookincubator.github.io/velox/develop/scalar-functions.html)\n3. [Aggregate functions](https://facebookincubator.github.io/velox/develop/aggregate-functions.html)\n4. Window functions\n5. Operators\n6. File formats\n7. Storage adapters\n8. Network serializers\n\n## Examples\n\nExamples of extensibility and integration with different component APIs [can be\nfound here](velox/examples)\n\n## Documentation\n\nDeveloper guides detailing many aspects of the library, in addition to the list\nof available functions [can be found here.](https://facebookincubator.github.io/velox)\n\nBlog posts are available [here](https://velox-lib.io/blog).\n\n## Community\n\nVelox is an open source project supported by a community of individual\ncontributors and organizations. 
The project's technical governance mechanics are\ndescribed [in this\ndocument](https://velox-lib.io/docs/community/technical-governance).\n\nProject maintainers [are listed\nhere](https://velox-lib.io/docs/community/components-and-maintainers).\n\nThe main communication channels with the Velox OSS community are the\n[Velox-OSS Slack workspace](http://velox-oss.slack.com), GitHub Issues, and\nDiscussions.\n\nFor access to the Velox Slack workspace, please add a comment [to this\nDiscussion](https://github.com/facebookincubator/velox/discussions/11348).\n\n## Contributing\n\nCheck our [contributing guide](CONTRIBUTING.md) to learn about how to\ncontribute to the project.\n\n## License\n\nVelox is licensed under the Apache 2.0 License. A copy of the license\n[can be found here.](LICENSE)\n\n\n## Getting Started\n\n### Get the Velox Source\n```\ngit clone https://github.com/facebookincubator/velox.git\ncd velox\n```\nOnce Velox is checked out, the first step is to install the dependencies.\nDetails on the dependencies and how Velox manages some of them for you\n[can be found here](CMake/resolve_dependency_modules/README.md).\n\nVelox also provides the following scripts to help developers set up and install Velox\ndependencies for a given platform.\n\n### Setting up dependencies\n\nThe following setup scripts use the `DEPENDENCY_DIR` environment variable to set the\nlocation to download and build packages. This defaults to `deps-download` in the current\nworking directory.\n\nUse `INSTALL_PREFIX` to set the install directory of the packages. This defaults to\n`deps-install` in the current working directory on macOS and to the default install\nlocation (e.g., 
`/usr/local`) on Linux.\nUsing the default install location `/usr/local` on macOS is discouraged since this\nlocation is used by certain Homebrew versions.\n\nManually add the `INSTALL_PREFIX` value to your IDE or shell environment,\nfor example by adding `export INSTALL_PREFIX=/Users/$USERNAME/velox/deps-install` to `~/.zshrc`, so that\nsubsequent Velox builds can use the installed packages.\n\n*You can reuse `DEPENDENCY_DIR` and `INSTALL_PREFIX` for Velox clients such as Prestissimo\nby specifying a common shared directory.*\n\nThe build parallelism for dependencies can be controlled with the `BUILD_THREADS` environment\nvariable, which overrides the default number of parallel processes used for compiling and linking.\nThe default value is the number of cores on your machine.\nThis is useful if your machine has many cores but not enough memory to run all\ncompile and link processes in parallel, which results in OOM kills by the kernel.\n\n### Setting up on macOS\n\nOn a macOS machine (either Intel or Apple silicon) you can set up and then build like so:\n\n```shell\n$ ./scripts/setup-macos.sh\n$ make\n```\n\nWith macOS 14.4 and Xcode 15.3, where `m4` is missing, you can either\n1. install `m4` via `brew`:\n```shell\n$ brew install m4\n$ export PATH=/opt/homebrew/opt/m4/bin:$PATH\n```\n\n2. or use `gm4` instead:\n```shell\n$ M4=/usr/bin/gm4 make\n```\n\n### Setting up on Ubuntu (20.04 or later)\n\nThe supported architectures are x86_64 (avx, sse), and AArch64 (apple-m1+crc, neoverse-n1).\nYou can build like so:\n\n```shell\n$ ./scripts/setup-ubuntu.sh\n$ make\n```\n\n### Setting up on CentOS 9 Stream with adapters\n\nVelox adapters include file systems such as AWS S3, Google Cloud Storage,\nand Azure Blob File System. These adapters require installation of additional\nlibraries. 
Once you have checked out Velox, you can set up and build like so:\n\n```shell\n$ ./scripts/setup-centos9.sh\n$ ./scripts/setup-adapters.sh\n$ make\n```\n\nNote that `setup-adapters.sh` supports macOS and Ubuntu 20.04 or later.\n\n### Using Clang on Linux\n\nClang 15 can be additionally installed during the setup step for Ubuntu 22.04/24.04\nand CentOS 9 by setting the `USE_CLANG` environment variable prior to running the platform-specific setup script.\n```shell\n$ export USE_CLANG=true\n```\nThis will install and use Clang 15 to build the dependencies instead of using the default GCC compiler.\n\nOnce completed, and before running any `make` command, set the compiler to be used:\n```shell\n$ export CC=/usr/bin/clang-15\n$ export CXX=/usr/bin/clang++-15\n$ make\n```\n\n### Building Velox\n\nRun `make` in the root directory to compile the sources. For development, use\n`make debug` to build a non-optimized debug version, or `make release` to build\nan optimized version. Use `make unittest` to build and run tests.\n\nNote that,\n* Velox requires GCC 11.0 or Clang 15.0 at a minimum.\n* Velox requires the CPU to support instruction sets:\n  * bmi\n  * bmi2\n  * f16c\n* Velox tries to use the following (or equivalent) instruction sets where available:\n  * On Intel CPUs\n    * avx\n    * avx2\n    * sse\n  * On ARM\n    * Neon\n    * Neon64\n\nBuild metrics for Velox are published at \u003chttps://facebookincubator.github.io/velox/bm-report/\u003e\n\n### Building Velox with docker-compose\n\nIf you don't want to install the system dependencies required to build Velox,\nyou can also build and run tests for Velox in a Docker container\nusing [docker-compose](https://docs.docker.com/compose/).\nUse the following commands:\n\n```shell\n$ docker-compose build ubuntu-cpp\n$ docker-compose run --rm ubuntu-cpp\n```\nIf you want to increase or decrease the number of threads used when building Velox,\nyou can override the `NUM_THREADS` environment variable by 
doing:\n```shell\n$ docker-compose run -e NUM_THREADS=\u003cNUM_THREADS_TO_USE\u003e --rm ubuntu-cpp\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookincubator%2Fvelox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookincubator%2Fvelox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookincubator%2Fvelox/lists"}