{"id":18743084,"url":"https://github.com/kxsystems/arrowkdb","last_synced_at":"2025-04-12T21:23:14.562Z","repository":{"id":48368596,"uuid":"313324596","full_name":"KxSystems/arrowkdb","owner":"KxSystems","description":"kdb+ integration with Apache Arrow and Parquet","archived":false,"fork":false,"pushed_at":"2024-05-21T17:31:44.000Z","size":476,"stargazers_count":30,"open_issues_count":5,"forks_count":14,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T03:14:12.459Z","etag":null,"topics":["arrow","kdb","parquet","q"],"latest_commit_sha":null,"homepage":"https://code.kx.com/q/interfaces","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KxSystems.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-16T14:16:43.000Z","updated_at":"2025-04-10T06:01:35.000Z","dependencies_parsed_at":"2024-04-08T01:59:05.854Z","dependency_job_id":null,"html_url":"https://github.com/KxSystems/arrowkdb","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Farrowkdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Farrowkdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Farrowkdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KxSystems%2Farrowkdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KxSystems","download_url
":"https://codeload.github.com/KxSystems/arrowkdb/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248632702,"owners_count":21136743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","kdb","parquet","q"],"created_at":"2024-11-07T16:09:58.351Z","updated_at":"2025-04-12T21:23:14.528Z","avatar_url":"https://github.com/KxSystems.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# arrowkdb\n\n![Arrow](apache_arrow.png)\n\n[![GitHub release (latest by date)](https://img.shields.io/github/v/release/kxsystems/arrowkdb?include_prereleases)](https://github.com/kxsystems/arrowkdb/releases) [![Travis (.com) branch](https://travis-ci.com/KxSystems/arrowkdb.svg?branch=main)](https://travis-ci.com/KxSystems/arrowkdb)\n\n\n## Introduction\n\nThis interface allows kdb+ to users read and write Apache Arrow data stored in:\n\n- Apache Parquet file format\n- Arrow IPC record batch file format\n- Arrow IPC record batch stream format\n\nThis is part of the [*Fusion for kdb+*](http://code.kx.com/q/interfaces/fusion/) interface collection.\n\n\n\n## New to kdb+ ?\n\nKdb+ is the world's fastest time-series database, optimized for  ingesting, analyzing and storing massive amounts of structured data. To  get started with kdb+, please visit https://code.kx.com/q/ for downloads and developer information. 
For general information, visit https://kx.com/\n\n\n\n## New to Apache Arrow?\n\nApache Arrow is a software development platform for building high-performance applications that process and transport large data sets. It is designed to improve both the performance of analytical algorithms and the efficiency of moving data from one system (or programming language) to another.\n\nA critical component of Apache Arrow is its **in-memory columnar format**, a standardized, language-agnostic specification for representing structured, table-like datasets in memory. This data format has a rich data type system (including nested data types) designed to support the needs of analytic database systems, data frame libraries, and more.\n\n\n\n## What is the difference between Apache Arrow and Apache Parquet?\n\nParquet is a storage format designed for maximum space efficiency, using advanced compression and encoding techniques. It is ideal for minimizing disk usage when storing gigabytes of data or more. This efficiency comes at the cost of relatively expensive reading into memory, as Parquet data cannot be operated on directly but must be decoded in large chunks.\n\nConversely, Arrow is an in-memory format meant for direct and efficient use for computational purposes. Arrow data is not compressed but laid out in a natural format for the CPU, so that data can be accessed at arbitrary places at full speed. 
Arrow and Parquet therefore complement each other: Arrow serves as the in-memory data structure into which Parquet data is deserialized.\n\n\n\n## Installation\n\n### Requirements\n\n- kdb+ ≥ 3.5 64-bit (Linux/macOS/Windows)\n- Apache Arrow ≥ 9.0.0 (or ≥ 6.0.0 if building `arrowkdb` from source)\n- C++14 or later\n- CMake ≥ 3.1.3\n\n\u003e :warning: If using the packaged version of `arrowkdb` you should install version 9.0.0 of Apache Arrow\n\n\n### Third-party library installation\n\n#### Linux\n\nFollow the instructions [here](https://arrow.apache.org/install/#c-and-glib-c-packages-for-debian-gnulinux-ubuntu-and-centos) to install `libarrow-dev` and `libparquet-dev` from Apache's APT or Yum repositories.\n\nNote: If using the packaged version of `arrowkdb` you should install version 9.0.0 of both:\n\n```bash\nsudo apt install -y -V libarrow-dev=9.0.0-1\nsudo apt install -y -V libparquet-dev=9.0.0-1\n```\n\n#### macOS\n\nFollow the instructions [here](https://arrow.apache.org/install/#c-and-glib-c-packages-on-homebrew) to install `apache-arrow` using Homebrew.\n\n#### Windows\n\nOn Windows it is necessary to build Arrow from source. Full details are provided [here](https://arrow.apache.org/docs/developers/cpp/windows.html) but the basic steps are as follows.\n\nFrom a Visual Studio command prompt, clone the Arrow source from GitHub:\n\n```bash\nC:\\Git\u003e git clone https://github.com/apache/arrow.git\nC:\\Git\u003e cd arrow\n```\n\nSwitch to the `9.0.0` tag:\n\n```bash\nC:\\Git\\arrow\u003e git checkout refs/tags/apache-arrow-9.0.0 --\nC:\\Git\\arrow\u003e cd cpp\n```\n\nCreate an install directory and set an environment variable to this directory (substituting the correct absolute path as appropriate). 
This environment variable is used again later when building `arrowkdb`:\n\n```bash\nC:\\Git\\arrow\\cpp\u003e mkdir install\nC:\\Git\\arrow\\cpp\u003e set ARROW_INSTALL=C:\\Git\\arrow\\cpp\\install\n```\n\nCreate the CMake build directory and generate the build files (this will default to using the Visual Studio CMake generator when run from a VS command prompt):\n\n```bash\nC:\\Git\\arrow\\cpp\u003e mkdir build\nC:\\Git\\arrow\\cpp\u003e cd build\nC:\\Git\\arrow\\cpp\\build\u003e cmake .. -DARROW_PARQUET=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_BROTLI=ON -DARROW_BUILD_STATIC=OFF -DARROW_COMPUTE=OFF -DARROW_DEPENDENCY_USE_SHARED=OFF -DCMAKE_INSTALL_PREFIX=%ARROW_INSTALL%\n```\n\nBuild and install Arrow:\n\n```bash\nC:\\Git\\arrow\\cpp\\build\u003e cmake --build . --config Release\nC:\\Git\\arrow\\cpp\\build\u003e cmake --build . --config Release --target install\n```\n\nCopy the Arrow, Parquet and compression DLLs to the `%QHOME%\\w64` directory:\n\n```bash\nC:\\Git\\arrow\\cpp\\build\u003e copy release\\Release\\*.dll %QHOME%\\w64\n```\n\n\n\n### Installing a release\n\nIt is recommended that a user install this interface through a release. This is completed in a number of steps:\n\n1. Ensure you have downloaded/installed the Arrow C++ API following the [instructions](#third-party-library-installation).\n2. [Download a release](https://github.com/KxSystems/arrowkdb/releases) for your system architecture.\n3. Install script `arrowkdb.q` to `$QHOME`, and binary file `lib/arrowkdb.(so|dll)` to `$QHOME/[mlw](64)`, by executing the following from the Release directory:\n\n```bash\n## Linux/macOS\nchmod +x install.sh \u0026\u0026 ./install.sh\n\n## Windows\ninstall.bat\n```\n\n\n\n### Building and installing from source\n\nIn order to successfully build and install this interface from source, the following environment variables must be set:\n\n1. 
`ARROW_INSTALL` = Location of the Arrow C++ API release (only required if Arrow is not installed globally on the system, e.g. on Windows where Arrow was built from source)\n2. `QHOME` = Q installation directory (directory containing `q.k`)\n\nFrom a shell prompt (on Linux/macOS) or Visual Studio command prompt (on Windows), clone the `arrowkdb` source from GitHub:\n\n```bash\ngit clone https://github.com/KxSystems/arrowkdb.git\ncd arrowkdb\n```\n\nCreate the CMake build directory and generate the build files (this will use the system's default CMake generator):\n\n```bash\nmkdir build\ncd build\n\n## Linux/macOS\ncmake ..\n\n## Windows (using the Arrow installation which was built from source as above)\ncmake .. -DARROW_INSTALL=%ARROW_INSTALL%\n```\n\nStart the build:\n\n```bash\ncmake --build . --config Release\n```\n\nCreate the install package and deploy:\n\n```bash\ncmake --build . --config Release --target install\n```\n\n\n\n## Documentation\n\nDocumentation outlining the functionality available for this interface can be found in the [`docs`](docs/index.md) folder.\n\n\n\n## Status\n\nThe arrowkdb interface is provided here under an Apache 2.0 license.\n\nIf you find issues with the interface or have feature requests, please consider [raising an issue](https://github.com/KxSystems/arrowkdb/issues).\n\nIf you wish to contribute to this project, please follow the [contribution guide](CONTRIBUTING.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkxsystems%2Farrowkdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkxsystems%2Farrowkdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkxsystems%2Farrowkdb/lists"}