Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/apache/arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://github.com/apache/arrow

arrow

Last synced: 3 days ago
JSON representation

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

Awesome Lists containing this project

README

        

# Apache Arrow

[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/arrow.svg)](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:arrow)
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow/blob/main/LICENSE.txt)
[![Twitter Follow](https://img.shields.io/twitter/follow/apachearrow.svg?style=social&label=Follow)](https://twitter.com/apachearrow)

## Powering In-Memory Analytics

Apache Arrow is a universal columnar format and multi-language toolbox for fast
data interchange and in-memory analytics. It contains a set of technologies that
enable data systems to efficiently store, process, and move data.

Major components of the project include:

- [The Arrow Columnar In-Memory Format](https://arrow.apache.org/docs/dev/format/Columnar.html):
a standard and efficient in-memory representation of various datatypes, plain or nested
- [The Arrow IPC Format](https://arrow.apache.org/docs/dev/format/Columnar.html#serialization-and-interprocess-communication-ipc):
an efficient serialization of the Arrow format and associated metadata,
for communication between processes and heterogeneous environments
- [The Arrow Flight RPC protocol](https://github.com/apache/arrow/tree/main/format/Flight.proto):
based on the Arrow IPC format, a building block for remote services exchanging
Arrow data with application-defined semantics (for example a storage server or a database)
- [C++ libraries](https://github.com/apache/arrow/tree/main/cpp)
- [C bindings using GLib](https://github.com/apache/arrow/tree/main/c_glib)
- [C# .NET libraries](https://github.com/apache/arrow/tree/main/csharp)
- [Gandiva](https://github.com/apache/arrow/tree/main/cpp/src/gandiva):
an [LLVM](https://llvm.org)-based Arrow expression compiler, part of the C++ codebase
- [Go libraries](https://github.com/apache/arrow-go)
- [Java libraries](https://github.com/apache/arrow/tree/main/java)
- [JavaScript libraries](https://github.com/apache/arrow/tree/main/js)
- [Python libraries](https://github.com/apache/arrow/tree/main/python)
- [R libraries](https://github.com/apache/arrow/tree/main/r)
- [Ruby libraries](https://github.com/apache/arrow/tree/main/ruby)
- [Rust libraries](https://github.com/apache/arrow-rs)

Arrow is an [Apache Software Foundation](https://www.apache.org) project. Learn more at
[arrow.apache.org](https://arrow.apache.org).

## What's in the Arrow libraries?

The reference Arrow libraries contain many distinct software components:

- Columnar vector and table-like containers (similar to data frames) supporting
flat or nested types
- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
library)
- Reference-counted off-heap buffer memory management, for zero-copy memory
sharing and handling memory-mapped files
- IO interfaces to local and remote filesystems
- Self-describing binary wire formats (streaming and batch/file-like) for
remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the
implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
- Readers and writers for various widely-used file formats (such as Parquet, CSV)

## Implementation status

The official Arrow libraries in this repository are in different stages of
implementing the Arrow format and related features. See our current
[feature matrix](https://arrow.apache.org/docs/dev/status.html)
on git main.

## How to Contribute

Please read our latest [project contribution guide][5].

## Getting involved

Even if you do not plan to contribute to Apache Arrow itself or Arrow
integrations in other projects, we'd be happy to have you involved:

- Join the mailing list: send an email to
[[email protected]][1]. Share your ideas and use cases for the
project.
- Follow our activity on [GitHub issues][3]
- [Learn the format][2]
- Contribute code to one of the reference implementations

[1]: mailto:[email protected]
[2]: https://github.com/apache/arrow/tree/main/format
[3]: https://github.com/apache/arrow/issues
[4]: https://github.com/apache/arrow
[5]: https://arrow.apache.org/docs/dev/developers/index.html