Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/apache/arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/apache/arrow
- Owner: apache
- License: apache-2.0
- Created: 2016-02-17T08:00:23.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2024-12-03T05:05:27.000Z (9 days ago)
- Last Synced: 2024-12-03T22:34:30.134Z (8 days ago)
- Topics: arrow
- Language: C++
- Homepage: https://arrow.apache.org/
- Size: 193 MB
- Stars: 14,683
- Watchers: 351
- Forks: 3,559
- Open Issues: 4,396
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- stars - apache/arrow - Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics (HarmonyOS / Windows Manager)
- awesome-dataframes - Arrow - A cross-language development platform for in-memory data. (Other)
- Fuchsia-Guide - Apache Arrow - Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. Arrow's libraries are available for C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust. (Rust Tools and Frameworks)
- jimsghstars - apache/arrow - Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics (C++)
- Firmware-Guide - Apache Arrow - Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. Arrow's libraries are available for C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust. (Rust Tools)
- awesome-starred - apache/arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing (others)
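- awesome-python-machine-learning-resources - GitHub - 6% open · ⏱️ 25.08.2022): (Data Containers and Structures)
- StarryDivineSky - apache/arrow - copy) so that data can be shared and exchanged, improving the efficiency of data processing. Arrow's core data structure is a unified columnar in-memory format that uses a contiguous memory layout and a zero-copy strategy to reduce data-transfer overhead. It supports vectorized operations on contiguous columnar data using the SIMD (single instruction, multiple data) instructions of modern processors. Arrow also provides a rich set of data-manipulation interfaces, such as filtering, transformation, and aggregation, to support efficient data analysis and processing. Over time, Apache Arrow has expanded into a software development platform for building high-performance applications that process and transport large datasets. It supports multiple programming languages (such as C++, Java, Python, and R) and integrates with many mainstream data-processing frameworks, such as Apache Spark, Pandas, and TensorFlow. (Database Management Systems / Network Services_Other)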
- awesome-production-machine-learning - Apache Arrow - In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc. (Data Storage Optimisation)
- my-awesome - apache/arrow - 12 star:14.7k fork:3.6k Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics (C++)
- AwesomeCppGameDev - arrow - Cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for effic… (C++)
- AiTreasureBox - apache/arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing (Repos)
README
# Apache Arrow
[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/arrow.svg)](https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:arrow)
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](https://github.com/apache/arrow/blob/main/LICENSE.txt)
[![Twitter Follow](https://img.shields.io/twitter/follow/apachearrow.svg?style=social&label=Follow)](https://twitter.com/apachearrow)

## Powering In-Memory Analytics

Apache Arrow is a universal columnar format and multi-language toolbox for fast
data interchange and in-memory analytics. It contains a set of technologies that
enable data systems to efficiently store, process, and move data.

Major components of the project include:

- [The Arrow Columnar In-Memory Format](https://arrow.apache.org/docs/dev/format/Columnar.html):
a standard and efficient in-memory representation of various datatypes, plain or nested
- [The Arrow IPC Format](https://arrow.apache.org/docs/dev/format/Columnar.html#serialization-and-interprocess-communication-ipc):
an efficient serialization of the Arrow format and associated metadata,
for communication between processes and heterogeneous environments
- [The Arrow Flight RPC protocol](https://github.com/apache/arrow/tree/main/format/Flight.proto):
based on the Arrow IPC format, a building block for remote services exchanging
Arrow data with application-defined semantics (for example a storage server or a database)
- [C++ libraries](https://github.com/apache/arrow/tree/main/cpp)
- [C bindings using GLib](https://github.com/apache/arrow/tree/main/c_glib)
- [C# .NET libraries](https://github.com/apache/arrow/tree/main/csharp)
- [Gandiva](https://github.com/apache/arrow/tree/main/cpp/src/gandiva):
an [LLVM](https://llvm.org)-based Arrow expression compiler, part of the C++ codebase
- [Go libraries](https://github.com/apache/arrow-go)
- [Java libraries](https://github.com/apache/arrow/tree/main/java)
- [JavaScript libraries](https://github.com/apache/arrow/tree/main/js)
- [Python libraries](https://github.com/apache/arrow/tree/main/python)
- [R libraries](https://github.com/apache/arrow/tree/main/r)
- [Ruby libraries](https://github.com/apache/arrow/tree/main/ruby)
- [Rust libraries](https://github.com/apache/arrow-rs)

Arrow is an [Apache Software Foundation](https://www.apache.org) project. Learn more at
[arrow.apache.org](https://arrow.apache.org).

## What's in the Arrow libraries?

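To make the columnar in-memory format named above more concrete, here is a minimal sketch in plain Python (standard library only; it does not use the Arrow libraries or their API). It assumes a fixed-width int32 column stored as a validity bitmap plus a contiguous little-endian values buffer, and it omits the buffer alignment and padding that the format specification recommends. The names `build_int32_column` and `read_slot` are hypothetical helpers introduced only for illustration.

```python
import struct


def build_int32_column(values):
    """Pack optional ints into (validity_bitmap, data_buffer, null_count).

    Illustrative only: a real Arrow array also carries type metadata and
    uses aligned, padded buffers.
    """
    n = len(values)
    bitmap = bytearray((n + 7) // 8)  # one bit per slot, least-significant bit first
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1
            # A null slot still occupies 4 bytes; its content is arbitrary (0 here).
            data += struct.pack("<i", 0)
        else:
            bitmap[i // 8] |= 1 << (i % 8)  # a set bit marks the slot as valid
            data += struct.pack("<i", v)
    return bytes(bitmap), bytes(data), null_count


def read_slot(bitmap, data, i):
    """Return the value at slot i, or None if its validity bit is unset."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<i", data, 4 * i)[0]


bitmap, data, null_count = build_int32_column([1, None, 3])

# Slicing a memoryview exposes the third value's bytes without copying them.
view = memoryview(data)[8:12]
```

Because every value sits at a fixed offset, slot `i` can be read directly at byte `4 * i` of `data`, and the `memoryview` slice shows in miniature how a consumer can look at part of a buffer without copying it.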
The reference Arrow libraries contain many distinct software components:
- Columnar vector and table-like containers (similar to data frames) supporting
flat or nested types
- Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers
library)
- Reference-counted off-heap buffer memory management, for zero-copy memory
sharing and handling memory-mapped files
- IO interfaces to local and remote filesystems
- Self-describing binary wire formats (streaming and batch/file-like) for
remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the
implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
- Readers and writers for various widely-used file formats (such as Parquet, CSV)

## Implementation status

The official Arrow libraries in this repository are in different stages of
implementing the Arrow format and related features. See our current
[feature matrix](https://arrow.apache.org/docs/dev/status.html)
on git main.

## How to Contribute

Please read our latest [project contribution guide][5].
## Getting involved
Even if you do not plan to contribute to Apache Arrow itself or Arrow
integrations in other projects, we'd be happy to have you involved:

- Join the mailing list: send an email to
[[email protected]][1]. Share your ideas and use cases for the
project.
- Follow our activity on [GitHub issues][3]
- [Learn the format][2]
- Contribute code to one of the reference implementations

[1]: mailto:[email protected]
[2]: https://github.com/apache/arrow/tree/main/format
[3]: https://github.com/apache/arrow/issues
[4]: https://github.com/apache/arrow
[5]: https://arrow.apache.org/docs/dev/developers/index.html