https://github.com/apache/datafusion-java
Java bindings for Apache DataFusion
apache arrow datafusion java jni jvm query-engine sql
- Host: GitHub
- URL: https://github.com/apache/datafusion-java
- Owner: apache
- License: apache-2.0
- Created: 2026-05-12T22:03:12.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-10T19:48:39.000Z (8 days ago)
- Last Synced: 2026-06-10T21:16:34.733Z (7 days ago)
- Topics: apache, arrow, datafusion, java, jni, jvm, query-engine, sql
- Language: Java
- Homepage: https://datafusion.apache.org/
- Size: 2.99 MB
- Stars: 25
- Watchers: 1
- Forks: 12
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Notice: NOTICE.txt
Awesome Lists containing this project
- awesome-java - Apache DataFusion Java
# Apache DataFusion Java
Java bindings for [Apache DataFusion]. Queries run in native Rust and results
return to the JVM as [Apache Arrow] batches via the Arrow C Data Interface.
[Apache DataFusion]: https://datafusion.apache.org/
[Apache Arrow]: https://arrow.apache.org/
> Early development: the API will change between releases. Bug reports
> and contributions welcome.
## Install
Released to [Maven Central](https://central.sonatype.com/artifact/org.apache.datafusion/datafusion-java).
The JAR bundles the native library for Linux and macOS on x86_64 and
aarch64. Windows users need to build from source.
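As a rough pre-flight check, an application can compare the JVM's `os.name` and `os.arch` properties against the platforms listed above. The sketch below uses only the standard library. `BundledPlatformCheck` and its methods are hypothetical names, not part of datafusion-java, and the way the JAR itself locates its native library may differ.

```java
import java.util.Locale;

/** Illustrative check: is this JVM on a platform the README lists as bundled? */
final class BundledPlatformCheck {

    /** Normalizes an os.name value to "linux", "macos", or "other". */
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) return "linux";
        if (os.contains("mac") || os.contains("darwin")) return "macos";
        return "other";
    }

    /** Normalizes an os.arch value to "x86_64", "aarch64", or "other". */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        return "other";
    }

    /** True when both the OS and the architecture are in the bundled set. */
    static boolean isBundled(String osName, String osArch) {
        return !normalizeOs(osName).equals("other")
            && !normalizeArch(osArch).equals("other");
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        String verdict = isBundled(os, arch)
            ? "bundled native library expected"
            : "build from source";
        System.out.println(os + " / " + arch + ": " + verdict);
    }
}
```

Running this before creating a `SessionContext` gives a clearer message than a native-library load failure would, though it only mirrors the platform list in this README and is not a guarantee.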
Maven:
```xml
<dependency>
  <groupId>org.apache.datafusion</groupId>
  <artifactId>datafusion-java</artifactId>
  <version>0.1.0</version>
</dependency>
```
Gradle:
```kotlin
implementation("org.apache.datafusion:datafusion-java:0.1.0")
```
Arrow needs `--add-opens=java.base/java.nio=ALL-UNNAMED` on the JVM
command line. See the [installation guide](docs/source/user-guide/installation.md)
for details and for building from source.
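To fail fast when the flag is missing, an application can ask the `java.base` module whether `java.nio` is open to its own module. This is a minimal sketch using the standard `Module.isOpen` API. `AddOpensCheck` is a hypothetical helper, not something datafusion-java or Arrow provides, and it assumes the application and Arrow are both on the classpath (the unnamed module).

```java
/** Illustrative startup check for the --add-opens flag mentioned above. */
final class AddOpensCheck {

    /** True if java.base opens packageName to the module that contains caller. */
    static boolean isOpenTo(String packageName, Class<?> caller) {
        Module javaBase = Object.class.getModule();
        return javaBase.isOpen(packageName, caller.getModule());
    }

    public static void main(String[] args) {
        if (isOpenTo("java.nio", AddOpensCheck.class)) {
            System.out.println("java.nio is open to this module");
        } else {
            System.err.println(
                "Missing JVM flag: --add-opens=java.base/java.nio=ALL-UNNAMED");
        }
    }
}
```

Checking the module state rather than parsing the command line also covers flags supplied through other routes, such as the `JAVA_TOOL_OPTIONS` environment variable.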
## Quickstart
```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;
try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {
    ctx.registerParquet("orders", "/path/to/orders.parquet");
    try (DataFrame df = ctx.sql(
             "SELECT o_orderpriority, COUNT(*) AS n " +
             "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}
```
`SessionContext` and `DataFrame` are `AutoCloseable` and not thread-safe.
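One generic way to share a non-thread-safe, `AutoCloseable` object across threads is to confine it to a single worker thread and submit work to that thread. The sketch below shows the pattern with the standard library only. `Confined`, `FakeContext`, and `ConfinedDemo` are hypothetical names, `FakeContext` is a stand-in for a real `SessionContext`, and nothing here is an API or a recommendation from datafusion-java.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

/** Confines a non-thread-safe AutoCloseable resource to one worker thread. */
final class Confined<T extends AutoCloseable> implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final T resource;

    Confined(Supplier<T> factory) throws Exception {
        // Create the resource on the worker so every access uses one thread.
        Callable<T> create = factory::get;
        T created;
        try {
            created = worker.submit(create).get();
        } catch (Exception e) {
            worker.shutdown(); // do not leak the thread if creation fails
            throw e;
        }
        this.resource = created;
    }

    /** Runs task against the resource on the worker thread. */
    <R> Future<R> submit(Function<T, R> task) {
        Callable<R> call = () -> task.apply(resource);
        return worker.submit(call);
    }

    @Override
    public void close() throws Exception {
        Callable<Void> closer = () -> { resource.close(); return null; };
        try {
            worker.submit(closer).get(); // close on the owning thread too
        } finally {
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}

/** Hypothetical stand-in for a non-thread-safe resource such as SessionContext. */
final class FakeContext implements AutoCloseable {
    final Thread owner = Thread.currentThread();

    String query(String sql) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("used from the wrong thread");
        }
        return "ran: " + sql;
    }

    @Override
    public void close() { }
}

final class ConfinedDemo {
    public static void main(String[] args) throws Exception {
        try (Confined<FakeContext> ctx = new Confined<FakeContext>(FakeContext::new)) {
            // Any thread may call submit(); the work itself runs on the worker.
            Future<String> result = ctx.submit(c -> c.query("SELECT 1"));
            System.out.println(result.get());
        }
    }
}
```

With the real library, any `DataFrame` or `ArrowReader` produced from the context would presumably need to be consumed inside the submitted task as well, so that it never leaves the worker thread. A simpler alternative is one `SessionContext` per thread.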
## Documentation
The full documentation lives under [`docs/source/`](docs/source/index.md)
and is built with Sphinx (see [`docs/README.md`](docs/README.md) for the
build steps):
- [User guide](docs/source/user-guide/index.md) — installation, the
DataFrame and SQL APIs, Parquet ingestion.
- [Contributor guide](docs/source/contributor-guide/index.md) — build,
test, code style, and how to bump the DataFusion version.
## Requirements
JDK 17+. Building from source: see
[`docs/source/contributor-guide/development.md`](docs/source/contributor-guide/development.md).
## Contributing
Open an issue to discuss non-trivial changes before sending a PR. See the
[contributor guide](docs/source/contributor-guide/index.md).
## License
Apache License 2.0. See [LICENSE.txt](LICENSE.txt) and [NOTICE.txt](NOTICE.txt).