{"id":37016312,"url":"https://github.com/datafusion-contrib/datafusion-java","last_synced_at":"2026-01-14T01:50:19.110Z","repository":{"id":40249333,"uuid":"416528203","full_name":"datafusion-contrib/datafusion-java","owner":"datafusion-contrib","description":"Java binding to Apache DataFusion","archived":false,"fork":false,"pushed_at":"2025-01-26T14:28:06.000Z","size":490,"stargazers_count":74,"open_issues_count":14,"forks_count":13,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-26T15:26:52.985Z","etag":null,"topics":["arrow","ballista","datafusion","java"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datafusion-contrib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-12T23:31:46.000Z","updated_at":"2025-01-26T14:28:09.000Z","dependencies_parsed_at":"2023-01-19T19:17:36.894Z","dependency_job_id":"87aaf614-30ec-4ccc-a7a9-82ba401cbbad","html_url":"https://github.com/datafusion-contrib/datafusion-java","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/datafusion-contrib/datafusion-java","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-java/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datafusion-contrib","download_url":"https://codeload.github.com/datafusion-contrib/datafusion-java/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datafusion-contrib%2Fdatafusion-java/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28408691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T00:40:43.272Z","status":"ssl_error","status_checked_at":"2026-01-14T00:40:42.636Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","ballista","datafusion","java"],"created_at":"2026-01-14T01:50:18.215Z","updated_at":"2026-01-14T01:50:19.089Z","avatar_url":"https://github.com/datafusion-contrib.png","language":"Java","readme":"# datafusion-java\n\n[![Build](https://github.com/datafusion-contrib/datafusion-java/actions/workflows/build.yml/badge.svg)](https://github.com/datafusion-contrib/datafusion-java/actions/workflows/build.yml)\n[![Release](https://github.com/datafusion-contrib/datafusion-java/actions/workflows/release.yml/badge.svg)](https://github.com/datafusion-contrib/datafusion-java/actions/workflows/release.yml)\n[![Maven metadata 
URL](https://img.shields.io/maven-metadata/v?metadataUrl=https%3A%2F%2Frepo.maven.apache.org%2Fmaven2%2Fio%2Fgithub%2Fdatafusion-contrib%2Fdatafusion-java%2Fmaven-metadata.xml)](https://repo.maven.apache.org/maven2/io/github/datafusion-contrib/datafusion-java/)\n\nA Java binding to [Apache DataFusion][1]\n\n## Status\n\nThis project is still a work in progress. It currently works with Arrow 14.0 and DataFusion 25.0,\nand it is built and verified in CI against Java 11 and 21. See the [Docker run instructions](#how-to-run-the-interactive-demo),\nwhich use Java 21 `jshell` for an interactive session.\n\n## How to use in your code\n\nThe artifacts are [published][2] to Maven Central, so you can use datafusion-java like any other Java library:\n\n```kotlin\ndependencies {\n    implementation(\n        group = \"io.github.datafusion-contrib\",\n        name = \"datafusion-java\",\n        version = \"0.16.0\" // or the latest version; check out https://github.com/datafusion-contrib/datafusion-java/releases\n    )\n}\n```\n\nTo test it out, you can use this piece of demo code:\n\n\u003cdetails\u003e\n\u003csummary\u003eDataFusionDemo.java\u003c/summary\u003e\n\n```java\npackage com.me;\n\nimport org.apache.arrow.datafusion.DataFrame;\nimport org.apache.arrow.datafusion.SessionContext;\nimport org.apache.arrow.datafusion.SessionContexts;\n\npublic class DataFusionDemo {\n\n    public static void main(String[] args) throws Exception {\n        try (SessionContext sessionContext = SessionContexts.create()) {\n            sessionContext.sql(\"select sqrt(65536)\").thenCompose(DataFrame::show).join();\n        }\n    }\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ebuild.gradle.kts\u003c/summary\u003e\n\n```kotlin\nplugins {\n  java\n  application\n}\n\nrepositories {\n  mavenCentral()\n  google()\n}\n\ntasks {\n  application {\n    mainClass.set(\"com.me.DataFusionDemo\")\n  }\n}\n\ndependencies {\n  implementation(\n    group = 
\"io.github.datafusion-contrib\",\n    name = \"datafusion-java\",\n    version = \"0.16.0\"\n  )\n}\n\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eRun result\u003c/summary\u003e\n\n```\n$ ./gradlew run\n...\n\u003e Task :compileKotlin UP-TO-DATE\n\u003e Task :compileJava UP-TO-DATE\n\u003e Task :processResources NO-SOURCE\n\u003e Task :classes UP-TO-DATE\n\n\u003e Task :run\nsuccessfully created tokio runtime\n+--------------------+\n| sqrt(Int64(65536)) |\n+--------------------+\n| 256                |\n+--------------------+\nsuccessfully shutdown tokio runtime\n\nBUILD SUCCESSFUL in 2s\n3 actionable tasks: 1 executed, 2 up-to-date\n16:43:34: Execution finished 'run'.\n```\n\n\u003c/details\u003e\n\n## How to run the interactive demo\n\n### 1. Run using Docker (with `jshell`)\n\nFirst build the docker image:\n\n```\ndocker build -t datafusion-example .\n```\n\nThen you can run the example program using Docker:\n\n```\ndocker run --rm -it datafusion-example\n```\n\nOr start an interactive jshell session:\n\n```\ndocker run --rm -it datafusion-example jshell\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eExample jshell session\u003c/summary\u003e\n\n```text\nJan 11, 2024 1:49:28 AM java.util.prefs.FileSystemPreferences$1 run\nINFO: Created user preferences directory.\n|  Welcome to JShell -- Version 21\n|  For an introduction type: /help intro\n\njshell\u003e import org.apache.arrow.datafusion.*\n\njshell\u003e var context = SessionContexts.create()\n01:41:05.586 [main] DEBUG org.apache.arrow.datafusion.JNILoader -- successfully loaded datafusion_jni from library path\n01:41:05.589 [main] DEBUG org.apache.arrow.datafusion.JNILoader -- datafusion_jni already loaded, returning\n01:41:05.590 [main] DEBUG org.apache.arrow.datafusion.AbstractProxy -- Obtaining DefaultSessionContext@7f58383b8db0\n01:41:05.591 [main] DEBUG org.apache.arrow.datafusion.AbstractProxy -- Obtaining TokioRuntime@7f58383ce110\ncontext ==\u003e 
org.apache.arrow.datafusion.DefaultSessionContext@2d209079\n\njshell\u003e var df = context.sql(\"select 1.1 + cos(2.0)\").join()\n01:41:10.961 [main] DEBUG org.apache.arrow.datafusion.AbstractProxy -- Obtaining DefaultDataFrame@7f5838209100\ndf ==\u003e org.apache.arrow.datafusion.DefaultDataFrame@34ce8af7\n\njshell\u003e import org.apache.arrow.memory.*\n\njshell\u003e var allocator = new RootAllocator()\n01:41:22.521 [main] INFO org.apache.arrow.memory.BaseAllocator -- Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.\n01:41:22.525 [main] INFO org.apache.arrow.memory.DefaultAllocationManagerOption -- allocation manager type not specified, using netty as the default type\n01:41:22.525 [main] INFO org.apache.arrow.memory.CheckAllocator -- Using DefaultAllocationManager at memory-unsafe-14.0.2.jar!/org/apache/arrow/memory/DefaultAllocationManagerFactory.class\n01:41:22.531 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil -- Constructor for direct buffer found and made accessible\n01:41:22.536 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil -- direct buffer constructor: available\n01:41:22.537 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy -- -Dorg.apache.memory.allocator.pageSize: 8192\n01:41:22.537 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy -- -Dorg.apache.memory.allocator.maxOrder: 11\nallocator ==\u003e Allocator(ROOT) 0/0/0/9223372036854775807 (res/actual/peak/limit)\n\n\njshell\u003e var r = df.collect(allocator).join()\n01:41:29.635 [main] INFO org.apache.arrow.datafusion.DefaultDataFrame -- successfully completed with arr length=610\nr ==\u003e org.apache.arrow.vector.ipc.ArrowFileReader@7ac7a4e4\n\njshell\u003e var root = r.getVectorSchemaRoot()\n01:41:34.658 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel -- Reading buffer with size: 10\n01:41:34.661 [main] DEBUG org.apache.arrow.vector.ipc.ArrowFileReader -- Footer starts at 416, length: 184\n01:41:34.661 [main] 
DEBUG org.apache.arrow.vector.ipc.ReadChannel -- Reading buffer with size: 184\nroot ==\u003e org.apache.arrow.vector.VectorSchemaRoot@6cd28fa7\n\njshell\u003e r.loadNextBatch()\n01:41:39.421 [main] DEBUG org.apache.arrow.vector.ipc.ArrowFileReader -- RecordBatch at 200, metadata: 192, body: 16\n01:41:39.423 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel -- Reading buffer with size: 208\n01:41:39.424 [main] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch -- Buffer in RecordBatch at 0, length: 1\n01:41:39.425 [main] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch -- Buffer in RecordBatch at 8, length: 8\n$8 ==\u003e true\n\njshell\u003e var v = root.getVector(0)\nv ==\u003e [0.6838531634528577]\n```\n\n\u003c/details\u003e\n\n### 2. Build from source\n\nNote that you must have local Rust and Java environments set up.\n\nRun the example with a single command:\n\n```bash\n./gradlew run\n```\n\nOr roll your own test example:\n\n```java\nimport org.apache.arrow.datafusion.DataFrame;\nimport org.apache.arrow.datafusion.SessionContext;\nimport org.apache.arrow.datafusion.SessionContexts;\nimport org.apache.arrow.memory.BufferAllocator;\nimport org.apache.arrow.memory.RootAllocator;\nimport org.apache.arrow.vector.Float8Vector;\nimport org.apache.arrow.vector.VectorSchemaRoot;\nimport org.apache.arrow.vector.ipc.ArrowReader;\nimport org.slf4j.Logger;\nimport org.slf4j.LoggerFactory;\n\nimport java.io.IOException;\n\npublic class ExampleMain {\n\n    private static final Logger logger = LoggerFactory.getLogger(ExampleMain.class);\n\n    public static void main(String[] args) throws Exception {\n        try (SessionContext sessionContext = SessionContexts.create(); BufferAllocator allocator = new RootAllocator()) {\n            DataFrame dataFrame = sessionContext.sql(\"select 1.5 + sqrt(2.0)\").get();\n            dataFrame.collect(allocator).thenAccept(ExampleMain::onReaderResult).get();\n        }\n    }\n\n    private static void onReaderResult(ArrowReader 
reader) {\n        try {\n            VectorSchemaRoot root = reader.getVectorSchemaRoot();\n            while (reader.loadNextBatch()) {\n                Float8Vector vector = (Float8Vector) root.getVector(0);\n                for (int i = 0; i \u003c root.getRowCount(); i += 1) {\n                    logger.info(\"value {}={}\", i, vector.getValueAsDouble(i));\n                }\n            }\n            // close to release resource\n            reader.close();\n        } catch (IOException e) {\n            logger.warn(\"got IO Exception\", e);\n        }\n    }\n}\n```\n\nTo build the library:\n\n```bash\n./gradlew build\n```\n\n[1]: https://github.com/apache/datafusion\n[2]: https://repo.maven.apache.org/maven2/io/github/datafusion-contrib/datafusion-java/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatafusion-contrib%2Fdatafusion-java/lists"}