{"id":21800304,"url":"https://github.com/apache/arrow-java","last_synced_at":"2025-05-15T15:07:36.721Z","repository":{"id":264619354,"uuid":"893682219","full_name":"apache/arrow-java","owner":"apache","description":"Official Java implementation of Apache Arrow","archived":false,"fork":false,"pushed_at":"2025-05-10T06:21:36.000Z","size":24643,"stargazers_count":47,"open_issues_count":392,"forks_count":50,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-05-10T17:16:14.917Z","etag":null,"topics":["apache-arrow","java"],"latest_commit_sha":null,"homepage":"https://arrow.apache.org/java/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-25T02:39:25.000Z","updated_at":"2025-05-09T17:18:28.000Z","dependencies_parsed_at":"2025-01-17T06:24:13.434Z","dependency_job_id":"48dcd739-bf07-452b-94ac-08acf37b19e1","html_url":"https://github.com/apache/arrow-java","commit_stats":null,"previous_names":["apache/arrow-java"],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Farrow-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Farrow-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Farrow-java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Farrow-java/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/arrow-java/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254364270,"owners_count":22058878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-arrow","java"],"created_at":"2024-11-27T10:44:16.511Z","updated_at":"2025-05-15T15:07:31.696Z","avatar_url":"https://github.com/apache.png","language":"Java","readme":"\u003c!---\n  Licensed to the Apache Software Foundation (ASF) under one\n  or more contributor license agreements.  See the NOTICE file\n  distributed with this work for additional information\n  regarding copyright ownership.  The ASF licenses this file\n  to you under the Apache License, Version 2.0 (the\n  \"License\"); you may not use this file except in compliance\n  with the License.  You may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\n  Unless required by applicable law or agreed to in writing,\n  software distributed under the License is distributed on an\n  \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n  KIND, either express or implied.  
See the License for the\n  specific language governing permissions and limitations\n  under the License.\n--\u003e\n\n# Arrow Java\n\n## Getting Started\n\nThe following guides explain the fundamental data structures used in the Java implementation of Apache Arrow.\n\n- [ValueVector](https://arrow.apache.org/docs/java/vector.html) is an abstraction that stores a sequence of values of the same type in an individual column.\n- [VectorSchemaRoot](https://arrow.apache.org/docs/java/vector_schema_root.html) is a container that can hold multiple vectors based on a schema.\n- The [Reading/Writing IPC formats](https://arrow.apache.org/docs/java/ipc.html) guide explains how to stream record batches and how to serialize record batches to files.\n\nGenerated javadoc documentation is available [here](https://arrow.apache.org/docs/java/).\n\n## Building from source\n\nRefer to [Building Apache Arrow](https://arrow.apache.org/docs/dev/developers/java/building.html) for documentation of environment setup and build instructions.\n\n## Flatbuffers dependency\n\nArrow uses Google's Flatbuffers to transport metadata.  The Java version of the library\nrequires that the generated flatbuffer classes be used only with the same version that\ngenerated them.  Arrow packages a version of the arrow-vector module that shades flatbuffers\nand arrow-format into a single JAR.  Using the classifier \"shade-format-flatbuffers\" in your\n`pom.xml` selects this JAR; you can then exclude the original dependency or resolve it to\na version of your choosing.\n\n### Updating the flatbuffers generated code\n\n1. Verify that your version of flatc matches the declared dependency:\n\n```bash\n$ flatc --version\nflatc version 25.1.24\n\n$ grep \"dep.fbs.version\" java/pom.xml\n    \u003cdep.fbs.version\u003e25.1.24\u003c/dep.fbs.version\u003e\n```\n\n2. 
Generate the flatbuffer Java files by performing the following:\n\n```bash\ncd $ARROW_HOME\n\n# remove the existing files\nrm -rf java/format/src\n\n# regenerate from the .fbs files\nflatc --java -o java/format/src/main/java format/*.fbs\n\n# prepend license header\nmvn spotless:apply -pl :arrow-format\n```\n\n## Performance Tuning\n\nThere are several system properties and environment variables that users can configure.  These trade off safety (they turn off checking) for speed.  Typically they are only used in production settings after the code has been thoroughly tested without using them.\n\n* Bounds checking for memory accesses: Bounds checking is on by default.  You can disable it by setting either the\nsystem property (`arrow.enable_unsafe_memory_access`) or the environment variable\n(`ARROW_ENABLE_UNSAFE_MEMORY_ACCESS`) to `true`. When both the system property and the environment\nvariable are set, the system property takes precedence.\n\n* Null checking for gets: `ValueVector` get methods (not `getObject`) by default verify the slot is not null.  You can disable it by setting either the\nsystem property (`arrow.enable_null_check_for_get`) or the environment variable\n(`ARROW_ENABLE_NULL_CHECK_FOR_GET`) to `false`. When both the system property and the environment\nvariable are set, the system property takes precedence.\n\n## Java Properties\n\n * `-Dio.netty.tryReflectionSetAccessible=true` should be set.\nThis fixes the `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.\u003cinit\u003e(long, int) not available` error thrown by Netty.\n * To support duplicate fields in a `StructVector` enable `-Darrow.struct.conflict.policy=CONFLICT_APPEND`.\nBy default (`CONFLICT_REPLACE`), a duplicate field overwrites the existing field of the same name. 
To support different policies for\nconflicting or duplicate fields set this JVM flag or use the correct static constructor methods for `StructVector`s.\n\n## Java Code Style Guide\n\nArrow Java follows the Google style guide [here][3] with the following\ndifferences:\n\n* Imports are grouped, from top to bottom, in this order: static imports,\nstandard Java, org.\\*, com.\\*\n* Line length can be up to 120 characters\n* Operators for line wrapping are at end-of-line\n* Naming rules for methods, parameters, etc. have been relaxed\n* Disabled `NoFinalizer`, `OverloadMethodsDeclarationOrder`, and\n`VariableDeclarationUsageDistance` due to the existing code base. These rules\nshould be followed when possible.\n\nRefer to [checkstyle.xml](dev/checkstyle/checkstyle.xml) for rule specifics.\n\n## Test Logging Configuration\n\nWhen running tests, Arrow Java uses the Logback logger with SLF4J. By default,\nit uses the `logback.xml` present in the corresponding module's `src/test/resources`\ndirectory, which has the default log level set to `INFO`.\nArrow Java can be built with an alternate logback configuration file using the\nfollowing command run in the project root directory:\n\n```bash\nmvn -Dlogback.configurationFile=file:\u003cpath-of-logback-file\u003e\n```\n\nSee [Logback Configuration][1] for more details.\n\n## Integration Tests\n\nIntegration tests which require more time or more memory can be run by activating\nthe `integration-tests` profile. This activates the [maven failsafe][4] plugin\nand any class prefixed with `IT` will be run during the testing phase. The integration\ntests currently require a larger amount of memory (\u003e4GB) and time to complete. 
To activate\nthe profile:\n\n```bash\nmvn -Pintegration-tests \u003crest of mvn arguments\u003e\n```\n\n[1]: https://logback.qos.ch/manual/configuration.html\n[2]: https://github.com/apache/arrow/blob/main/cpp/README.md\n[3]: http://google.github.io/styleguide/javaguide.html\n[4]: https://maven.apache.org/surefire/maven-failsafe-plugin/\n","funding_links":[],"categories":["大数据"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Farrow-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Farrow-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Farrow-java/lists"}