{"id":13440702,"url":"https://github.com/apache/orc","last_synced_at":"2026-01-08T12:09:12.750Z","repository":{"id":31579419,"uuid":"35144191","full_name":"apache/orc","owner":"apache","description":"Apache ORC - the smallest, fastest columnar storage for Hadoop workloads","archived":false,"fork":false,"pushed_at":"2025-05-07T22:08:33.000Z","size":58842,"stargazers_count":726,"open_issues_count":35,"forks_count":490,"subscribers_count":44,"default_branch":"main","last_synced_at":"2025-05-10T17:16:39.818Z","etag":null,"topics":["apache","big-data","cpp","java","orc"],"latest_commit_sha":null,"homepage":"https://orc.apache.org/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-05-06T07:00:05.000Z","updated_at":"2025-05-09T21:28:59.000Z","dependencies_parsed_at":"2023-10-17T04:59:17.372Z","dependency_job_id":"bad4bb38-e7f3-4f22-9c4c-feb361d57785","html_url":"https://github.com/apache/orc","commit_stats":{"total_commits":1838,"total_committers":161,"mean_commits":"11.416149068322982","dds":0.794885745375408,"last_synced_commit":"bba33c99b0c6566c616967653ab9a28e04266e4c"},"previous_names":[],"tags_count":151,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Forc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Forc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Forc/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Forc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/orc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253852271,"owners_count":21973892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","big-data","cpp","java","orc"],"created_at":"2024-07-31T03:01:25.309Z","updated_at":"2026-01-08T12:09:12.744Z","avatar_url":"https://github.com/apache.png","language":"Java","readme":"# [Apache ORC](https://orc.apache.org/)\n\nORC is a self-describing type-aware columnar file format designed for\nHadoop workloads. It is optimized for large streaming reads, but with\nintegrated support for finding required rows quickly. Storing data in\na columnar format lets the reader read, decompress, and process only\nthe values that are required for the current query. Because ORC files\nare type-aware, the writer chooses the most appropriate encoding for\nthe type and builds an internal index as the file is written.\nPredicate pushdown uses those indexes to determine which stripes in a\nfile need to be read for a particular query and the row indexes can\nnarrow the search to a particular set of 10,000 rows. ORC supports the\ncomplete set of types in Hive, including the complex types: structs,\nlists, maps, and unions.\n\n## ORC File Library\n\nThis project includes both a Java library and a C++ library for reading and writing the _Optimized Row Columnar_ (ORC) file format. 
The C++ and Java libraries are completely independent of each other and will each read all versions of ORC files.\n\nReleases:\n\n* Latest: [Apache ORC releases](https://orc.apache.org/releases)\n* Maven Central: [![Maven Central](https://maven-badges.herokuapp.com/maven-central/org.apache.orc/orc/badge.svg)](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.orc%22)\n* Downloads: [Apache ORC downloads](https://orc.apache.org/downloads)\n* Release tags: [Apache ORC release tags](https://github.com/apache/orc/releases)\n* Plan: [Apache ORC future release plan](https://github.com/apache/orc/milestones)\n\nThe current build status:\n\n* Main branch [![main build status](https://github.com/apache/orc/actions/workflows/build_and_test.yml/badge.svg?branch=main)](https://github.com/apache/orc/actions/workflows/build_and_test.yml?query=branch%3Amain)\n\nBug tracking: [Apache Jira](https://orc.apache.org/bugs)\n\nThe subdirectories are:\n\n* c++ - the c++ reader and writer\n* cmake_modules - the cmake modules\n* docker - docker scripts to build and test on various linuxes\n* examples - various ORC example files that are used to test compatibility\n* java - the java reader and writer\n* site - the website and documentation\n* tools - the c++ tools for reading and inspecting ORC files\n\n### Building\n\n* Install java 17 or higher\n* Install maven 3.9.9 or higher\n* Install cmake 3.12 or higher\n* Install meson 1.3.0 or higher (Optional)\n\nTo build a release version with debug information:\n\n```shell\n% mkdir build\n% cd build\n% cmake ..\n% make package\n% make test-out\n\n```\n\nTo build a debug version:\n\n```shell\n% mkdir build\n% cd build\n% cmake .. -DCMAKE_BUILD_TYPE=DEBUG\n% make package\n% make test-out\n\n```\n\nTo build a release version without debug information:\n\n```shell\n% mkdir build\n% cd build\n% cmake .. 
-DCMAKE_BUILD_TYPE=RELEASE\n% make package\n% make test-out\n\n```\n\nTo build only the Java library:\n\n```shell\n% cd java\n% ./mvnw package\n\n```\n\nTo build only the C++ library:\n\n```shell\n% mkdir build\n% cd build\n% cmake .. -DBUILD_JAVA=OFF\n% make package\n% make test-out\n\n```\n\nTo build the C++ library with AVX512 enabled:\n\n```shell\n% export ORC_USER_SIMD_LEVEL=AVX512\n% mkdir build\n% cd build\n% cmake .. -DBUILD_JAVA=OFF -DBUILD_ENABLE_AVX512=ON\n% make package\n% make test-out\n```\n\nThe CMake option BUILD_ENABLE_AVX512 can be set to \"ON\" or \"OFF\" (the default) at compile time. It determines whether the AVX512 SIMD level is compiled into the binaries.\n\nThe environment variable ORC_USER_SIMD_LEVEL can be set to \"AVX512\" or \"NONE\" (the default) at run time. It selects the SIMD level used to dispatch code that can apply SIMD optimization.\n\nNote that if ORC_USER_SIMD_LEVEL is set to \"NONE\" at run time, AVX512 will not take effect even if BUILD_ENABLE_AVX512 was set to \"ON\" at compile time.\n\n### Building with Meson\n\nWhile CMake is the official build system for ORC, there is unofficial support for using Meson to build select parts of the project. To build a debug version of the library and test it using Meson, from the project root you can run:\n\n```shell\nmeson setup build\nmeson compile -C build\nmeson test -C build\n```\n\nBy default, Meson will build unoptimized libraries with debug symbols. By contrast, the CMake build system generates release libraries by default. If you would like to create release libraries as CMake does, you should set the buildtype option. 
You must either remove the existing build directory before changing that setting, or alternatively pass the ``--reconfigure`` flag:\n\n```shell\nmeson setup build -Dbuildtype=release --reconfigure\nmeson compile -C build\nmeson test -C build\n```\n\nMeson supports running your test suite through valgrind out of the box:\n\n```shell\nmeson test -C build --wrap=valgrind\n```\n\nIf you'd like to enable sanitizers, you can use the ``-Db_sanitize=`` option. For example, to enable both ASAN and UBSAN, you can run:\n\n```shell\nmeson setup build -Dbuildtype=debug -Db_sanitize=address,undefined --reconfigure\nmeson compile -C build\nmeson test -C build\n```\n\nMeson takes care of detecting all dependencies on your system, and downloading missing ones as required through its [Wrap system](https://mesonbuild.com/Wrap-dependency-system-manual.html). The dependencies for the project are all stored in the ``subprojects`` directory in individual wrap files. The majority of these are system-generated files created by running the following from the project root:\n\n```shell\nmeson wrap install \u003cdependency_name\u003e\n```\n\nIf you are developing ORC and need to add a new dependency in the future, be sure to check Meson's [WrapDB](https://mesonbuild.com/Wrapdb-projects.html) for a pre-configured wrap entry. If none exists, you may still manually configure the dependency as outlined in the aforementioned Wrap system documentation.\n","funding_links":[],"categories":["HarmonyOS","Data Serialization","File Formats","大数据"],"sub_categories":["Windows Manager"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Forc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Forc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Forc/lists"}