{"id":20207719,"url":"https://github.com/exasol/parquet-io-java","last_synced_at":"2025-04-10T12:42:28.639Z","repository":{"id":38195323,"uuid":"358501657","full_name":"exasol/parquet-io-java","owner":"exasol","description":"Java library to read Parquet files.","archived":false,"fork":false,"pushed_at":"2025-02-26T15:28:36.000Z","size":200,"stargazers_count":17,"open_issues_count":1,"forks_count":1,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-24T11:37:49.796Z","etag":null,"topics":["exasol","exasol-integration","foundation-library","java","parquet"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/exasol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-16T06:41:38.000Z","updated_at":"2025-02-13T02:58:11.000Z","dependencies_parsed_at":"2022-08-26T09:41:29.388Z","dependency_job_id":"e0510ccc-d4af-442e-9211-efa73fe44485","html_url":"https://github.com/exasol/parquet-io-java","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exasol%2Fparquet-io-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exasol%2Fparquet-io-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exasol%2Fparquet-io-java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exasol%2Fparquet-io-java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/exasol","download_url":"https://codeload.github.com/exasol/parquet-io-java/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248217159,"owners_count":21066634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["exasol","exasol-integration","foundation-library","java","parquet"],"created_at":"2024-11-14T05:31:36.565Z","updated_at":"2025-04-10T12:42:28.605Z","avatar_url":"https://github.com/exasol.png","language":"Scala","funding_links":[],"categories":["大数据"],"sub_categories":["微服务框架"],"readme":"# parquet-io-java\n\n[![Build Status](https://github.com/exasol/parquet-io-java/actions/workflows/ci-build.yml/badge.svg)](https://github.com/exasol/parquet-io-java/actions/workflows/ci-build.yml)\n[![Maven Central \u0026ndash; Parquet for Java](https://img.shields.io/maven-central/v/com.exasol/parquet-io-java)](https://search.maven.org/artifact/com.exasol/parquet-io-java)\n\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n\n[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=security_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=reliability_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Maintainability 
Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=sqale_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Technical Debt](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=sqale_index)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n\n[![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=code_smells)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=coverage)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Duplicated Lines (%)](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=duplicated_lines_density)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java\u0026metric=ncloc)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)\n\nThis project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.\n\n## Installation\n\nAdd this library as a dependency to your project's `pom.xml` file.\n\n```xml\n\u003cdependencies\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003ecom.exasol\u003c/groupId\u003e\n        \u003cartifactId\u003eparquet-io-java\u003c/artifactId\u003e\n        \u003cversion\u003eLATEST VERSION\u003c/version\u003e\n    \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\nPlease use the latest version of the library.\n\n## Usage\n\nHere is a small example code showing the usage of the library.\n\n```java\nfinal Path path = new Path(\"/data/parquet/part-0000.parquet\");\nfinal Configuration conf = new Configuration();\ntry (final ParquetReader\u003cRow\u003e reader = 
RowParquetReader\n        .builder(HadoopInputFile.fromPath(path, conf)).build()) {\n    Row row = reader.read();\n    while (row != null) {\n        List\u003cObject\u003e values = row.getValues();\n        System.out.println(values);\n        row = reader.read();\n    }\n} catch (final IOException exception) {\n    // Handle the read failure here, for example by logging or rethrowing.\n}\n```\n\n## Data Type Mapping\n\nThe following table shows how each Parquet data type maps to a Java data\ntype.\n\n| Parquet Data Type    | Parquet Logical Type | Java Data Type |\n|:---------------------|:---------------------|:---------------|\n| boolean              |                      | Boolean        |\n| int32                |                      | Integer        |\n| int32                | date                 | Date           |\n| int32                | decimal(p, s)        | BigDecimal     |\n| int64                |                      | Long           |\n| int64                | timestamp_millis     | Timestamp      |\n| int64                | timestamp_micros     | Timestamp      |\n| int64                | decimal(p, s)        | BigDecimal     |\n| float                |                      | Float          |\n| double               |                      | Double         |\n| binary               |                      | String         |\n| binary               | utf8                 | String         |\n| binary               | decimal(p, s)        | BigDecimal     |\n| fixed_len_byte_array |                      | String         |\n| fixed_len_byte_array | decimal(p, s)        | BigDecimal     |\n| fixed_len_byte_array | uuid                 | UUID           |\n| int96                |                      | Timestamp      |\n| group                |                      | Map            |\n| group                | LIST                 | List           |\n| group                | MAP                  | Map            |\n| group                | REPEATED             | List           |\n\n### Parquet Repeated 
Types\n\nA Parquet schema can mark a single field or a group of fields as repeated. The\nparquet-io-java (PIOJ) library reads such repeated types into a Java `List`.\n\nFor example, given the following Parquet schemas:\n\n```\nmessage parquet_schema {\n  repeated binary name (UTF8);\n}\n```\n\n```\nmessage parquet_schema {\n  repeated group person {\n    required binary name (UTF8);\n  }\n}\n```\n\nPIOJ reads both of these Parquet types into a Java list such as `[\"John\", \"Jane\"]`.\n\nIn contrast, a repeated group with multiple fields is read as a\nlist of maps.\n\n```\nmessage parquet_schema {\n  repeated group person {\n    required binary name (UTF8);\n    optional int32 age;\n  }\n}\n```\n\nPIOJ reads this schema into a list of person maps:\n\n```\n[ Map(\"name\" -\u003e \"John\", \"age\" -\u003e 24), Map(\"name\" -\u003e \"Jane\", \"age\" -\u003e 22) ]\n```\n\n## Information for Users\n\n- [Changelog](doc/changes/changelog.md)\n- [Dependencies](dependencies.md)\n\n## Information for Developers\n\n- [System Requirement Specification](doc/system_requirements.md)\n- [Design](doc/design.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexasol%2Fparquet-io-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexasol%2Fparquet-io-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexasol%2Fparquet-io-java/lists"}