https://github.com/exasol/parquet-io-java
Java library to read Parquet files.
- Host: GitHub
- URL: https://github.com/exasol/parquet-io-java
- Owner: exasol
- License: mit
- Created: 2021-04-16T06:41:38.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2025-02-26T15:28:36.000Z (over 1 year ago)
- Last Synced: 2025-03-24T11:37:49.796Z (over 1 year ago)
- Topics: exasol, exasol-integration, foundation-library, java, parquet
- Language: Scala
- Size: 195 KB
- Stars: 17
- Watchers: 8
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-java - Parquet IO Java
README
# parquet-io-java
This project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.
## Installation
Add this library as a dependency to your project's `pom.xml` file.
```xml
<dependency>
    <groupId>com.exasol</groupId>
    <artifactId>parquet-io-java</artifactId>
    <version>LATEST VERSION</version>
</dependency>
```
Replace `LATEST VERSION` with the latest released version of the library.
## Usage
The following small example shows how to use the library.
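```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
// Row and RowParquetReader are provided by this library.

final Path path = new Path("/data/parquet/part-0000.parquet");
final Configuration conf = new Configuration();
try (final ParquetReader<Row> reader = RowParquetReader
        .builder(HadoopInputFile.fromPath(path, conf)).build()) {
    Row row = reader.read();
    // read() returns null once all rows have been consumed
    while (row != null) {
        List<Object> values = row.getValues();
        System.out.println(values);
        row = reader.read();
    }
} catch (final IOException exception) {
    // handle or log the read failure here
}
```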
## Data Type Mapping
The following table shows how each Parquet data type maps to a Java data
type.
| Parquet Data Type | Parquet Logical Type | Java Data Type |
|:---------------------|:---------------------|:---------------|
| boolean | | Boolean |
| int32 | | Integer |
| int32 | date | Date |
| int32 | decimal(p, s) | BigDecimal |
| int64 | | Long |
| int64 | timestamp_millis | Timestamp |
| int64 | timestamp_micros | Timestamp |
| int64 | decimal(p, s) | BigDecimal |
| float | | Float |
| double | | Double |
| binary | | String |
| binary | utf8 | String |
| binary | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | | String |
| fixed_len_byte_array | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | uuid | UUID |
| int96 | | Timestamp |
| group | | Map |
| group | LIST | List |
| group | MAP | Map |
| group | REPEATED | List |
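Because a row exposes its values as plain objects, calling code usually has to branch on the Java types listed in the table. The following is a minimal sketch of that idea, not part of the library: the class `ValueDescriber` and its `describe` method are hypothetical, and the sample values are built by hand instead of being read from a Parquet file. `Date` and `Timestamp` values are deliberately left to the fallback branch.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class ValueDescriber {

    // Hypothetical helper: names the kind of Java value found in a row.
    static String describe(final Object value) {
        if (value == null) {
            return "null";
        } else if (value instanceof Boolean) {
            return "boolean";
        } else if (value instanceof BigDecimal) {
            // BigDecimal is also a Number, so test for it first.
            return "decimal";
        } else if (value instanceof Number) {
            return "number";
        } else if (value instanceof String) {
            return "string";
        } else if (value instanceof UUID) {
            return "uuid";
        } else if (value instanceof List) {
            return "list";
        } else if (value instanceof Map) {
            return "map";
        }
        // Anything not handled above, for example date or timestamp values.
        return "other: " + value.getClass().getName();
    }

    public static void main(final String[] args) {
        // Hand-built sample values standing in for the values of one row.
        final List<Object> values = Arrays.asList(true, 42, 7L, 1.5d,
                new BigDecimal("12.34"), "text", UUID.randomUUID(),
                Arrays.asList("John", "Jane"),
                Collections.singletonMap("name", "John"));
        for (final Object value : values) {
            System.out.println(describe(value));
        }
    }
}
```

Checking `BigDecimal` before `Number` matters here, since decimal columns would otherwise be reported as generic numbers.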
### Parquet Repeated Types
A Parquet schema can mark a single field or a group of fields as repeated.
parquet-io-java (PIOJ) reads such fields into a Java `List`.
For example, given the following Parquet schemas:
```
message parquet_schema {
repeated binary name (UTF8);
}
```
```
message parquet_schema {
repeated group person {
required binary name (UTF8);
}
}
```
PIOJ reads both of these Parquet types into a Java list such as `["John", "Jane"]`.
A repeated group with multiple fields, on the other hand, is read as a
list of maps.
```
message parquet_schema {
repeated group person {
required binary name (UTF8);
optional int32 age;
}
}
```
PIOJ reads it into a list of person maps:
```
[ Map("name" -> "John", "age" -> 24), Map("name" -> "Jane", "age" -> 22) ]
```
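To illustrate how such a result might be consumed, the sketch below assumes the repeated `person` group is already available as a `List<Map<String, Object>>` shaped like the output above. The class `PersonNames` and its methods are hypothetical and not part of the library, and the data is built by hand. How the library represents a missing optional field such as `age` is not covered here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PersonNames {

    // Collects the "name" entry of every person map, skipping maps without one.
    static List<String> names(final List<Map<String, Object>> persons) {
        final List<String> result = new ArrayList<>();
        for (final Map<String, Object> person : persons) {
            final Object name = person.get("name");
            if (name != null) {
                result.add(name.toString());
            }
        }
        return result;
    }

    // Hand-built stand-in for one element of the repeated group.
    static Map<String, Object> person(final String name, final Integer age) {
        final Map<String, Object> map = new LinkedHashMap<>();
        map.put("name", name);
        map.put("age", age);
        return map;
    }

    public static void main(final String[] args) {
        final List<Map<String, Object>> persons = new ArrayList<>();
        persons.add(person("John", 24));
        persons.add(person("Jane", 22));
        System.out.println(names(persons));
    }
}
```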
## Information for Users
- [Changelog](doc/changes/changelog.md)
- [Dependencies](dependencies.md)
## Information for Developers
* [System Requirement Specification](doc/system_requirements.md)
* [Design](doc/design.md)