https://github.com/jerolba/parquet-carpet
Java Parquet serialization and deserialization library using Java 17 Records
- Host: GitHub
- URL: https://github.com/jerolba/parquet-carpet
- Owner: jerolba
- License: apache-2.0
- Created: 2023-02-22T18:14:11.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2026-01-08T19:42:41.000Z (5 months ago)
- Last Synced: 2026-01-11T18:30:13.484Z (5 months ago)
- Topics: java, parquet
- Language: Java
- Homepage:
- Size: 1.65 MB
- Stars: 85
- Watchers: 8
- Forks: 12
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-parquet - parquet-carpet - A Java library for serializing and deserializing Parquet files efficiently using Java records. (Libraries / Java)
- awesome-java - Carpet
README
# Carpet: Parquet Serialization and Deserialization Library for Java
A Java library for serializing and deserializing Parquet files efficiently using Java records. This library provides a simple and user-friendly API for working with Parquet files, making it easy to read and write data in the Parquet format in your Java applications.
**For comprehensive documentation, please visit our [full documentation site](https://carpet.jerolba.com/).**
## Features
- Serialize Java records to Parquet files
- Deserialize Parquet files to Java records
- Support nested data structures
- Support nested Collections and Maps
- Very simple API
- Low level configuration of Parquet properties
- Low overhead when processing files
- Minimized `parquet-java` and Hadoop transitive dependencies
## Table of Contents
- [Installation](#installation)
- [Basic Usage](#basic-usage)
- [Advanced Usage (Overview)](#advanced-usage)
- [Full Documentation Site](https://carpet.jerolba.com/)
- [Contribute](#contribute)
- [Build](#build)
- [License](#license)
## Installation
You can include this library in your Java project using Maven:
```xml
<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>carpet-record</artifactId>
    <version>0.6.0</version>
</dependency>
```
or using Gradle:
```gradle
implementation 'com.jerolba:carpet-record:0.6.0'
```
Carpet includes only the essential transitive dependencies required for file read and write operations.
## Basic Usage
To serialize and deserialize Parquet files in your Java application, you just need Java records. You don't need to generate classes or inherit from Carpet classes.
```java
record MyRecord(long id, String name, int size, double value, double percentile) { }
```
Carpet provides a writer and a reader with a default configuration and convenience methods.
### Serialization
Using reflection, Carpet derives the Parquet file schema from your record class and writes the content of your objects to the file:
```java
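import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.List;
import com.jerolba.carpet.CarpetWriter;

List<MyRecord> data = calculateDataToPersist();
try (OutputStream outputStream = new FileOutputStream("my_file.parquet")) {
    try (CarpetWriter<MyRecord> writer = new CarpetWriter<>(outputStream, MyRecord.class)) {
        writer.write(data);
    }
}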
```
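To make the reflection step more concrete, here is a minimal sketch of how a schema can be read off a record with the standard `Class.getRecordComponents()` API. It is not Carpet's implementation: the class `RecordSchemaSketch`, the helper names, the type mapping, and the rule that primitive fields become `required` while object fields become `optional` are all assumptions made for illustration.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: this is NOT Carpet's code, just the general idea.
class RecordSchemaSketch {

    record MyRecord(long id, String name, int size, double value, double percentile) { }

    // Hypothetical mapping from a Java type to a Parquet primitive type name.
    static String parquetTypeOf(Class<?> type) {
        if (type == long.class || type == Long.class) return "INT64";
        if (type == int.class || type == Integer.class) return "INT32";
        if (type == double.class || type == Double.class) return "DOUBLE";
        if (type == String.class) return "BINARY (STRING)";
        throw new IllegalArgumentException("Type not covered by this sketch: " + type);
    }

    // Lists one column description per record component, in declaration order.
    static List<String> describe(Class<? extends Record> recordClass) {
        List<String> columns = new ArrayList<>();
        for (RecordComponent component : recordClass.getRecordComponents()) {
            // Assumption: primitives cannot be null, so they map to "required".
            String repetition = component.getType().isPrimitive() ? "required" : "optional";
            columns.add(repetition + " " + parquetTypeOf(component.getType())
                    + " " + component.getName());
        }
        return columns;
    }

    public static void main(String[] args) {
        describe(MyRecord.class).forEach(System.out::println);
    }
}
```

Because the column names and types come from the record declaration itself, no separate schema file or generated class is needed.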
### Deserialization
To read a file, provide a `File` and a record class that matches the Parquet schema:
```java
List<MyRecord> data = new CarpetReader<>(new File("my_file.parquet"), MyRecord.class).toList();
```
If you don't know the schema of the file, or a generic map is good enough, you can deserialize each row to a `Map<String, Object>`:
```java
List<Map> data = new CarpetReader<>(new File("my_file.parquet"), Map.class).toList();
```
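Rows read as maps carry no compile-time types, so a common first step is to inspect what came back. The sketch below uses only the standard library and a hand-built row as a stand-in for reader output; the class `MapRowsSketch`, the method `columnTypes`, and the sample values are hypothetical, and the value types Carpet actually returns may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: inspects generic map rows, independent of any Parquet library.
class MapRowsSketch {

    // For every column name seen in the rows, records the Java type of its
    // first non-null value. Columns that are always null are left out.
    static Map<String, String> columnTypes(List<Map<String, Object>> rows) {
        Map<String, String> types = new TreeMap<>();
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                if (entry.getValue() != null) {
                    types.putIfAbsent(entry.getKey(),
                            entry.getValue().getClass().getSimpleName());
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for one row returned by a map-based reader.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1L);
        row.put("name", "foo");
        row.put("value", 2.5);
        System.out.println(columnTypes(List.of(row)));
    }
}
```

Once the column names and types are known, you can define a record with matching fields and switch to the typed reader.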
## Advanced Usage
Carpet offers a rich set of features for advanced scenarios. For detailed explanations, API references, and examples, please refer to our [comprehensive documentation site](https://carpet.jerolba.com/).
Key advanced topics include:
- **API Details**:
  - [CarpetWriter API](https://carpet.jerolba.com/advanced/configuration/)
  - [CarpetReader API](https://carpet.jerolba.com/advanced/configuration/)
- **Schema and Data Handling**:
  - [Column Name Mapping & Conversion](https://carpet.jerolba.com/advanced/column-mapping/)
  - [Supported Data Types (including nested structures, collections, maps)](https://carpet.jerolba.com/advanced/data-types/)
  - [Projections](https://carpet.jerolba.com/advanced/projections/)
  - [Nullability](https://carpet.jerolba.com/advanced/nullability/)
  - [Handling Read Schema Mismatches](https://carpet.jerolba.com/advanced/configuration/#read-schema-mismatch)
- **Configuration & Low-Level Access**:
  - [Parquet Configuration Tuning (compression, page sizes, etc.)](https://carpet.jerolba.com/advanced/configuration/)
  - [BigDecimal Precision and Scale](https://carpet.jerolba.com/advanced/configuration/#bigdecimal-precision-and-scale)
  - [Time Unit Configuration](https://carpet.jerolba.com/advanced/configuration/#time-unit-configuration)
  - [Low-Level Parquet Classes Integration](https://carpet.jerolba.com/advanced/low-level-parquet/)
  - [Local File System File Handling](https://carpet.jerolba.com/advanced/input-output-files/)
## Build
To run the unit tests:
```bash
./gradlew test
```
To build the jars:
```bash
./gradlew assemble
```
The build runs in [GitHub Actions](https://github.com/jerolba/parquet-carpet/actions).
## Contribute
Feel free to dive in! [Open an issue](https://github.com/jerolba/parquet-carpet/issues/new) or submit PRs.
All contributors and maintainers of this project follow the [Contributor Covenant Code of Conduct](https://github.com/jerolba/parquet-carpet/blob/master/CODE_OF_CONDUCT.md).
## License
[Apache 2](https://github.com/jerolba/parquet-carpet/blob/master/LICENSE.txt) © Jerónimo López