https://github.com/sksamuel/centurion
Kotlin Bigdata Toolkit
- Host: GitHub
- URL: https://github.com/sksamuel/centurion
- Owner: sksamuel
- License: apache-2.0
- Created: 2013-10-16T17:10:44.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2024-07-17T18:05:09.000Z (5 months ago)
- Last Synced: 2024-12-16T05:43:18.681Z (6 days ago)
- Topics: bigdata, java, kotlin, orc, parquet
- Language: Kotlin
- Homepage:
- Size: 841 KB
- Stars: 329
- Watchers: 21
- Forks: 44
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
# Centurion
![master](https://github.com/sksamuel/centurion/workflows/master/badge.svg)
[Maven Central](http://search.maven.org/#search%7Cga%7C1%7Ccenturion)
[Snapshots](https://s01.oss.sonatype.org/content/repositories/snapshots/com/sksamuel/centurion/)
![License](https://img.shields.io/github/license/sksamuel/centurion.svg?style=plastic)

## Introduction
Centurion is a JVM toolkit, written in Kotlin, for columnar and streaming formats.
This library allows you to read, write and convert between the following formats:
* [Apache Parquet](https://parquet.apache.org)
* [Apache Orc](https://orc.apache.org)
* [Apache Arrow IPC](https://arrow.apache.org)
* [Apache Avro](https://avro.apache.org)

See [changelog](changelog.md) for release notes.
## Schema Conversions
Centurion converts schemas between any pair of the supported formats by going through its own internal schema format.
This internal format is a superset of the features of all the supported formats, and is intended only as an intermediate
representation for conversions. The following table shows how each type maps to each format.
| Centurion Type | Avro | Parquet | Orc | Arrow |
|-----------------|------------------------------------------|---------------------------|-------------|---------------------|
| Strings | String | Binary (String) | String | Utf8 |
| UUID | String (UUID) | Binary (String) | String | Utf8 |
| Booleans | Boolean | Boolean | Boolean | Bool |
| Int64 | Long | Int64 | Long | Int64 Signed |
| Int32 | Int | Int32 | Int | Int32 Signed |
| Int16 | N/A (Int) | Int32 (Signed Int16) | Short | Int16 Signed |
| Int8 | N/A (Int) | Int32 (Signed Int8) | Byte | Int8 Signed |
| Float64 | Double | Double | Double | FloatingPointDouble |
| Float32 | Float | Float | Float | FloatingPointSingle |
| Enum | Enum | Enum | String | String |
| Decimal | Binary / Fixed with annotation _Decimal_ | Decimal(precision, scale) | Decimal | Decimal |
| Varchar | Fixed | N/A (String) | Varchar | N/A (String) |
| TimestampMillis | Long (TimestampMillis) | Int64 (Timestamp) | Timestamp | Timestamp (Millis) |
| TimestampMicros | Long (TimestampMicros) | Int64 (Timestamp) | Unsupported | Timestamp (Micros) |
| Map | Map | Map | Map | Map |
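The table can be read as the specification of a hub-and-spoke converter: each format needs one mapping into the intermediate type and one mapping out of it, and any-to-any conversion is the composition of the two. The Kotlin sketch below illustrates that idea under stated assumptions. `HubType`, `fromAvro`, `toOrc`, `toArrow`, `avroToOrc` and `avroToArrow` are hypothetical names invented for this example and are not Centurion's API. Types are plain string labels copied from the table rather than real Avro, ORC or Arrow schema objects, and only a handful of rows are covered.

```kotlin
// Hypothetical sketch of hub-and-spoke schema conversion. HubType loosely
// mirrors some rows of the "Centurion Type" column; these names are
// illustrative only and are NOT Centurion's actual API.
enum class HubType { Strings, Booleans, Int64, Int32, Int16, Float64, TimestampMillis, TimestampMicros }

// Spoke in: Avro type label (as written in the table) -> hub type.
fun fromAvro(label: String): HubType = when (label) {
    "String" -> HubType.Strings
    "Boolean" -> HubType.Booleans
    "Long" -> HubType.Int64
    // Assumption for this sketch: Avro has no 16-bit integer (Int16 is written
    // as Int), so reading an Avro Int back always yields Int32.
    "Int" -> HubType.Int32
    "Double" -> HubType.Float64
    "Long (TimestampMillis)" -> HubType.TimestampMillis
    "Long (TimestampMicros)" -> HubType.TimestampMicros
    else -> throw IllegalArgumentException("Unknown Avro type: $label")
}

// Spoke out: hub type -> ORC type label.
// Returns null where the table marks the type as unsupported in ORC.
fun toOrc(type: HubType): String? = when (type) {
    HubType.Strings -> "String"
    HubType.Booleans -> "Boolean"
    HubType.Int64 -> "Long"
    HubType.Int32 -> "Int"
    HubType.Int16 -> "Short"
    HubType.Float64 -> "Double"
    HubType.TimestampMillis -> "Timestamp"
    HubType.TimestampMicros -> null
}

// Spoke out: hub type -> Arrow type label.
fun toArrow(type: HubType): String = when (type) {
    HubType.Strings -> "Utf8"
    HubType.Booleans -> "Bool"
    HubType.Int64 -> "Int64 Signed"
    HubType.Int32 -> "Int32 Signed"
    HubType.Int16 -> "Int16 Signed"
    HubType.Float64 -> "FloatingPointDouble"
    HubType.TimestampMillis -> "Timestamp (Millis)"
    HubType.TimestampMicros -> "Timestamp (Micros)"
}

// Any-to-any conversion is "spoke in, then spoke out".
fun avroToOrc(label: String): String? = toOrc(fromAvro(label))
fun avroToArrow(label: String): String = toArrow(fromAvro(label))
```

For example, `avroToArrow("Long (TimestampMillis)")` returns `"Timestamp (Millis)"`, while `avroToOrc("Long (TimestampMicros)")` returns `null` because the table lists microsecond timestamps as unsupported in ORC. The payoff of the intermediate type is in the count: with N formats, direct pairwise converters need N(N-1) one-way mappings (12 for the four formats here), whereas going through the hub needs only 2N (8), and adding a fifth format means writing one new pair of spokes.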