https://github.com/nathanhowell/parquet-tools
- Host: GitHub
- URL: https://github.com/nathanhowell/parquet-tools
- Owner: NathanHowell
- License: mit
- Created: 2018-01-18T17:07:10.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2023-04-26T13:56:51.000Z (over 1 year ago)
- Last Synced: 2024-10-11T23:56:05.764Z (26 days ago)
- Language: Dockerfile
- Size: 401 KB
- Stars: 9
- Watchers: 2
- Forks: 6
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Parquet Tools in Docker
This is a small container image containing the AdoptOpenJDK 8 JRE and the parquet-tools library.
I originally created this to run parquet-tools on a machine with Java 9, which would not load some of the Hadoop libraries.
That is no longer a limitation on Java 11, but the image still comes in handy.

## Commands
All parquet-tools commands are available.
Below are a few examples of how to use this image:

### Cat
```console
$ docker run --rm -it -v "$(pwd)"/users.parquet:/tmp/file.parquet nathanhowell/parquet-tools cat /tmp/file.parquet

name = Alyssa
favorite_numbers:
.array = 3
.array = 9
.array = 15
.array = 20

name = Ben
favorite_color = red
favorite_numbers:
```

### Schema
```console
$ docker run --rm -it -v "$(pwd)"/users.parquet:/tmp/file.parquet nathanhowell/parquet-tools schema /tmp/file.parquet

message example.avro.User {
  required binary name (STRING);
  optional binary favorite_color (STRING);
  required group favorite_numbers (LIST) {
    repeated int32 array;
  }
}
```

### Metadata
```console
$ docker run --rm -it -v "$(pwd)"/users.parquet:/tmp/file.parquet nathanhowell/parquet-tools meta /tmp/file.parquet

file: file:/tmp/file.parquet
creator: parquet-mr version 1.4.3
extra: avro.schema = {"type":"record","name":"User","namespace":"example.avro","fields":[{"name":"name","type":"string"},{"name":"favorite_color","type":["string","null"]},{"name":"favorite_numbers","type":{"type":"array","items":"int"}}]}

file schema: example.avro.User
--------------------------------------------------------------------------------
name: REQUIRED BINARY L:STRING R:0 D:0
favorite_color: OPTIONAL BINARY L:STRING R:0 D:1
favorite_numbers: REQUIRED F:1
.array: REPEATED INT32 R:1 D:1

row group 1: RC:2 TS:109 OFFSET:4
--------------------------------------------------------------------------------
name: BINARY SNAPPY DO:0 FPO:4 SZ:36/34/0.94 VC:2 ENC:PLAIN,BIT_PACKED ST:[no stats for this column]
favorite_color: BINARY SNAPPY DO:0 FPO:40 SZ:32/30/0.94 VC:2 ENC:RLE,PLAIN,BIT_PACKED ST:[no stats for this column]
favorite_numbers:
.array: INT32 SNAPPY DO:0 FPO:72 SZ:45/45/1.00 VC:5 ENC:RLE,PLAIN ST:[no stats for this column]
```

## Notes
There is an experimental native image available.
Run `make native` to compile parquet-tools with the GraalVM AOT compiler.
This reduces startup time by over a second on my development machine.
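
Every example in the Commands section repeats the same `docker run` boilerplate, so a small shell function can save some typing. The following is only a sketch and is not part of this image: the function name `parquet_tools` and the `PARQUET_TOOLS_DRY_RUN` variable are made up for illustration. It assumes the `nathanhowell/parquet-tools` image name and the `/tmp/file.parquet` mount point used in the examples, and that any extra options belong before the input file.

```shell
# Hypothetical helper (not shipped with the image): wraps the docker
# invocation used in the Commands examples.
# Usage: parquet_tools <command> <file.parquet> [options...]
parquet_tools() (
  if [ "$#" -lt 2 ]; then
    echo "usage: parquet_tools <command> <file.parquet> [options...]" >&2
    return 2
  fi
  cmd="$1"
  file="$2"
  shift 2

  # Docker bind mounts need an absolute host path.
  dir="$(cd "$(dirname "$file")" >/dev/null && pwd)" || return 1
  abs="$dir/$(basename "$file")"

  # Build the command line; extra options are placed before the input file.
  set -- docker run --rm -it \
    -v "$abs:/tmp/file.parquet" \
    nathanhowell/parquet-tools "$cmd" "$@" /tmp/file.parquet

  # PARQUET_TOOLS_DRY_RUN is an assumed variable name: when set, print the
  # command instead of running it.
  if [ -n "${PARQUET_TOOLS_DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
)
```

With the function loaded into the current shell, `parquet_tools schema users.parquet` should be equivalent to the longer Schema command shown earlier. The body runs in a subshell, so the `cd` used to resolve the path does not change the caller's working directory.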