https://github.com/marcelmay/parquet-cli-standalone
Provides a standalone Parquet CLI JAR
- Host: GitHub
- URL: https://github.com/marcelmay/parquet-cli-standalone
- Owner: marcelmay
- License: apache-2.0
- Created: 2024-11-03T11:40:26.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-06T18:08:39.000Z (9 months ago)
- Last Synced: 2025-10-06T20:23:28.996Z (9 months ago)
- Language: Java
- Size: 48.8 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-parquet - parquet-cli-standalone - A JAR file for the parquet-cli tool which can be run without any dependencies. (Tools / Command-line)
README
# parquet-cli-standalone
[License][license]
[Maven Central][maven_repo_search]
[CI](https://github.com/marcelmay/parquet-cli-standalone/actions/workflows/ci.yml)
Provides a standalone [Apache Parquet](https://parquet.apache.org/) [CLI JAR](https://github.com/apache/parquet-java/tree/master/parquet-cli) by including [required Hadoop dependencies](https://github.com/apache/parquet-java/tree/master/parquet-cli#running),
for simple distribution and usage.
The version numbers match the [Parquet releases](https://github.com/apache/parquet-java/releases) they are built from.
For example, `parquet-cli-standalone-1.15.0-shaded.jar` is based on [Parquet 1.15.0](https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.15.0).
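Because the version is embedded in the file name, it can be recovered from a JAR path with plain shell parameter expansion. The following is a minimal sketch, not part of the project: the function name `parquet_version_from_jar` is hypothetical, and the release-tag URL pattern is inferred from the single 1.15.0 example above.

```bash
# Hypothetical helper (not shipped with the project): derive the Parquet
# version from a shaded JAR file name such as
# parquet-cli-standalone-1.15.0-shaded.jar.
parquet_version_from_jar() {
  local name
  name="$(basename "$1")"                  # drop any leading directories
  name="${name#parquet-cli-standalone-}"   # strip the artifact prefix
  name="${name%-shaded.jar}"               # strip the classifier and extension
  printf '%s\n' "$name"
}

jar="target/parquet-cli-standalone-1.15.0-shaded.jar"
version="$(parquet_version_from_jar "$jar")"
echo "Parquet version: ${version}"
# Assumption: release tags follow the apache-parquet-<version> pattern seen above.
echo "Release: https://github.com/apache/parquet-java/releases/tag/apache-parquet-${version}"
```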
## Usage
See the [Apache Parquet CLI documentation](https://github.com/apache/parquet-java/tree/master/parquet-cli#help) for details on the CLI arguments.
```bash
# General
java -jar parquet-cli-standalone-<version>-shaded.jar <command> [arguments]
# Concrete example
java -jar target/parquet-cli-standalone-*-shaded.jar meta target/test-classes/part-m-00000.gz.parquet
File path: target/test-classes/part-m-00000.gz.parquet
Created by: parquet-mr version 1.15.0-SNAPSHOT (build 73a4430af6c40f8eb246ad4911eb6d103c9a2abe)
Properties:
  pig.schema: name: chararray,age: chararray,gender: chararray
  writer.model.name: thrift
  thrift.descriptor: {
    "id" : "STRUCT",
    "children" : [ {
      "name" : "name",
      "fieldId" : 1,
      "requirement" : "REQUIRED",
      "type" : {
        "id" : "STRING",
        "logicalTypeAnnotation" : null,
        "binary" : false
      }
  ...
Schema:
message ParquetSchema {
  required binary name (STRING) = 1;
  optional binary age (STRING) = 2;
  optional binary gender (STRING) = 3;
}
Row group 0: count: 3 92,33 B records start: 4 total(compressed): 277 B total(uncompressed):181 B
--------------------------------------------------------------------------------
        type    encodings  count  avg size  nulls  min / max
name    BINARY  G _        3      23,67 B   0      "Alice3" / "Charles3"
age     BINARY  G _ R      3      33,67 B   0      "average" / "average"
gender  BINARY  G _ R      3      35,00 B   0      "unavailable" / "unavailable"
```
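For repeated use, the `java -jar` invocation shown above can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function name `parquet_cli` and the environment variable `PARQUET_CLI_JAR` are hypothetical and not defined by the project; the function simply forwards all arguments to the JAR.

```bash
# Hypothetical wrapper (not part of the project). PARQUET_CLI_JAR is an
# assumed variable that must point at a downloaded or locally built shaded JAR.
parquet_cli() {
  local jar="${PARQUET_CLI_JAR:-}"
  if [ -z "$jar" ]; then
    echo "PARQUET_CLI_JAR is not set" >&2
    return 1
  fi
  if [ ! -f "$jar" ]; then
    echo "JAR not found: $jar" >&2
    return 1
  fi
  # Forward the sub-command and its arguments unchanged to the Parquet CLI.
  java -jar "$jar" "$@"
}
```

With `PARQUET_CLI_JAR` exported, the concrete example above becomes `parquet_cli meta target/test-classes/part-m-00000.gz.parquet`.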
[license]: https://www.apache.org/licenses/LICENSE-2.0
[maven_repo_search]: http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22de.m3y.parquet%22%20AND%20a%3A%22parquet-cli-standalone%22