Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/chhantyal/parquet-cli
Command line (CLI) tool to inspect Apache Parquet files on the go
- Host: GitHub
- URL: https://github.com/chhantyal/parquet-cli
- Owner: chhantyal
- License: bsd-3-clause
- Created: 2018-04-07T10:29:21.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-11-09T22:24:42.000Z (about 1 year ago)
- Last Synced: 2024-10-12T07:25:15.178Z (3 months ago)
- Language: Python
- Size: 64.5 KB
- Stars: 172
- Watchers: 5
- Forks: 10
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# parquet-cli
Command line (CLI) tool to inspect Apache Parquet files on the go

Apache Parquet is a columnar storage format commonly used in the Hadoop ecosystem.
`parq` is a small, easy-to-install Python utility for viewing and getting basic information from Parquet files.
The current feature set covers what I need; please use GitHub issues for any requests or suggestions.
## Install
`pip install parquet-cli`
An executable script called `parq` will be installed.
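If you want to confirm from Python that the script is reachable, a minimal sketch along these lines may help. The helper names `find_parq` and `run_parq` are hypothetical and not part of parquet-cli; the sketch only assumes that `pip` placed a `parq` executable on your `PATH`.

```python
import shutil
import subprocess


def find_parq(command="parq"):
    """Return the full path of the script if it is on PATH, else None."""
    return shutil.which(command)


def run_parq(*args, command="parq"):
    """Run the CLI with the given arguments and return its standard output.

    Hypothetical convenience wrapper, not something parquet-cli ships.
    """
    exe = find_parq(command)
    if exe is None:
        raise FileNotFoundError(
            f"{command!r} not found on PATH; try `pip install parquet-cli`"
        )
    result = subprocess.run(
        [exe, *args], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_parq("input.parquet", "--count")` should return the same text that `parq input.parquet --count` prints in a shell.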
## Use
Once installed, you can use the `parq` command.
View Parquet file metadata:
`$ parq input.parquet`
```
# Metadata
created_by: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
num_columns: 13
num_rows: 1000
num_row_groups: 1
format_version: 1.0
serialized_size: 1125
```
Get schema information:
`$ parq input.parquet --schema`
```
# Schema
registration_dttm: INT96
id: INT32
name: BYTE_ARRAY UTF8
email: BYTE_ARRAY UTF8
...
ip_address: BYTE_ARRAY UTF8
country: BYTE_ARRAY UTF8
```
Get total row count:
`$ parq input.parquet --count`
```
1025
```
Get top N records (head):
`$ parq input.parquet --head 10`
Get bottom N records (tail):
`$ parq input.parquet --tail 10`
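
The same information can also be read directly from Python. The following is a rough sketch, not the tool's actual implementation: it assumes the third-party `pyarrow` package is installed, and the helper names (`format_metadata`, `head_tail_bounds`, `inspect`) are made up for illustration. The metadata field names match the keys shown in the output above.

```python
# Field names mirror the keys printed by `parq input.parquet`.
METADATA_FIELDS = (
    "created_by",
    "num_columns",
    "num_rows",
    "num_row_groups",
    "format_version",
    "serialized_size",
)


def format_metadata(meta):
    """Render a FileMetaData-like object as 'key: value' lines."""
    return "\n".join(
        f"{name}: {getattr(meta, name)}" for name in METADATA_FIELDS
    )


def head_tail_bounds(num_rows, n):
    """Return (head_length, tail_offset) for the first and last n rows."""
    n = max(n, 0)
    return min(n, num_rows), max(num_rows - n, 0)


def inspect(path, n=10):
    """Print metadata and schema, then return (head, tail) as Arrow tables."""
    # Imported lazily so the pure helpers above work without pyarrow.
    import pyarrow.parquet as pq  # assumption: pyarrow is installed

    parquet_file = pq.ParquetFile(path)
    print(format_metadata(parquet_file.metadata))
    print(parquet_file.schema)

    # Reads the whole file into memory; fine for small files only.
    table = parquet_file.read()
    head_length, tail_offset = head_tail_bounds(table.num_rows, n)
    return table.slice(0, head_length), table.slice(tail_offset)
```

Calling `inspect("input.parquet", n=10)` prints the metadata and schema and returns the first and last ten rows. The row count is available as `num_rows` in the file metadata, so it does not require reading any row data.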
## Help
`$ parq --help`
```
usage: usage: parq file [-s [SCHEMA] | --head [HEAD] | --tail [TAIL] | -c [COUNT]]

positional arguments:
  file                  Parquet file

optional arguments:
  -h, --help            show this help message and exit
  -s [SCHEMA], --schema [SCHEMA]
                        get schema information
  --head [HEAD]         get first N rows from file
  --tail [TAIL]         get last N rows from file
  -c [COUNT], --count [COUNT]
                        get total rows count
```
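
The usage line above, with its `[-s ... | --head ... | --tail ... | -c ...]` alternatives, is the shape `argparse` prints for a mutually exclusive group of optional flags that each take an optional value. A minimal sketch of a parser with that interface might look like the following. It is an illustration, not the project's source: the `const` defaults (such as 10 rows for `--head` and `--tail`) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser resembling the interface shown in `parq --help`."""
    parser = argparse.ArgumentParser(prog="parq")
    parser.add_argument("file", help="Parquet file")

    # Only one of these flags may be given per invocation.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "-s", "--schema", nargs="?", const=True, default=False,
        help="get schema information",
    )
    group.add_argument(
        "--head", nargs="?", type=int, const=10,  # assumed default N
        help="get first N rows from file",
    )
    group.add_argument(
        "--tail", nargs="?", type=int, const=10,  # assumed default N
        help="get last N rows from file",
    )
    group.add_argument(
        "-c", "--count", nargs="?", const=True, default=False,
        help="get total rows count",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["input.parquet", "--head", "5"])` yields `head=5`, while passing both `--head` and `--tail` makes `argparse` exit with an error because the flags belong to the same exclusive group.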