Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/slub/ldjstructurestats
a commandline command (Python3 program) that determines the structure of given line-delimited JSON records
https://github.com/slub/ldjstructurestats
command-line-tool json line-delimited-json python statistics
Last synced: 12 days ago
JSON representation
a commandline command (Python3 program) that determines the structure of given line-delimited JSON records
- Host: GitHub
- URL: https://github.com/slub/ldjstructurestats
- Owner: slub
- License: apache-2.0
- Created: 2019-08-06T11:32:33.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-08-30T11:44:04.000Z (over 5 years ago)
- Last Synced: 2024-11-25T09:44:47.235Z (2 months ago)
- Topics: command-line-tool, json, line-delimited-json, python, statistics
- Language: Python
- Size: 14.6 KB
- Stars: 2
- Watchers: 8
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README
# ldjstructurestats - line-delimited JSON structure statistics
ldjstructurestats is a commandline command (Python3 program) that determines the structure of given line-delimited JSON records.
This tool can be utilised to discover the structure (+ schema) of given line-delimited JSON records and possible differences/variants at single field paths. Furthermore, it also illustrates, whether a certain field path can contain multiple values or not.
## Usage
It eats line-delimited JSON records from *stdin*.
It puts structure statistics calculated from the given line-delimited JSON records as pure CSV to *stdout*.
```
ldjstructurestatsoptional arguments:
-h, --help show this help message and exit
```* example:
```
ldjstructurestats < [INPUT LINE-DELIMITED JSON RECORDS] > [PATH TO THE OUTPUT CSV FILE]
```
## Run* clone this git repo or just download the [ldjstructurestats.py](ldjstructurestats/ldjstructurestats.py) file
* run ./ldjstructurestats.py
* for a hackish way to use ldjstructurestats system-wide, copy to /usr/local/bin### Install system-wide via pip
```
sudo -H pip3 install --upgrade [ABSOLUTE PATH TO YOUR LOCAL GIT REPOSITORY OF LDJSTRUCTURESTATS]
```
(which provides you ```ldjstructurestats``` as a system-wide commandline command)## Description
#### Statistics Header
#### path_number
* a number/count for each (simple) path/field
#### field_path
* the field path
* rows that have a number in the column 'path_number', contain the simple field path
* field names (/keys) in simple field paths are separated by a `.`
* rows below a line with a simple field path, contain the structure variants/mutations of this simple field path
* field names (/keys) in structure variant field paths are separated by ` > `
* field names (/keys) in structure variant field paths are enclosed in quotation marks (`"`)
* indent = 1 tab + '↳' at the beginning of the field path
* rows below structure variants/mutations that end with a *JSON array*, contain the structure variant field paths incl. the object types that can occur in this *JSON array*
* indent = 2 tabs + '↳' at the beginning of the field path#### multiple_paths
* indicates whether a simple field path occurs multiple times in a record or not (**only** at simple field paths)
* `True` = this simple field path occurs multiple times in at least one record of the input record set
* this value is simply determined by a comparison of **path_existing** and **path_occurrence**, if **path_occurrence** is larger than **path_existing**, then there must be at least on record that contains this simple field path multiple times#### multiple_values
* **only** if, the field path (i.e. structure variant field path) ends with a *JSON array*, this column is either filled with `True` or `False`
* `True` = this field path has *JSON arrays* with multiple values
* `False` = this field path only has *JSON arrays* that do not contain multiple values, i.e., they are single-valued#### path_existing
* the number of records where this (simple) field path exists, i.e., records where this (simple) field path occurs at least once#### path_occurrence
* the occurrence count of the simple field path or its structure variant in the line-delimited JSON records
note: if **multiple_paths** is `True` (at the simple field path) and/or **multiple_values** is `True` (at a related structure variant field path that ends with a *JSON array*), then it's an indicator that a mapping from such a field path could produces multiple values in the output
### Structure Field Path Notation
* behind each field name in a structure field path is a notation of the object type of the value of this field (/key)
* these object types include all possible JSON object types, which are|Object Type Notation|JSON Object Type|
|--------------------|----------------|
| {} | JSON object |
| [] | JSON array |
| (Integer) | integer number |
| (String) | string value |
| (Decimal) | decimal number |
| (Boolean) | boolean value |
| (no object type) | `null` value, i.e., no value |* the notation of the *JSON array* object types is `[]` + `[ARRAY VALUE OBJECT TYPE]`, i.e. `[ARRAY VALUE OBJECT TYPE]` can be one of `{}`, `(Integer)`, `(String)`, `(Decimal)`, `(Boolean)` or `(no object type)`, e.g. `[] {}` or `[] (String)`