https://github.com/cmungall/json-flattener

Python library for denormalizing nested dicts or json objects to tables and back
https://github.com/cmungall/json-flattener

dataframes denormalization json linkml pandas yaml

Last synced: 12 months ago
JSON representation

Python library for denormalizing nested dicts or json objects to tables and back

Host: GitHub
URL: https://github.com/cmungall/json-flattener
Owner: cmungall
License: bsd-3-clause
Created: 2021-07-29T01:31:08.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2024-03-21T03:02:31.000Z (over 2 years ago)
Last Synced: 2025-07-11T06:49:57.506Z (about 1 year ago)
Topics: dataframes, denormalization, json, linkml, pandas, yaml
Language: Python
Homepage:
Size: 98.6 KB
Stars: 10
Watchers: 1
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # json-flattener

Python library for denormalizing/flattening lists of complex objects to tables/data frames, with roundtripping

## Notebook Example

[EXAMPLE.ipynb](https://github.com/cmungall/json-flattener/blob/main/EXAMPLE.ipynb)

## Description

Given YAML/JSON/JSON-Lines such as:

```yaml

- id: S001

  name: Lord of the Rings

  genres:

    - fantasy

  creator:

    name: JRR Tolkein

    from_country: England

  books:

    - id: S001.1

      name: Fellowship of the Ring

      price: 5.99

      summary: Hobbits

    - id: S001.2

      name: The Two Towers

      price: 5.99

      summary: More hobbits

    - id: S001.3

      name: Return of the King

      price: 6.99

      summary: Yet more hobbits

- id: S002

  name: The Culture Series

  genres:

    - scifi

  creator:

    name: Ian M Banks

    from_country: Scotland

  books:

    - id: S002.1

      name: Consider Phlebas

      price: 5.99

    - id: S002.2

      name: Player of Games

      price: 5.99

```

Denormalize using `jfl` command:

```bash

jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv

```

|id|name|genres|creator_name|creator_from_country|books_name|books_summary|books_price|books_id|creator_genres

|---|---|---|---|---|---|---|---|---|---|

|S001|Lord of the Rings|[fantasy]|JRR Tolkein|England|[Fellowship of the Ring\|The Two Towers\|Return of the King]|[Hobbits\|More hobbits\|Yet more hobbits]|[5.99\|5.99\|6.99]|[S001.1\|S001.2\|S001.3]|

|S002|The Culture Series|[scifi]|Ian M Banks|Scotland|[Consider Phlebas\|Player of Games]||[5.99\|5.99]|[S002.1\|S002.2]|

To convert back to JSON/YAML we must first cache the generated mappings when we do the flatten with `-O`:

```bash

jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -O examples/conf.yaml -o examples/books1-flattened.tsv

```

Then pass this as an argument

```bash

jfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -c examples/conf.yaml -o examples/books1.yaml

```

This library also allows complex fields to be directly serialized as json or yaml (the default is to append `_json` to the key). For example:

```bash

jfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv

```

|id|name|genres|creator_json|books_json|

|---|---|---|---|---|

|S001|Lord of the Rings|[fantasy]|{\"name\": \"JRR Tolkein\", \"from_country\": \"England\"}|[{\"id\": \"S001.1\", \"name\": \"Fellowship of the Ring\", \"summary\": \"Hobbits\", \"price\": 5.99}, {\"id\": \"S001.2\", \"name\": \"The Two Towers\", \"summary\": \"More hobbits\", \"price\": 5.99}, {\"id\": \"S001.3\", \"name\": \"Return of the King\", \"summary\": \"Yet more hobbits\", \"price\": 6.99}]|

|S002|The Culture Series|[scifi]|{\"name\": \"Ian M Banks\", \"from_country\": \"Scotland\"}|[{\"id\": \"S002.1\", \"name\": \"Consider Phlebas\", \"price\": 5.99}, {\"id\": \"S002.2\", \"name\": \"Player of Games\", \"price\": 5.99}]|

|S003|Book of the New Sun|[scifi, fantasy]|{\"name\": \"Gene Wolfe\", \"genres\": [\"scifi\", \"fantasy\"], \"from_country\": \"USA\"}|[{\"id\": \"S003.1\", \"name\": \"Shadow of the Torturer\"}, {\"id\": \"S003.2\", \"name\": \"Claw of the Conciliator\", \"price\": 6.99}]|

|S004|Example with single book||{\"name\": \"Ms Writer\", \"genres\": [\"romance\"], \"from_country\": \"USA\"}|[{\"id\": \"S004.1\", \"name\": \"Blah\"}]|

|S005|Example with no books||{\"name\": \"Mr Unproductive\", \"genres\": [\"romance\", \"scifi\", \"fantasy\"], \"from_country\": \"USA\"}||

See

The primary use case is to go from a rich *normalized* data model (as python objects, JSON, or YAML) to a flatter representation that is amenable to processing with:

 * Solr/Lucene

 * Pandas/R Dataframes

 * Excel/Google sheets

 * Unix cut/grep/cat/etc

 * Simple denormalized SQL database representations

The target denormalized format is a list of rows / a data matrix, where each cell is either an atom or a list of atoms.

## Usage from Python

```python

dict = {

            "id": "A1",

            "subject": {"id": "G1", "name": "gene1", "category": "gene"},

            "object": {"id": "T1", "name": "term1", "category": "term"},

            "publications": ["PMID1", "PMID2"],

            "closure": [

                {"id": "X1", "name": "x1"},

                {"id": "X2", "name": "x2"},

                {"id": "X3", "name": "x3"},

            ],

        }

kconfig = {

            "subject": KeyConfig(delete=True, serializers="yaml"),

            "object": KeyConfig(delete=True, flatten=True),

            "closure": KeyConfig(delete=True, is_list=True, flatten=True),

        }

config = GlobalConfig(key_configs=kconfig)

flattened_objs = flatten(objs, config)

```

## Method

 * Each top level key becomes a column

 * if the key value is a dict/object, then flatten

     * by default a '_' is used to separate the parent key from the inner key

     * e.g. the composition of `creator` and `from_country` becomes `creator_from_country`

     * currently one level of flattening is supported

 * if the key value is a list of atomic entities, then leave as is

 * if the key value is a list of dicts/objects, then flatten each key of this inner dict into a list

     * e.g. if `books` is a list of book objects, and `name` is a key on book, then `books_name` is a list of names of each book

     * order is significant - the first element of `books_name` is matched to the first element of `books_price`, etc

 * Allow any key to be serialized as yaml/json/pickle if configured

## Comparison

### Pandas json_normalize

 - https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.io.json.json_normalize.html

### Java json-flattener

 https://github.com/wnameless/json-flattener

### Python

### csvjson

https://csvjson.com/json2csv

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cmungall/json-flattener

Awesome Lists containing this project

README