https://github.com/tkluck/pandas-nesteddata

Transform hierarchical data (nested arrays/hashes) to a pandas DataFrame according to a compact, readable, user-specified pattern

Host: GitHub
URL: https://github.com/tkluck/pandas-nesteddata
Owner: tkluck
License: gpl-3.0
Created: 2017-01-04T22:33:29.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-01-07T21:32:37.000Z (over 9 years ago)
Last Synced: 2025-01-22T03:32:58.944Z (over 1 year ago)
Language: Python
Homepage:
Size: 25.4 KB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

pandas-nesteddata version 0.1
=============================

This module transforms hierarchical data (nested arrays/hashes) to
a pandas DataFrame according to a compact, readable, user-specified pattern.

For example, the pattern `.<index>.*` transforms a data structure
of the form

    >>> data = [{ 'a': 1, 'b': 2 }, { 'a': 3, 'b': 4 }]

to the DataFrame

           a  b
    index
    0      1  2
    1      3  4

Or, in code:

    >>> from nesteddata import to_dataframe
    >>> to_dataframe('.<index>.*', data)
           a  b
    index      
    0      1  2
    1      3  4

The pattern `.*.*` applied to the same data gives the output

    >>> to_dataframe('.*.*', data)
       0_a  0_b  1_a  1_b
    0    1    2    3    4

The pattern `.*.<key>` gives the output

    >>> to_dataframe('.*.<key>', data)
         0  1
    key      
    a    1  3
    b    2  4

The pattern specification is intended to be powerful enough for this
module to replace a lot of simple boilerplate data transformations.
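To make that kind of boilerplate concrete, the three reshapes shown above can be written by hand in plain Python. This is only an illustrative sketch, not how the module works internally. The variable names are made up for the example, and each result is a list of row dicts (or a single row dict) rather than a DataFrame.

```python
data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]

# Shape of `.<index>.*`: one row per list position, one column per inner key.
rows_by_index = [{'index': i, **record} for i, record in enumerate(data)]

# Shape of `.*.*`: a single row; column names combine position and inner key.
single_row = {
    f'{i}_{key}': value
    for i, record in enumerate(data)
    for key, value in record.items()
}

# Shape of `.*.<key>`: one row per inner key, one column per list position.
by_key = {}
for i, record in enumerate(data):
    for key, value in record.items():
        by_key.setdefault(key, {'key': key})[i] = value
rows_by_key = list(by_key.values())
```

Each reshape needs its own loop, and the loops change whenever the nesting changes; a pattern string expresses the same intent in one place.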

PATTERN SPECIFICATION
---------------------

The dot-separated components represent the following:

- `<name>` represents that the keys at that position should be put in a column
  named `name` in the output. The values belonging to those keys become rows;
- `*` represents that the keys at that position in the pattern should be
  interpreted as column names; their values should be the values for that
  column, all belonging to the same row;
- `{column_name}` or `{column_name_1,column_name_2,...}` is similar to `*`, but
  instead of capturing all the keys at that level of the hierarchy, it only
  captures the named columns;
- `[]` represents a numerical literal key, for indexing arrays or
  dictionaries with keys of type `int`;
- anything else represents a literal key name.

If your pattern does not contain `*` or `{...}`, you need to pass an
additional `column_name=` parameter to `to_dataframe` to specify the name
for the single column where the value will go.

For the purposes of this description, an array should be seen as a collection
of index => value pairs.
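The rules above can be illustrated with a small interpreter written in plain Python. This is a sketch under assumptions, not the module's actual implementation: the helper names `items`, `parse`, `walk` and `to_rows` are hypothetical, the result is a list of row dicts rather than a DataFrame, integer keys in square brackets and space-separated paths are left out, and joining nested column names with `_` is inferred from the `0_a` style names in the example output.

```python
def items(node):
    """View a list as index => value pairs, like a dict."""
    return enumerate(node) if isinstance(node, list) else node.items()

def parse(pattern):
    """Turn a pattern string into a list of (kind, argument) steps."""
    steps = []
    for part in pattern.split('.')[1:]:  # patterns start with a dot
        if part.startswith('<') and part.endswith('>'):
            steps.append(('index', part[1:-1]))
        elif part == '*':
            steps.append(('columns', None))  # None means: capture every key
        elif part.startswith('{') and part.endswith('}'):
            steps.append(('columns', part[1:-1].split(',')))
        else:
            steps.append(('literal', part))
    return steps

def walk(node, steps, index=(), name=()):
    """Yield (row key, column name parts, value) for every leaf reached."""
    if not steps:
        yield index, name, node
        return
    (kind, arg), rest = steps[0], steps[1:]
    if kind == 'literal':
        yield from walk(node[arg], rest, index, name)
        return
    for key, value in items(node):
        if kind == 'index':
            # The key identifies the row and is stored under the given name.
            yield from walk(value, rest, index + ((arg, key),), name)
        elif arg is None or key in arg:
            # The key contributes to the column name.
            yield from walk(value, rest, index, name + (str(key),))

def to_rows(pattern, data, column_name=None):
    """Group the leaves by row key and return one dict per row."""
    rows = {}
    for index, name, value in walk(data, parse(pattern)):
        column = '_'.join(name) if name else column_name
        rows.setdefault(index, dict(index))[column] = value
    return list(rows.values())

data = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
by_index = to_rows('.<index>.*', data)
only_a = to_rows('.<index>.{a}', data)
named = to_rows('.<index>.a', data, column_name='value')
```

Here `by_index` holds one row per list position with columns `a` and `b`, `only_a` keeps just the `a` column, and `named` shows why a pattern without `*` or `{...}` needs a `column_name`: there is no key left to name the single value column.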

It is possible to specify several dot-separated paths in a single pattern,
separated by spaces. In that case, all the paths need to have the same primary
key (that is, the same set of names in `<...>`). Rows will be formed by joining
the columns resulting from the different paths.
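The joining step can be pictured as merging rows that share a primary key. The sketch below uses plain Python with a hypothetical helper `join_rows` and made-up sample rows; it is not the module's code. How the module treats a key that appears in only one path is not stated here, so keeping such rows (an outer join) is an assumption of this sketch.

```python
def join_rows(key_names, *row_sets):
    """Merge row dicts from several paths that share the same primary key."""
    joined = {}
    for rows in row_sets:
        for row in rows:
            # The primary key is the tuple of values under the shared names.
            key = tuple(row[name] for name in key_names)
            joined.setdefault(key, {}).update(row)
    return list(joined.values())

# Hypothetical rows, as two paths with the primary key <id> might produce them.
names = [{'id': 1, 'name': 'ann'}, {'id': 2, 'name': 'bob'}]
ages = [{'id': 1, 'age': 30}, {'id': 2, 'age': 41}]

people = join_rows(['id'], names, ages)
```

Requiring the same set of `<...>` names in every path is what makes this merge well defined: each path must be able to produce the same key tuple.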

ESCAPING SPECIAL CHARACTERS
---------------------------

The characters `<>{}*[].` have a special meaning and as such, cannot be part
of a literal key. More precisely, if they are in a position where they can
be interpreted with their special meaning, this takes precedence.

A way to escape these special characters is planned for a future
release. For now, see 'Building the pattern from data structures' below.

BUILDING THE PATTERN FROM DATA STRUCTURES
-----------------------------------------

As an alternative to passing the pattern as a string that needs to be parsed,
it is also possible to pass the pattern as a data structure. For example, the
pattern

    .*.<key>

can also be represented as

    >>> from nesteddata import Glob, Index
    >>> pattern = Glob() + Index('key')
    >>> pattern
    Glob() + Index('key')
    >>> pattern.to_dataframe(data)
         0  1
    key      
    a    1  3
    b    2  4

The constructor functions are:

- `Index(name)` (corresponds to `<name>`)
- `Glob()` (corresponds to `*`)
- `Columns(*column_names)` (corresponds to `{column_name_1,..,column_name_n}`)
- `Literal(key)` (corresponds to a literal string key or a `[]` integer key)
- `Join(*chunks)` (corresponds to space-separated pattern chunks)

INSTALLATION
------------

To install this module, type the following:

    python setup.py build
    sudo python setup.py install

DEPENDENCIES
------------

This module requires these other modules and libraries:

    pandas

COPYRIGHT AND LICENCE
---------------------

Copyright (C) 2017 by Timo Kluck

This library is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 3 or later.
A copy of this license can be found in LICENSE.
