Extending the Python json package functionality

https://github.com/chrisjsewell/jsonextended

README


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Build Status](https://travis-ci.org/chrisjsewell/jsonextended.svg?branch=master)](https://travis-ci.org/chrisjsewell/jsonextended)\n",
"[![Coverage Status](https://coveralls.io/repos/github/chrisjsewell/jsonextended/badge.svg?branch=master)](https://coveralls.io/github/chrisjsewell/jsonextended?branch=master)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# JSON Extended\n",
"\n",
"A module to extend the python json package functionality:\n",
"\n",
"- Treat a directory structure like a nested dictionary:\n",
"\n",
" - **lightweight plugin system**: define bespoke classes for **parsing** different file extensions and **encoding/decoding** objects\n",
" \n",
" - **lazy loading**: read files only when they are indexed into \n",
" \n",
" - **tab completion**: index as tabs for quick exploration of directory\n",
" \n",
"- Manipulation of nested dictionaries:\n",
"\n",
" - enhanced pretty printer\n",
" \n",
" - Javascript rendered, expandable tree in the Jupyter Notebook\n",
" \n",
" - functions including; filter, merge, flatten, unflatten, diff\n",
" \n",
" - output to directory structure (of n folder levels)\n",
"\n",
"- On-disk indexing option for large json files (using the ijson package)\n",
"\n",
"- Units schema concept to apply and convert physical units (using the\n",
" pint package)\n"
]
},
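{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of the nested-dictionary functions listed above, `edict.merge` and `edict.diff` can be applied to hypothetical data as follows (exact signatures may differ; see each function's docstring for tested examples):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from jsonextended import edict\n",
"\n",
"d1 = {'a': {'b': 1, 'c': 2}}\n",
"d2 = {'a': {'c': 3, 'd': 4}}\n",
"\n",
"# merge a list of nested dictionaries (later values overwrite earlier ones)\n",
"merged = edict.merge([d1, d2], overwrite=True)\n",
"\n",
"# report the nested differences between the two dictionaries\n",
"edict.diff(d1, d2)"
]
},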
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Example"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from jsonextended import edict, plugins, example_mockpaths"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a directory structure, potentially containing multiple file types: "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mFolder\u001b[0m(\"dir1\") \n",
" \u001b[34mFile\u001b[0m(\"file1.json\") Contents:\n",
" {\"key2\": {\"key3\": 4, \"key4\": 5}, \"key1\": [1, 2, 3]}\n",
" \u001b[32mFolder\u001b[0m(\"subdir1\") \n",
" \u001b[34mFile\u001b[0m(\"file1.csv\") Contents:\n",
" # a csv file\n",
" header1,header2,header3\n",
" val1,val2,val3\n",
" val4,val5,val6\n",
" val7,val8,val9\n",
" \u001b[34mFile\u001b[0m(\"file1.literal.csv\") Contents:\n",
" # a csv file with numbers\n",
" header1,header2,header3\n",
" 1,1.1,string1\n",
" 2,2.2,string2\n",
" 3,3.3,string3\n",
" \u001b[32mFolder\u001b[0m(\"subdir2\") \n",
" \u001b[32mFolder\u001b[0m(\"subsubdir21\") \n",
" \u001b[34mFile\u001b[0m(\"file1.keypair\") Contents:\n",
" # a key-pair file\n",
" key1 val1\n",
" key2 val2\n",
" key3 val3\n",
" key4 val4\n"
]
}
],
"source": [
"datadir = example_mockpaths.directory1\n",
"print(datadir.to_string(indentlvl=3,file_content=True,color=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plugins can be defined for parsing each file type (see [Creating Plugins](#creating-and-loading-plugins) section):"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'csv.basic': 'read *.csv delimited file with headers to {header:[column_values]}',\n",
" 'csv.literal': 'read *.literal.csv delimited files with headers to {header:column_values}',\n",
" 'json.basic': 'read *.json files using json.load',\n",
" 'keypair': \"read *.keypair, where each line should be; ' '\"}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.load_builtin_plugins('parsers')\n",
"plugins.view_plugins('parsers')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"LazyLoad then takes a path name, path-like object or dict-like object, which will lazily load each file with a compatible plugin."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{file1.json:..,subdir1:..,subdir2:..}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lazy = edict.LazyLoad(datadir)\n",
"lazy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lazyload can then be treated like a dictionary, or indexed by tab completion:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['subdir1', 'subdir2', 'file1.json']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(lazy.keys())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lazy[['file1.json','key1']]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.1, 2.2, 3.3]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lazy.subdir1.file1_literal_csv.header2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For pretty printing of the dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mfile1.json\u001b[0m: \n",
" \u001b[32mkey1\u001b[0m: [1, 2, 3]\n",
" \u001b[32mkey2\u001b[0m: {...}\n",
"\u001b[32msubdir1\u001b[0m: \n",
" \u001b[32mfile1.csv\u001b[0m: {...}\n",
" \u001b[32mfile1.literal.csv\u001b[0m: {...}\n",
"\u001b[32msubdir2\u001b[0m: \n",
" \u001b[32msubsubdir21\u001b[0m: {...}\n"
]
}
],
"source": [
"edict.pprint(lazy,depth=2,keycolor='green')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numerous functions exist to manipulate the nested dictionary:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{('file1.csv', 'header1'): ['val1', 'val4', 'val7'],\n",
" ('file1.csv', 'header2'): ['val2', 'val5', 'val8'],\n",
" ('file1.csv', 'header3'): ['val3', 'val6', 'val9'],\n",
" ('file1.literal.csv', 'header1'): [1, 2, 3],\n",
" ('file1.literal.csv', 'header2'): [1.1, 2.2, 3.3],\n",
" ('file1.literal.csv', 'header3'): ['string1', 'string2', 'string3']}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"edict.flatten(lazy.subdir1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"LazyLoad parses the `plugins.decode` function to parser plugin's `read_file` method (keyword 'object_hook'). Therefore, bespoke decoder plugins can be set up for specific dictionary key signatures: "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File(\"file2.json\") Contents:\n",
"{\"key1\":{\"_python_set_\": [1, 2, 3]},\"key2\":{\"_numpy_ndarray_\": {\"dtype\": \"int64\", \"value\": [1, 2, 3]}}}\n"
]
}
],
"source": [
"print(example_mockpaths.jsonfile2.to_string())"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{u'key1': {u'_python_set_': [1, 2, 3]},\n",
" u'key2': {u'_numpy_ndarray_': {u'dtype': u'int64', u'value': [1, 2, 3]}}}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decimal.Decimal': 'encode/decode Decimal type',\n",
" 'numpy.ndarray': 'encode/decode numpy.ndarray',\n",
" 'pint.Quantity': 'encode/decode pint.Quantity object',\n",
" 'python.set': 'decode/encode python set'}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.load_builtin_plugins('decoders')\n",
"plugins.view_plugins('decoders')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{u'key1': {1, 2, 3}, u'key2': array([1, 2, 3])}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dct = edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()\n",
"dct"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This process can be reversed, using encoder plugins: "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decimal.Decimal': 'encode/decode Decimal type',\n",
" 'numpy.ndarray': 'encode/decode numpy.ndarray',\n",
" 'pint.Quantity': 'encode/decode pint.Quantity object',\n",
" 'python.set': 'decode/encode python set'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.load_builtin_plugins('encoders')\n",
"plugins.view_plugins('encoders')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'{\"key2\": {\"_numpy_ndarray_\": {\"dtype\": \"int64\", \"value\": [1, 2, 3]}}, \"key1\": {\"_python_set_\": [1, 2, 3]}}'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import json\n",
"json.dumps(dct,default=plugins.encode)"
]
},
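{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encoded string can then be round-tripped back through the decoder plugins, since (as noted above) `plugins.decode` can be passed as an object hook (a minimal sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"# encode to a JSON string, then decode back to the original objects\n",
"encoded = json.dumps(dct, default=plugins.encode)\n",
"json.loads(encoded, object_hook=plugins.decode)"
]
},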
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" pip install jsonextended\n",
"\n",
"jsonextended has no import dependancies, on Python 3.x and only `pathlib2` on 2.7 but,\n",
"for full functionallity, it is advised to install the following packages:\n",
"\n",
" conda install -c conda-forge ijson numpy pint \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating and Loading Plugins"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from jsonextended import plugins, utils"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plugins are recognised as classes with a minimal set of attributes matching the plugin category interface:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decoders': ['plugin_name', 'plugin_descript', 'dict_signature'],\n",
" 'encoders': ['plugin_name', 'plugin_descript', 'objclass'],\n",
" 'parsers': ['plugin_name', 'plugin_descript', 'file_regex', 'read_file']}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.view_interfaces()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decoders': {}, 'encoders': {}, 'parsers': {}}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.unload_all_plugins()\n",
"plugins.view_plugins()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a simple parser plugin would be:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class ParserPlugin(object):\n",
" plugin_name = 'example'\n",
" plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'\n",
" file_regex = '*.example'\n",
" def read_file(self, file_obj, **kwargs):\n",
" out_dict = {}\n",
" for i, line in enumerate(file_obj):\n",
" out_dict[i] = line.strip()\n",
" return out_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plugins can be loaded as a class:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decoders': {},\n",
" 'encoders': {},\n",
" 'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)'}}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plugins.load_plugin_classes([ParserPlugin],'parsers')\n",
"plugins.view_plugins()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or by directory (loading all .py files):"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'decoders': {},\n",
" 'encoders': {},\n",
" 'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)',\n",
" 'example.other': 'a parser for *.example.other files, that outputs (line_number:line)'}}"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fobj = utils.MockPath('example.py',is_file=True,content=\"\"\"\n",
"class ParserPlugin(object):\n",
" plugin_name = 'example.other'\n",
" plugin_descript = 'a parser for *.example.other files, that outputs (line_number:line)'\n",
" file_regex = '*.example.other'\n",
" def read_file(self, file_obj, **kwargs):\n",
" out_dict = {}\n",
" for i, line in enumerate(file_obj):\n",
" out_dict[i] = line.strip()\n",
" return out_dict\n",
"\"\"\")\n",
"dobj = utils.MockPath(structure=[fobj])\n",
"plugins.load_plugins_dir(dobj,'parsers')\n",
"plugins.view_plugins()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a more complex example of a parser, see `jsonextended.complex_parsers`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interface details"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Parsers:\n",
"\n",
" - *file_regex* attribute, a str denoting what files to apply it to. A file will be parsed by the longest regex it matches.\n",
" - *read_file* method, which takes an (open) file object and kwargs as parameters \n",
" \n",
"- Decoders:\n",
"\n",
" - *dict_signature* attribute, a tuple denoting the keys which the dictionary must have, e.g. dict_signature=('a','b') decodes {'a':1,'b':2}\n",
" - *from_...* method(s), which takes a dict object as parameter. The `plugins.decode` function will use the method denoted by the intype parameter, e.g. if intype='json', then *from_json* will be called.\n",
" \n",
"- Encoders:\n",
"\n",
" - *objclass* attribute, the object class to apply the encoding to, e.g. objclass=decimal.Decimal encodes objects of that type\n",
" - *to_...* method(s), which takes a dict object as parameter. The `plugins.encode` function will use the method denoted by the outtype parameter, e.g. if outtype='json', then *to_json* will be called.\n"
]
},
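{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a minimal sketch of a matching decoder/encoder pair for `decimal.Decimal` (the `_example_decimal_` key signature here is purely hypothetical; the built-in plugins use their own signatures):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import decimal\n",
"\n",
"class DecimalDecoderPlugin(object):\n",
"    plugin_name = 'example.decimal'\n",
"    plugin_descript = 'example decoder for decimal.Decimal'\n",
"    dict_signature = ('_example_decimal_',)\n",
"    def from_json(self, dct, **kwargs):\n",
"        # rebuild the Decimal from its string representation\n",
"        return decimal.Decimal(dct['_example_decimal_'])\n",
"\n",
"class DecimalEncoderPlugin(object):\n",
"    plugin_name = 'example.decimal'\n",
"    plugin_descript = 'example encoder for decimal.Decimal'\n",
"    objclass = decimal.Decimal\n",
"    def to_json(self, obj, **kwargs):\n",
"        # store the Decimal as a string under the signature key\n",
"        return {'_example_decimal_': str(obj)}\n",
"\n",
"plugins.load_plugin_classes([DecimalDecoderPlugin], 'decoders')\n",
"plugins.load_plugin_classes([DecimalEncoderPlugin], 'encoders')\n",
"plugins.view_plugins()"
]
},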
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extended Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information, all functions contain docstrings with tested examples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Folders JSONisation"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from jsonextended import ejson, edict, utils"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['dir1', 'dir2', 'dir3']"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"path = utils.get_test_path()\n",
"ejson.jkeys(path)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dir1: \n",
" dir1_1: {...}\n",
" file1: {...}\n",
" file2: {...}\n",
"dir2: \n",
" file1: {...}\n",
"dir3: \n"
]
}
],
"source": [
"jdict1 = ejson.to_dict(path)\n",
"edict.pprint(jdict1,depth=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"edict.to_html(jdict1,depth=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To try the rendered JSON tree, output in the Jupyter Notebook, go to : https://chrisjsewell.github.io/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Nested Dictionary Manipulation"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"initial: {...}\n",
"meta: {...}\n",
"optimised: {...}\n",
"units: {...}\n"
]
}
],
"source": [
"jdict2 = ejson.to_dict(path,['dir1','file1'])\n",
"edict.pprint(jdict2,depth=1)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"initial: \n",
" crystallographic: \n",
" volume: 924.62752781\n",
" primitive: \n",
" volume: 462.313764\n",
"optimised: \n",
" crystallographic: \n",
" volume: 1063.98960509\n",
" primitive: \n",
" volume: 531.994803\n"
]
}
],
"source": [
"filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)\n",
"edict.pprint(filtered)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(initial, crystallographic, volume): 924.62752781\n",
"(initial, primitive, volume): 462.313764\n",
"(optimised, crystallographic, volume): 1063.98960509\n",
"(optimised, primitive, volume): 531.994803\n"
]
}
],
"source": [
"edict.pprint(edict.flatten(filtered))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Units Schema"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"initial: \n",
" crystallographic: \n",
" volume: 924.62752781 angstrom ** 3\n",
" primitive: \n",
" volume: 462.313764 angstrom ** 3\n",
"optimised: \n",
" crystallographic: \n",
" volume: 1063.98960509 angstrom ** 3\n",
" primitive: \n",
" volume: 531.994803 angstrom ** 3\n"
]
}
],
"source": [
"from jsonextended.units import apply_unitschema, split_quantities\n",
"withunits = apply_unitschema(filtered,{'volume':'angstrom^3'})\n",
"edict.pprint(withunits)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"initial: \n",
" crystallographic: \n",
" volume: 0.92462752781 nanometer ** 3\n",
" primitive: \n",
" volume: 0.462313764 nanometer ** 3\n",
"optimised: \n",
" crystallographic: \n",
" volume: 1.06398960509 nanometer ** 3\n",
" primitive: \n",
" volume: 0.531994803 nanometer ** 3\n"
]
}
],
"source": [
"newunits = apply_unitschema(withunits,{'volume':'nm^3'})\n",
"edict.pprint(newunits)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"initial: \n",
" crystallographic: \n",
" volume: \n",
" magnitude: 0.92462752781\n",
" units: nanometer ** 3\n",
" primitive: \n",
" volume: \n",
" magnitude: 0.462313764\n",
" units: nanometer ** 3\n",
"optimised: \n",
" crystallographic: \n",
" volume: \n",
" magnitude: 1.06398960509\n",
" units: nanometer ** 3\n",
" primitive: \n",
" volume: \n",
" magnitude: 0.531994803\n",
" units: nanometer ** 3\n"
]
}
],
"source": [
"edict.pprint(split_quantities(newunits),depth=4)"
]
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4.0,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}