https://github.com/pudo/graphkit

Process data based on JSON schema


Host: GitHub
URL: https://github.com/pudo/graphkit
Owner: pudo
License: mit
Archived: true
Created: 2015-08-06T14:26:43.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2015-09-20T14:32:10.000Z (almost 9 years ago)
Last Synced: 2024-01-20T07:32:59.361Z (5 months ago)
Language: Python
Size: 295 KB
Stars: 12
Watchers: 5
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# graphkit [![Build Status](https://travis-ci.org/pudo/graphkit.svg?branch=master)](https://travis-ci.org/pudo/graphkit)

GraphKit is a pipeline processing tool for graph-based data extraction,
transformation and analysis. The tool's graph model is based on annotated
[JSON schema](http://json-schema.org/) definitions.

A typical pipeline might extract data from a set of CSV files or database
tables, translate them to JSON using a given schema, combine them into an
RDF graph, perform de-duplication and data integration, and eventually run
a set of queries on the resulting graph.
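
As a rough illustration of the pipeline idea, stages can be modelled as
functions that each consume an iterator and return a new one. The sketch below
uses only the Python standard library; the function names (``read_csv``,
``to_objects``, ``to_triples``, ``run_pipeline``) and the sample columns are
assumptions made for this example and are not GraphKit's actual API.

```python
import csv
import io
from functools import reduce


def read_csv(text):
    """Yield one dict per CSV row (stand-in for an extraction stage)."""
    yield from csv.DictReader(io.StringIO(text))


def to_objects(rows):
    """Translate flat rows into nested, JSON-like objects."""
    for row in rows:
        yield {"name": row["name"], "employer": {"name": row["company"]}}


def to_triples(objects):
    """Flatten each object into a (subject, predicate, object) triple."""
    for obj in objects:
        yield (obj["name"], "worksFor", obj["employer"]["name"])


def run_pipeline(source, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Hypothetical sample data, inlined so the example is self-contained.
CSV_TEXT = "name,company\nAlice,Acme\nBob,Initech\n"
triples = list(run_pipeline(CSV_TEXT, [read_csv, to_objects, to_triples]))
```

Because every stage is a generator, rows flow through the chain one at a time
and nothing is materialised until the final ``list()`` call.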

## Stages

The following stages / operations should be supported in the graph processing
pipeline:

* ``csv:read``: Generate an iterator from a CSV file.
* ``readtable``: Generate an iterator from a SQL database table.
* ``json:map``: Apply a JSON schema mapping to the data coming from a source.
* ``rdf:load``: Import the data from a JSON stream into a triple store.
* ``rdf:dedupe``: Apply sameAs mappings based on some external mapping file.
* ``rdf:sparql``: Run a SPARQL query.
* ``mql:query``: Run an MQL query.
* ``rdf:dump``: Export RDF data to a file.
* ``json:unmap``: Apply a JSON schema mapping to convert objects to a flat table.
* ``csv:write``: Export data to a CSV file.
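
To make the de-duplication stage more concrete, here is a minimal sketch of
applying sameAs mappings to a stream of triples. It assumes the mapping is
already loaded as a plain dict from alias to canonical identifier and that
triples are simple tuples; ``canonical`` and ``dedupe`` are hypothetical helper
names, not part of GraphKit.

```python
def canonical(node, same_as):
    """Follow sameAs links until a node with no further mapping is reached."""
    seen = set()
    # The `seen` set guards against cycles in the mapping.
    while node in same_as and node not in seen:
        seen.add(node)
        node = same_as[node]
    return node


def dedupe(triples, same_as):
    """Rewrite subjects and objects to canonical IDs and drop repeated triples."""
    emitted = set()
    for s, p, o in triples:
        triple = (canonical(s, same_as), p, canonical(o, same_as))
        if triple not in emitted:
            emitted.add(triple)
            yield triple


# Hypothetical sample data: two aliases that resolve to "ex:acme".
SAME_AS = {"ex:acme-inc": "ex:acme", "ex:acme-corp": "ex:acme-inc"}
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme-inc"),
    ("ex:alice", "ex:worksFor", "ex:acme-corp"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
]
deduped = list(dedupe(TRIPLES, SAME_AS))
```

After canonicalisation the two statements about ``ex:alice`` become identical,
so only one of them is emitted.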

To link flat data structures to nested object graphs matching JSON schema
definitions, ``jsonmapping`` is used.
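
The idea of linking flat rows to nested objects can be sketched without any
library. The example below is an assumption-laden illustration and does not use
``jsonmapping``'s API: the mapping format (column name to dotted path) and the
helpers ``map_row`` and ``unmap_object`` are invented here to show the two
directions that ``json:map`` and ``json:unmap`` describe.

```python
# Hypothetical mapping: flat column name -> dotted path in the nested object.
MAPPING = {
    "person_name": "name",
    "company_name": "employer.name",
    "company_city": "employer.address.city",
}


def map_row(row, mapping):
    """Build a nested object from a flat row."""
    obj = {}
    for column, path in mapping.items():
        *parents, leaf = path.split(".")
        target = obj
        for key in parents:
            # Create intermediate objects on demand.
            target = target.setdefault(key, {})
        target[leaf] = row[column]
    return obj


def unmap_object(obj, mapping):
    """Flatten a nested object back into a row."""
    row = {}
    for column, path in mapping.items():
        value = obj
        for key in path.split("."):
            value = value[key]
        row[column] = value
    return row


row = {"person_name": "Alice", "company_name": "Acme", "company_city": "Berlin"}
nested = map_row(row, MAPPING)
flat_again = unmap_object(nested, MAPPING)
```

A real schema-driven mapper would also handle types, arrays and missing values,
which this sketch deliberately leaves out.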

## Tests

The test suite usually runs in its own ``virtualenv`` and performs a coverage
check in addition to the tests. To run it on a system with ``virtualenv`` and
``make`` installed, type:

```bash
$ make test
```