https://github.com/ontodev/ldtab

Linked Data Tables: General documentation, specifications, and tests

# ldtab: Linked Data Tables

`ldtab` reads an RDF graph and generates a `statements` table like this:

assertion | retraction | graph | subject | predicate | object | datatype | annotation
----------|------------|-------|-------------|-----------------|----------|----------|------------
1 | 0 | graph | pizza:Pizza | skos:prefLabel | Pizza | @en |
1 | 0 | graph | pizza:Pizza | rdfs:seeAlso | | _IRI |
1 | 0 | graph | pizza:Pizza | rdfs:label | Pizza | @en |
1 | 0 | graph | pizza:Pizza | rdfs:subClassOf | {"owl:onProperty":[{"datatype":"_IRI","object":"pizza:hasBase"}],"owl:someValuesFrom":[{"datatype":"_IRI","object":"pizza:PizzaBase"}],"rdf:type":[{"datatype":"_IRI","object":"owl:Restriction"}]} | _JSON |
1 | 0 | graph | pizza:Pizza | rdfs:subClassOf | pizza:Food | _IRI |
1 | 0 | graph | pizza:Pizza | rdf:type | owl:Class | _IRI |

The design of `ldtab` is still in development.
A prototype implementation is available in Clojure: [ldtab.clj](https://github.com/ontodev/ldtab.clj).
This implementation uses Jena to parse input RDF graphs and supports SQLite and PostgreSQL databases.
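
To make the table layout concrete, here is a minimal sketch that creates a `statements` table in an in-memory SQLite database and loads three of the example rows. The column names come from the table above; the column types, the `NOT NULL` constraints, and the helper name `build_statements` are assumptions for illustration, not the schema used by `ldtab.clj`.

```python
import sqlite3

# Column names follow the example table; the types and constraints are assumptions.
SCHEMA = """
CREATE TABLE statements (
  assertion  INTEGER NOT NULL,
  retraction INTEGER NOT NULL DEFAULT 0,
  graph      TEXT NOT NULL,
  subject    TEXT NOT NULL,
  predicate  TEXT NOT NULL,
  object     TEXT NOT NULL,
  datatype   TEXT NOT NULL,
  annotation TEXT
)
"""

# Three rows copied from the example table (annotation left empty).
ROWS = [
    (1, 0, "graph", "pizza:Pizza", "rdfs:label", "Pizza", "@en", None),
    (1, 0, "graph", "pizza:Pizza", "rdfs:subClassOf", "pizza:Food", "_IRI", None),
    (1, 0, "graph", "pizza:Pizza", "rdf:type", "owl:Class", "_IRI", None),
]


def build_statements(conn):
    """Create the table and insert the sample rows (hypothetical helper)."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_statements(conn)
count = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
print(count)
```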

## Motivation

The motivation for `ldtab` is threefold:

1. facilitate work with *large RDF graphs*,
2. *simplify* certain SPARQL queries for complex RDF structures involving blank nodes,
3. enable text-based *diffs* between different versions of an RDF graph.

The following provides more details and examples for each of these goals.

### 1. Querying large RDF Graphs

RDF data consists of subject-predicate-object triples that form a graph.
With SPARQL we can perform queries over that graph.
However, loading a large RDF graph into a triplestore for SPARQL can be slow and require a lot of memory (similar issues exist with tools for OWL ontologies).

Yet, in many cases the queries we want to run are actually quite simple.
We often just want all the triples associated with a set of terms,
or all the subjects that match a given predicate and object.
In these cases, SQLite is both efficient and effective.
Consider the following examples:


Task | SQL | SPARQL
-----|-----|-------
Get subjects with labels | `SELECT subject, object AS label FROM statements WHERE predicate = 'rdfs:label';` | `SELECT ?subject ?label WHERE { ?subject rdfs:label ?label . }`
Get OWL classes with labels | `SELECT s1.subject, s2.object AS label FROM statements s1 JOIN statements s2 ON s2.subject = s1.subject WHERE s1.predicate = 'rdf:type' AND s1.object = 'owl:Class' AND s2.predicate = 'rdfs:label';` | `SELECT ?subject ?label WHERE { ?subject rdf:type owl:Class ; rdfs:label ?label . }`
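
The two SQL queries can be tried from Python with the standard `sqlite3` module. The sketch below assumes a reduced `statements` table with only four columns. The `pizza:hasBase` rows and the function names `labels` and `class_labels` are made up for illustration. An `ORDER BY` clause is added so the result order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema for illustration; the real table has more columns.
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, datatype TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("pizza:Pizza", "rdf:type", "owl:Class", "_IRI"),
        ("pizza:Pizza", "rdfs:label", "Pizza", "@en"),
        # Hypothetical rows, added so the two queries return different results.
        ("pizza:hasBase", "rdf:type", "owl:ObjectProperty", "_IRI"),
        ("pizza:hasBase", "rdfs:label", "has base", "@en"),
    ],
)


def labels(conn):
    """Return (subject, label) pairs for every subject with an rdfs:label."""
    return conn.execute(
        "SELECT subject, object AS label FROM statements "
        "WHERE predicate = ? ORDER BY subject",
        ("rdfs:label",),
    ).fetchall()


def class_labels(conn):
    """Return (subject, label) pairs for subjects typed as owl:Class."""
    return conn.execute(
        "SELECT s1.subject, s2.object AS label "
        "FROM statements s1 "
        "JOIN statements s2 ON s2.subject = s1.subject "
        "WHERE s1.predicate = 'rdf:type' "
        "AND s1.object = 'owl:Class' "
        "AND s2.predicate = 'rdfs:label' "
        "ORDER BY s1.subject"
    ).fetchall()


all_labels = labels(conn)
owl_class_labels = class_labels(conn)
```

String literals are written with single quotes (or passed as parameters) so the same statements also run on PostgreSQL, which treats double-quoted tokens as identifiers.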


### 2. Simplify Complex Queries

Querying RDF data for an entity can be tedious and error-prone
if the entity's representation involves complex structures, such as compound OWL class expressions or OWL annotation axioms.
In `ldtab`, such queries can be constructed in a straightforward manner:


Task | SQL | SPARQL
-----|-----|-------
Get all relevant RDF triples for a subject (including nested anonymous structures such as OWL class expressions) | `SELECT * FROM statements WHERE subject = 'pizza:Pizza';` | Annoying...
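
In the example table at the top, a nested class expression is stored as a single JSON value with datatype `_JSON`. A client can therefore fetch everything about a subject with one query and decode the nested parts afterwards. The following is a minimal sketch under that assumption. The reduced schema and the function name `describe` are hypothetical.

```python
import json
import sqlite3

# The nested restriction from the example table, stored as one JSON string.
RESTRICTION = json.dumps({
    "owl:onProperty": [{"datatype": "_IRI", "object": "pizza:hasBase"}],
    "owl:someValuesFrom": [{"datatype": "_IRI", "object": "pizza:PizzaBase"}],
    "rdf:type": [{"datatype": "_IRI", "object": "owl:Restriction"}],
})

conn = sqlite3.connect(":memory:")
# Reduced schema for illustration; the real table has more columns.
conn.execute(
    "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, datatype TEXT)"
)
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("pizza:Pizza", "rdf:type", "owl:Class", "_IRI"),
        ("pizza:Pizza", "rdfs:subClassOf", "pizza:Food", "_IRI"),
        ("pizza:Pizza", "rdfs:subClassOf", RESTRICTION, "_JSON"),
    ],
)


def describe(conn, subject):
    """Return (predicate, value) pairs; _JSON objects are decoded into dicts."""
    rows = conn.execute(
        "SELECT predicate, object, datatype FROM statements WHERE subject = ?",
        (subject,),
    ).fetchall()
    return [
        (predicate, json.loads(obj) if datatype == "_JSON" else obj)
        for predicate, obj, datatype in rows
    ]


description = describe(conn, "pizza:Pizza")
# Keep only the decoded nested structures, e.g. OWL restrictions.
nested = [value for _, value in description if isinstance(value, dict)]
on_property = nested[0]["owl:onProperty"][0]["object"]
```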

### 3. Text-based Diffs between RDF Graphs

An RDF graph can be serialized in many equivalent ways.
Even for a given concrete syntax, the serialization of an RDF graph is not uniquely determined.
In practice, existing tools rarely guarantee that they will produce exactly the same serialization of a given RDF graph, even when a single concrete syntax is used.
This makes tracking changes in RDF graphs (or OWL ontologies) using popular version control systems, e.g., git, challenging.

`ldtab` provides support to serialize an RDF graph in a uniquely determined manner, enabling text-based `diff`s in version control systems.
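
The idea can be illustrated with a small sketch: if every statement is written as one line and the lines are sorted, two row sets with the same content produce identical text, and a line-based diff shows only real changes. This is not `ldtab`'s actual export format. The tab-separated layout, the function name `serialize`, and the `rdfs:comment` row are assumptions for illustration, and the sketch covers row ordering only.

```python
import difflib


def serialize(rows):
    """Render rows as sorted, tab-separated lines so row order does not matter."""
    return sorted("\t".join(row) for row in rows)


v1 = [
    ("pizza:Pizza", "rdfs:label", "Pizza", "@en"),
    ("pizza:Pizza", "rdf:type", "owl:Class", "_IRI"),
    ("pizza:Pizza", "rdfs:subClassOf", "pizza:Food", "_IRI"),
]

# Same statements in a different order, plus one hypothetical new row.
v2 = list(reversed(v1)) + [
    ("pizza:Pizza", "rdfs:comment", "A baked dish", "@en"),
]

diff = list(
    difflib.unified_diff(
        serialize(v1), serialize(v2), fromfile="v1", tofile="v2", lineterm=""
    )
)
# Changed lines, excluding the "---" / "+++" file headers.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]

print("\n".join(diff))
```

Reordering the input rows leaves the serialized text unchanged, so the diff reports only the added statement.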