Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/semantifyit/RocketRML
https://github.com/semantifyit/RocketRML
rdf rml rml-mapper
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/semantifyit/RocketRML
- Owner: semantifyit
- License: cc-by-sa-4.0
- Created: 2018-11-20T15:24:47.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-12T10:46:38.000Z (about 2 years ago)
- Last Synced: 2024-10-31T17:35:32.357Z (3 months ago)
- Topics: rdf, rml, rml-mapper
- Language: JavaScript
- Homepage: https://semantify.it
- Size: 925 KB
- Stars: 25
- Watchers: 7
- Forks: 12
- Open Issues: 16
-
Metadata Files:
- Readme: Readme.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-kgc-tools - RocketRML - RML processor to generate RDF knowledge graphs from heterogeneous data sources, implemented in JavaScript. (KGC Materializers)
README
![Screenshot](https://raw.githubusercontent.com/semantifyit/RocketRML/master/rocket-rml.jpg)
# RocketRML
###### For the legacy version with the different behavior of the iterator please see [this](https://github.com/semantifyit/RML-mapper/tree/legacy) version.
This is a javascript RML-mapper implementation for the RDF mapping language ([RML](http://rml.io/spec.html)).## Try it out
If you want to try the mapper without installing you can also see a working demo with a graphical interface [here](https://semantifyit.github.io/rml/)
## Install
To install it via npm:npm install rocketrml
## Quick-start
After installation, you can to copy [index.js](https://github.com/semantifyit/RocketRML/blob/master/docs/node/index.js) into your current working directory.
Also, the [mapfile.ttl](https://github.com/semantifyit/RocketRML/blob/master/docs/node/mapping.ttl) and the [input](https://github.com/semantifyit/RocketRML/blob/master/docs/node/input.json) is needed.Then you can execute your first example via:
node index.js
which starts the execution and the output is then written to ./out.n3.
Also, an example Dockerfile can be seen [here](https://github.com/semantifyit/RocketRML/blob/master/docs/docker).## What does it support
### RML Vocabulary
the following list contains the current supported classes.rr:TriplesMap is the class of triples maps as defined by R2RML.
rml:LogicalSource is the class of logical sources.
rr:SubjectMap is the class of subject maps.
rr:PredicateMap is the class of predicate maps.
rr:ObjectMap is the class of object maps.
rr:PredicateObjectMap is the class of predicate-object maps.
rr:RefObjectMap is the class of referencing object maps.
rml:referenceFormulation is the class of supported reference formulations.
rr:Join is the class of join conditions.
Missing:rr:R2RMLView is the class of R2RML views.
rr:BaseTableOrView is the class of SQL base tables or views.
rr:GraphMap is the class of graph maps.### Querying languages
The mapper supports XML, JSON and CSV as input format. For querying the data, [JSONPath](https://www.npmjs.com/package/jsonpath-plus) (json), [XPath](https://www.npmjs.com/package/xpath) (xml) and [csv-parse](https://www.npmjs.com/package/csv-parse) (csv) are used. Since JSON is supported natively by javascript, it has a huge speed benefit compared to XML.
Therefore, the mapper also contains a [C++ version](https://github.com/ThibaultGerrier/XpathIterator) (which uses [pugixml](https://pugixml.org/)) of the XML-parser which is disabled by default, but can be enabled via the options parameter `xpathLib: 'pugixml'`.
XPath 3.1 is available through [fontoxpath](https://www.npmjs.com/package/fontoxpath) and must be enabled through the option: `xpathLib: 'fontoxpath'`
## How does it work
The parseFile function in index.js is the entry point.
It takes an input path (the mapping.ttl file) and an output path (where the json output is written).
The function returns a promise, which resolves in the resulting output, but the output is also written to the file system.### The options parameter
```javascript
{
/*
compact jsonld document with provided context
{ http://schema.org/name:"Tom" }
->
{
@context:"http://schema.org/",
name:"Tom"
}
*/
compress: {
'@vocab': "http://schema.org/"
},
/*
If you want n-quads instead of json as output,
you need to define toRDF to true in the options parameter
*/
toRDF: true,
/*
If you want to insert your all objects with their regarding @id's (to get a nesting in jsonld), "Un-flatten" jsonld
*/
replace: true,
/*
You can delete namespaces to make the xpath simpler.
*/
removeNameSpace: {xmlns:"https://xmlnamespace.xml"},
/*
Choose xpath evaluator library, available options: default | xpath (same as default) | pugixml (cpp xpath implementation, previously xmlPerformanceMode:true) | fontoxpath (xpath 3.1 engine)
*/
xpathLib: "default",
/*
ignore input values that are empty string (or whitespace only) (only use a value from the input if value.trim() !== '') (default false)
*/
ignoreEmptyStrings: true,
/*
values that are to be ignored from the input. E.g ignore all input values that are "-"
*/
ignoreValues: ["-"],
/*
You can also use functions to manipulate the data while parsing. (E.g. Change a date to a ISO format, ..)
*/
functions : {**See the Functions section**}
/*
Any options to parse the csv. available: delimiter - default ","
*/
csv: {
// any options from https://csv.js.org/parse/options/
// defaults already set:
columns: true,
skip_empty_lines: true,
}
}
```### How to call the function
```javascript
const parser = require('rocketrml');const doMapping = async () => {
const options = {
toRDF: true,
verbose: true,
xmlPerformanceMode: false,
replace: false,
};
const result = await parser.parseFile('./mapping.ttl', './out.n3', options).catch((err) => { console.log(err); });
console.log(result);
};doMapping();
```
If you do not want to use the file system, you can use
```javascript
const result = await parser.parseFileLive(mapfile, inputFiles, options);
```
Where mapfile is the mapping.ttl as string,
inputFiles is an object where the keys are the file names and the value is the file content as string.
E.g:
```javascript
let inputFiles={
mydata: '{name:test}',
mydata2: 'testdata'
};
```
## Example
Below there is shown a very simple example with no nesting and no array.More can be seen in the tests folder
### Input
```json
{
"name":"Tom A.",
"age":15
}
```### Turtle mapfile
The mapfile must also specify the input source path.
```ttl
@prefix rr: .
@prefix rml: .
@prefix schema: .
@prefix ql: .
@base . #the base for the classes
<#LOGICALSOURCE>
rml:source "./tests/straightMapping/input.json";
rml:referenceFormulation ql:JSONPath;
rml:iterator "$".
<#Mapping>
rml:logicalSource <#LOGICALSOURCE>;
rr:subjectMap [
rr:termType rr:BlankNode;
rr:class schema:Person;
];
rr:predicateObjectMap [
rr:predicate schema:name;
rr:objectMap [ rml:reference "name" ];
];
rr:predicateObjectMap [
rr:predicate schema:age;
rr:objectMap [ rml:reference "age" ];
].```
### Output
```json
[{
"@type": "http://schema.org/Person",
"http://schema.org/name": "Tom A.",
"http://schema.org/age": 15
}]
```## Functions:
To fit our needs, we also had to implement the functionality to programmatically evaluate data during the predicateObjectMap.
Therefore, we also allow the user to write use javascript functions, they defined beforehand and passes through the options parameter.
An example how this works can be seen below:
### Input```json
{
"name":"Tom A.",
"age":15
}
```### Turtle mapfile
The mapfile must also specify the input source path.
```ttl
@prefix rr: .
@prefix xsd: .
@prefix rml: .
@prefix schema: .
@prefix ql: .
@prefix fnml: .
@prefix fno: .
@prefix grel: .
@base . #the base for the classes
<#LOGICALSOURCE>
rml:source "./tests/straightMapping/input.json";
rml:referenceFormulation ql:JSONPath;
rml:iterator "$".
<#Mapping>
rml:logicalSource <#LOGICALSOURCE>;
rr:subjectMap [
rr:termType rr:BlankNode;
rr:class schema:Person;
];
rr:predicateObjectMap [
rr:predicate schema:name;
rr:objectMap [ rml:reference "name" ];
];
rr:predicateObjectMap [
rr:predicate schema:age;
rr:objectMap [ rml:reference "age" ];
];
rr:predicateObjectMap [
rr:predicate schema:description;
rr:objectMap <#FunctionMap>;
].
<#FunctionMap>
fnml:functionValue [
rml:logicalSource <#LOGICALSOURCE> ;
rr:predicateObjectMap [
rr:predicate fno:executes ;
rr:objectMap [ rr:constant grel:createDescription ]
] ;
rr:predicateObjectMap [
rr:predicate grel:inputString ;
rr:objectMap [ rml:reference "name" ]
];
rr:predicateObjectMap [
rr:predicate grel:inputString ;
rr:objectMap [ rml:reference "age" ]
];
] .```
where the option parameter looks like this:
```javascript
let options={
functions: {
'http://users.ugent.be/~bjdmeest/function/grel.ttl#createDescription': function (data) {
let result=data[0]+' is '+data[1]+ ' years old.';
return result;
}
}
};
```### Output
```json[{
"@type": "http://schema.org/Person",
"http://schema.org/name": "Tom A.",
"http://schema.org/age": 15,
"http://schema.org/description": "Tom A. is 15 years old."
}]
```### Description
The <#FunctionMap> has an array of predicateObjectMaps. One of them defines the function with fno:executes and the name of the function in rr:constant.
The others are for the function parameters. The first parameter (rml:reference:"name") is then stored in data[0], the second in data[1] and so on.