https://github.com/smartdataanalytics/sparqlify
Sparql -> SQL Rewriter enabling virtual RDB -> RDF mappings
- Host: GitHub
- URL: https://github.com/smartdataanalytics/sparqlify
- Owner: SmartDataAnalytics
- Created: 2012-03-04T20:02:46.000Z (about 13 years ago)
- Default Branch: develop
- Last Pushed: 2024-05-21T15:41:40.000Z (12 months ago)
- Last Synced: 2024-05-29T00:03:16.096Z (11 months ago)
- Language: Java
- Homepage: http://aksw.org/Projects/Sparqlify
- Size: 38.5 MB
- Stars: 120
- Watchers: 13
- Forks: 13
- Open Issues: 52
Metadata Files:
- Readme: README.md
# Sparqlify SPARQL->SQL rewriter
[Build status](http://ci.aksw.org/jenkins/job/Sparqlify/)

## Introduction
Sparqlify is a scalable SPARQL-SQL rewriter whose development began in April 2011 in the course of the [LinkedGeoData](http://linkedgeodata.org) project.
This system's features/traits are:
* Support of the ['Sparqlification Mapping Language' (SML)](http://sparqlify.org/wiki/SML), an intuitive language for expressing RDB-RDF mappings with only very little syntactic noise.
* Scalability: Sparqlify does not evaluate expressions in memory. All SPARQL filters end up in the corresponding SQL statement, giving the underlying RDBMS maximum control over query planning.
* A powerful rewriting engine that analyzes filter expressions in order to eliminate self joins and joins with unsatisfiable conditions.
* Initial support for spatial datatypes and predicates.
* A subset of the SPARQL 1.0 query language plus sub queries are supported.
* Tested with PostgreSQL/Postgis and H2. Support for further databases is planned.
* CSV support
* R2RML will be supported soon.

## Functions
SPARQL-to-SQL function mappings are specified in the file [functions.xml](sparqlify-core/src/main/resources/functions.xml).

### Standard SPARQL functions
| SPARQL function | SQL Definition |
|-----------------| ----------------|
| boolean strstarts(string, string) | strpos($1$, $2$) = 1|
| TODO | |

### Spatial Function Extensions
| SPARQL function | SQL Definition |
|-----------------| ----------------|
| TODO | |

## Supported SPARQL language features
* Join, LeftJoin (i.e. Optional), Union, Sub queries
* Filter predicates: comparison (<=, <, =, >, >=); logical (!, &&, ||); arithmetic (+, -); spatial: st\_intersects, geomFromText; other: regex, lang, langMatches
* Aggregate functions: Count(\*)
* Order By is pushed into the SQL
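For example, both a `strstarts` filter from the function table above and an ORDER BY clause end up in the generated SQL rather than being evaluated in memory. A hypothetical invocation (connection settings and the mapping file name are placeholders; the `sparqlify` command and its options are described in the tool suite section below):

```bash
sparqlify -h localhost -u postgres -p secret -d mydb -m mappings.sml \
  -Q 'SELECT ?s ?o WHERE { ?s ?p ?o . FILTER(strstarts(?o, "http://example.org/")) } ORDER BY ?o'
```

Here the `strstarts` filter maps to the `strpos($1$, $2$) = 1` form listed above, and the ORDER BY is pushed into the SQL query.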
## Debian packages

Sparqlify Debian packages can be obtained by the following means:
* Via the [Linked Data Stack](http://stack.linkeddata.org) (recommended)
* Download from the [Sparqlify website's download section](http://sparqlify.org/downloads/releases).
* Directly from source using Maven (see further down in this README)

### Public repositories
After setting up any of the repositories below, you can install sparqlify with apt using
* apt: `sudo apt-get install sparqlify-cli`
#### Linked Data Stack (this is what you want)
Sparqlify is distributed at the [Linked Data Stack](http://stack.linkeddata.org), which offers many great tools from various contributors of the Semantic Web community.
* The repository is available in the flavors `nightly`, `testing` and `stable` [here](http://stack.linkeddata.org/download/repo.php).
```bash
# !!! Replace stable with nightly or testing as needed !!!

# Download the repository package
wget http://stack.linkeddata.org/ldstable-repository.deb

# Install the repository package
sudo dpkg -i ldstable-repository.deb

# Update the repository database
sudo apt-get update
```

#### Bleeding Edge (Not recommended for production)
For the latest development version (built on every commit), perform the following steps.

Import the public key:

    wget -qO - http://cstadler.aksw.org/repos/apt/conf/packages.precise.gpg.key | sudo apt-key add -

Add the repository:

    echo 'deb http://cstadler.aksw.org/repos/apt precise main contrib non-free' | sudo tee -a /etc/apt/sources.list.d/cstadler.aksw.org.list
Note that this also works with distributions other than "precise" (Ubuntu 12.04), such as Ubuntu 14.04 or 16.04.
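Then refresh the package index and install the CLI package as shown above:

```bash
sudo apt-get update
sudo apt-get install sparqlify-cli
```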
## Building
Building the repository creates the JAR files providing the `sparqlify-*` tool suite.

One of the plugins requires the `xjc` command (for compiling an XML schema to Java classes), which is no longer part of the JDK. The following package provides it:
```bash
sudo apt install jaxb
```
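You can check that `xjc` is then available on the PATH with, for example:

```bash
xjc -version
```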
### Debian package

Building Debian packages from this repository relies on the [Debian Maven Plugin](http://debian-maven.sourceforge.net), which requires a Debian-compatible environment.
If such an environment is present, the rest is simple:

    # Install all shell scripts necessary for creating deb packages
    sudo apt-get install devscripts

    # Execute the following from the `/sparqlify-core` folder:
    mvn clean install deb:package

    # Upon successful completion, the debian package is located under `/sparqlify-core/target`

    # Install using `dpkg`
    sudo dpkg -i sparqlify_.deb

    # Uninstall using dpkg or apt:
    sudo dpkg -r sparqlify
    sudo apt-get remove sparqlify

### Assembly based
Another way to build the project is to run the following commands at ``:

    mvn clean install
    cd sparqlify-cli
    mvn assembly:assembly

This will generate a single stand-alone JAR containing all necessary dependencies.
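The resulting JAR ends up under `sparqlify-cli/target`; with the default assembly naming it can be located like this:

```bash
# The *-jar-with-dependencies.jar suffix is the maven-assembly-plugin default;
# the exact file name depends on the project version.
ls sparqlify-cli/target/*-jar-with-dependencies.jar
```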
Afterwards, the shell scripts under `sparqlify-core/bin` should work.

## Tool suite
If Sparqlify was installed from the debian package, the following commands are available system-wide:
* `sparqlify`: This is the main executable for running individual SPARQL queries, creating dumps and starting a stand-alone server.
* `sparqlify-csv`: This tool can create RDF dumps from CSV files based on SML view definitions.
* `sparqlify-platform`: A stand-alone server component integrating additional projects.

These tools write their output (such as RDF data in the N-TRIPLES format) to STDOUT. Log output goes to STDERR.
### sparqlify
Usage: `sparqlify [options]`

Options are:

* Setup
  * `-m` SML view definition file
* Database Connectivity Settings
  * `-h` Hostname of the database (e.g. localhost or localhost:5432)
  * `-d` Database name
  * `-u` User name
  * `-p` Password
  * `-j` JDBC URI (mutually exclusive with both `-h` and `-d`)
* Quality of Service
  * `-n` Maximum result set size
  * `-t` Maximum query execution time in seconds (excluding rewriting time)
* Stand-alone Server Configuration
  * `-P` Server port [default: 7531]
* Run-Once (these options prevent the server from being started and are mutually exclusive with the server configuration)
  * `-D` Create an N-TRIPLES RDF dump on STDOUT
  * `-Q` [SPARQL query] Runs a SPARQL query against the configured database and view definitions

#### Example
The following command will start the Sparqlify HTTP server on the default port:

    sparqlify -h localhost -u postgres -p secret -d mydb -m mydb-mappings.sml -n 1000 -t 30

Agents can now access the SPARQL endpoint at `http://localhost:7531/sparql`
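The run-once options work with the same connection settings, and a running server can be queried over HTTP like any SPARQL endpoint, for example with `curl`:

```bash
# Dump the mapped data as N-TRIPLES (log output goes to STDERR)
sparqlify -h localhost -u postgres -p secret -d mydb -m mydb-mappings.sml -D > dump.nt 2> sparqlify.log

# Run a single query without starting the server
sparqlify -h localhost -u postgres -p secret -d mydb -m mydb-mappings.sml \
  -Q 'SELECT * WHERE { ?s ?p ?o }'

# Query the running endpoint (standard SPARQL protocol GET request)
curl -G 'http://localhost:7531/sparql' --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o }'
```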
### sparqlify-csv
Usage: `sparqlify-csv [options]`

* Setup
  * `-m` SML view definition file
  * `-f` Input data file
  * `-v` View name (can be omitted if the view definition file only contains a single view)
* CSV Parser Settings
  * `-d` CSV field delimiter (default is '"')
  * `-e` CSV field escape delimiter (escapes the field delimiter) (default is '\')
  * `-s` CSV field separator (default is ',')
  * `-h` Use the first row as headers. This option allows columns to be referenced by name in addition to their index.

### sparqlify-platform (Deprecated; about to be superseded by sparqlify-web-admin)
The Sparqlify Platform (under /sparqlify-platform) bundles Sparqlify with the Linked Data wrapper [Pubby](https://github.com/cygri/pubby) and the SPARQL Web interface [Snorql](https://github.com/kurtjx/SNORQL).

Usage: `sparqlify-platform config-dir [port]`
* `config-dir` Path to the configuration directory, e.g. ``
* `port` Port on which to run the platform, default 7531.

For building, at the root of the project (outside of the sparqlify-\* directories), run `mvn compile` to build all modules.
Afterwards, launch the platform using:

    cd sparqlify-platform/bin
    ./sparqlify-platform

Assuming the platform runs under `http://localhost:7531`, you can access the following services relative to this base URL:
* `/sparql` is Sparqlify's SPARQL endpoint
* `/snorql` shows the SNORQL web frontend
* `/pubby` is the entry point to the Linked Data interface

#### Configuration
The configDirectory argument is mandatory and must contain a *sub-directory* for the context-path (i.e. `sparqlify-platform`), which in turn contains the files:
* `platform.properties` This file contains configuration parameters that can be adjusted, such as the database connection.
* `views.sparqlify` The set of Sparqlify view definitions to use.

I recommend first creating a copy of the files in `/sparqlify-platform/config/example` in a different location, then adjusting the parameters, and finally launching the platform with `-DconfigDirectory=...` set appropriately.
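For example (the target directory name is arbitrary):

```bash
# Copy the shipped example configuration and adjust it to your environment
cp -r sparqlify-platform/config/example ~/sparqlify-platform-config

# Edit the platform.properties and views.sparqlify files in the copy,
# then launch the platform on the default port
sparqlify-platform ~/sparqlify-platform-config 7531
```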
The platform *applies autoconfiguration to Pubby and Snorql*:
* Snorql: Namespaces are those of the views.sparqlify file.
* Pubby: The host name of all resources generated in the Sparqlify views is replaced with the URL of the platform (currently this still needs to be configured via `platform.properties`).

Additionally, you probably want to make the URIs nicer, e.g. by configuring an Apache reverse proxy.
Enable the Apache `proxy_http` module:

    sudo a2enmod proxy_http
Then, in your `/etc/apache2/sites-available/default`, add lines such as:

    ProxyRequests Off
    ProxyPass /resource http://localhost:7531/pubby/bizer/bsbm/v01/ retry=1
    ProxyPassReverse /resource http://localhost:7531/pubby/bizer/bsbm/v01/

These entries will enable requests to `http://localhost/resource/...` rather than `http://localhost:7531/pubby/bizer/bsbm/v01/...`.
The `retry=1` means that Apache waits only 1 second before retrying when it encounters an error (e.g. HTTP code 500) from the proxied resource.
*IMPORTANT: ProxyRequests are off by default; DO NOT ENABLE THEM UNLESS YOU KNOW WHAT YOU ARE DOING. Simply enabling them potentially allows anyone to use your computer as a proxy.*
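To apply the configuration and check that the proxy works (the resource name is just an example):

```bash
# Reload Apache so the new ProxyPass rules take effect
sudo service apache2 reload

# The short URI should now be answered via the Pubby Linked Data interface
curl -I http://localhost/resource/SomeResource
```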
## SML Mapping Syntax
A Sparqlification Mapping Language (SML) configuration is essentially a set of CREATE VIEW statements, somewhat similar to the CREATE VIEW statement from SQL.
Probably the easiest way to learn the syntax is to look at the following resources:

* The [SML documentation](http://sparqlify.org/wiki/SML)
* The [SML test suite](https://github.com/AKSW/Sparqlify/tree/master/sparqlify-core/src/test/resources/org/aksw/sml/r2rml_tests), which is derived from the [R2RML test suite](https://github.com/AKSW/Sparqlify/tree/master/sparqlify-core/src/test/resources/org/w3c/r2rml_tests).
Additionally, for convenience, prefixes can be declared, which are valid throughout the config file.
As comments, you can use //, /\* \*/, and #.

For a first impression, here is a quick example:

    /* This is a comment
     * /* You can even nest them! */
     */

    // Prefixes are valid throughout the file
    Prefix dbp:
    Prefix ex:

    Create View myFirstView As
      Construct {
        ?s a dbp:Person .
        ?s ex:workPage ?w .
      }
      With
        ?s = uri('http://mydomain.org/person', ?id) // Define ?s to be a URI generated from the concatenation of a prefix with mytable's id-column.
        ?w = uri(?work_page)                        // ?w is assigned the URIs in the column 'work_page' of 'mytable'
      Constrain
        ?w prefix "http://my-organization.org/user/" // Constraints can be used for optimization, e.g. to prune unsatisfiable join conditions
      From
        mytable; // If you want to use an SQL query, the query (without trailing semicolon) must be enclosed in double square brackets: [[SELECT id, work_page FROM mytable]]

### Notes for sparqlify-csv
For `sparqlify-csv`, the view definition syntax is almost the same as above; the differences are:

* Instead of `Create View viewname As Construct`, start your views with `CREATE VIEW TEMPLATE viewname As Construct`
* There is no FROM and CONSTRAINT clause

Columns can be referenced either by name (see the `-h` option) or by index (1-based).
#### Example

    // Assume a CSV file with the following columns (osm stands for OpenStreetMap)
    // (city_name, country_name, osm_entity_type, osm_id, longitude, latitude)

    Prefix fn:   // Needed for urlEncode and urlDecode.
    Prefix rdfs:
    Prefix owl:
    Prefix xsd:
    Prefix geo:

    Create View Template geocode As
      Construct {
        ?cityUri
          owl:sameAs ?lgdUri .

        ?lgdUri
          rdfs:label ?cityLabel ;
          geo:long ?long ;
          geo:lat ?lat .
      }
      With
        ?cityUri   = uri(concat("http://fp7-pp.publicdata.eu/resource/city/", fn:urlEncode(?2), "-", fn:urlEncode(?1)))
        ?cityLabel = plainLiteral(?1)
        ?lgdUri    = uri(concat("http://linkedgeodata.org/triplify/", ?4, ?5))
        ?long      = typedLiteral(?6, xsd:float)
        ?lat       = typedLiteral(?7, xsd:float)
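Assuming the view definition above is saved as `geocode.sml` and the data as `cities.csv` (the file names are just examples), the dump can be created with:

```bash
# Columns are referenced by index in the view, so the header option -h is not needed
sparqlify-csv -f cities.csv -m geocode.sml -v geocode > geocode.nt
```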