Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/SANSA-Stack/SANSA-Stack

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
https://github.com/SANSA-Stack/SANSA-Stack

apache-jena apache-spark distributed-computing flink rdf semantic-web spark

Last synced: 2 months ago
JSON representation

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/

Awesome Lists containing this project

README

        

# SANSA-Stack

[![Build Status](https://github.com/SANSA-Stack/SANSA-Stack/workflows/CI/badge.svg)](https://github.com/SANSA-Stack/SANSA-Stack/actions?query=workflow%3ACI)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Twitter](https://img.shields.io/twitter/follow/SANSA_Stack.svg?style=social)](https://twitter.com/SANSA_Stack)

This project comprises the whole Semantic Analytics Stack (SANSA). At a glance, it features the following functionality:

* Ingesting RDF and OWL data in various formats into RDDs
* Operators for working with RDDs and data frames of RDF data at various levels (triples, bindings, graphs, etc)
* *Transformation* of RDDs to data frames and *partitioning* of RDDs into R2RML-mapped data frames
* Distributed SPARQL querying over R2RML-mapped data frame partitions using RDB2RDF engines (Sparqlify & Ontop)
* Enrichment of RDDs with inferences
* Application of machine learning algorithms

For a detailed description of SANSA, please visit http://sansa-stack.net.

## Layers
The SANSA project is structured in the following five layers developed in their respective sub-folders:

* [RDF](sansa-rdf)
* [OWL](sansa-owl)
* [Query](sansa-query)
* [Inference](sansa-inference)
* [ML](sansa-ml)

## Release Cycle
A SANSA stack release is done every six months and consists of the latest stable versions of each layer at this point. This repository is used for organising those joint releases.

## Usage

### Spark

#### Requirements

We currently require a Spark 3.x.x with Scala 2.12 setup. A Spark 2.x version can be built from source based on the [spark2](https://github.com/SANSA-Stack/SANSA-Stack/tree/spark2) branch.

#### Release Version
Some of our dependencies are not in Maven central (yet), so you need to add following Maven repository to your project POM file `repositories` section:
```xml

maven.aksw.internal
AKSW Release Repository
http://maven.aksw.org/archiva/repository/internal

true


false

```

If you want to import the full SANSA Stack, please add the following Maven dependency to your project POM file:
```xml

net.sansa-stack
sansa-stack-spark_2.12
$LATEST_RELEASE_VERSION$

```
If you only want to use particular layers, just replace `$LAYER_NAME$` with the corresponding name of the layer
```xml

net.sansa-stack
sansa-$LAYER_NAME$-spark_2.12
$LATEST_RELEASE_VERSION$

```

#### SNAPSHOT Version
While the release versions are available on Maven Central, latest SNAPSHOT versions have to be installed from source code:
```bash
git clone https://github.com/SANSA-Stack/SANSA-Stack.git
cd SANSA-Stack
```
Then to build and install the full SANSA Spark stack you can do
```bash
./dev/mvn_install_stack_spark.sh
```
or for a single layer `$LAYER_NAME$` you can do
```bash
mvn -am -DskipTests -pl :sansa-$LAYER_NAME$-spark_2.12 clean install
```

Alternatively, you can use the following Maven repository and add it to your project POM file `repositories` section:
```xml

maven.aksw.snapshots
AKSW Snapshot Repository
http://maven.aksw.org/archiva/repository/snapshots

false


true

```
Then do the same as for the release version and add the dependency:
```xml

net.sansa-stack
sansa-stack-spark_2.12
$LATEST_SNAPSHOT_VERSION$

```

## How to Contribute
We always welcome new contributors to the project! Please see [our contribution guide](http://sansa-stack.net/contributing-to-sansa/) for more details on how to get started contributing to SANSA.