### Access standard knowledge indexed from code repositories, connected to the Linked Open Data access points

[![Deploy to GitHub Pages](https://github.com/vemonet/shapes-of-you/workflows/Deploy%20website/badge.svg)](https://github.com/vemonet/shapes-of-you/actions?query=workflow%3A%22Deploy+website%22) [![CodeQL analysis](https://github.com/vemonet/shapes-of-you/workflows/CodeQL%20analysis/badge.svg)](https://github.com/vemonet/shapes-of-you/actions?query=workflow%3A%22CodeQL+analysis%22)

đŸ–Ĩ Access the web app at **[index.semanticscience.org](http://index.semanticscience.org)**

đŸ“Ŧ Query our knowledge graph using the OpenAPI at **[grlc.io/api-git/vemonet/shapes-of-you/subdir/api](http://grlc.io/api-git/vemonet/shapes-of-you/subdir/api)** (powered by [grlc.io](http://grlc.io) and [SPARQL](https://www.w3.org/TR/sparql11-query/))

✨ Directly query the **SPARQL endpoint on YASGUI** at https://graphdb.dumontierlab.com/repositories/shapes-registry.

The SPARQL endpoint is also conveniently accessible in the webapp **Active endpoints** tab, since Shapes of You indexes its own SPARQL query files, and computes metadata for its SPARQL endpoint.

**Shapes of You** is a global index for semantically descriptive files published to public Git repositories ([GitHub](https://github.com), [GitLab](https://gitlab.com), and [Gitee](https://gitee.com/)). It enables semantic web enthusiasts to connect these standard knowledge definitions to active Linked Open Data access points (SPARQL endpoints).

To be found by our indexer, make sure your repository description or topics on [GitHub](https://github.com), [GitLab](https://gitlab.com), or [Gitee](https://gitee.com) include one of the resources mentioned below. We automatically index files from public repositories every week on Saturday at 1:00 GMT+1 🕐

* **SHACL shapes**: we index RDF files (such as `.ttl`, `.rdf`, `.jsonld`, etc.), with all `sh:NodeShape` they contain
* **ShEx expressions**: we index `.shex` files, and ShEx shapes defined in RDF files
* **SPARQL queries**: we index `.rq` and `.sparql` files, and parse [grlc.io](http://grlc.io) APIs metadata
* **OWL ontologies**: we index all RDF files with all `owl:Class` they contain
* **SKOS vocabularies**: we index all RDF files with all `skos:Concept` they contain
* **RML mappings**: we index RDF files, with all `r2rml:SubjectMap` and `rml:LogicalSource` they contain
* **R2RML mappings**: we index RDF files, with all `r2rml:SubjectMap` they contain
* **[CSVW](https://www.w3.org/TR/tabular-data-primer/) metadata**: we index RDF files, with all `csvw:Column` they contain
* **Nanopublication templates**: we index RDF files, with all `nt:AssertionTemplates` and inputs they contain
* **OBO ontologies**: we index all `.obo` files with all terms they contain
* **OpenAPI specifications**: we index `.yml`, `.yaml` and `.json` files, and parse the spec to retrieve API metadata
* **DCAT datasets**: we index RDF files, with all `dcat:Dataset` they contain
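For example, a minimal SHACL shape that the indexer would pick up from a `.ttl` file in a public repository could look like this (the `ex:` namespace and class names are purely illustrative):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <https://example.org/> .

# Any RDF file containing a sh:NodeShape like this one is indexed
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;
    ] .
```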

If your repository or endpoint is missed by our indexer, add it to one of these files:

* Additional GitHub repositories in the file [`EXTRAS_GITHUB_REPOSITORIES.txt`](https://github.com/vemonet/shapes-of-you/blob/main/EXTRAS_GITHUB_REPOSITORIES.txt)

* Additional SPARQL endpoints in the file [`EXTRAS_SPARQL_ENDPOINTS.txt`](https://github.com/vemonet/shapes-of-you/blob/main/EXTRAS_SPARQL_ENDPOINTS.txt)
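As a hedged sketch of how such an extras file might be consumed, assuming one entry per line (e.g. `owner/repo` or an endpoint URL; the indexer's actual parsing may differ, and `read_extra_entries` is a hypothetical helper):

```python
from pathlib import Path

def read_extra_entries(path: str) -> list:
    """Return non-empty, non-comment lines from an extras file."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")]

# Example with a temporary file:
Path("/tmp/extras-demo.txt").write_text("vemonet/shapes-of-you\n\n# a comment\n")
print(read_extra_entries("/tmp/extras-demo.txt"))  # ['vemonet/shapes-of-you']
```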

## Technical overview 🧭

This web service is composed of four main parts, described in more detail below:

* A Python script that retrieves SPARQL queries, SHACL and ShEx shapes files, along with some metadata, from GitHub repositories. The retrieved data is expressed using [RDF](https://www.w3.org/RDF/).
  * A [GitHub Actions workflow](https://github.com/vemonet/shapes-of-you/actions?query=workflow%3A%22Deploy+to+GitHub+Pages%22) runs every week on Saturday night to execute the Python script and publish the RDF output to the triplestore.
* A React web app written in TypeScript, which displays the files and metadata from the SPARQL endpoint, with filters and search.
  * The website is automatically deployed by [GitHub Actions workflows](https://github.com/vemonet/shapes-of-you/actions?query=workflow%3A%22Deploy+to+GitHub+Pages%22) to [GitHub Pages](https://index.semanticscience.org) at each push to the `main` branch.
  * We use [expo](https://expo.io/) to build this [Progressive Web App](https://web.dev/progressive-web-apps/) (aka PWA); it can be installed as a native app on any desktop computer (Chrome recommended) or smartphone.
* A triplestore with a publicly available SPARQL endpoint at https://graphdb.dumontierlab.com/repositories/shapes-registry
* A grlc.io-powered OpenAPI to query the SPARQL endpoint at http://grlc.io/api-git/vemonet/shapes-of-you
  * Most SPARQL queries used by the webapp are also provided as API calls

![Shapes of You architecture](/website/assets/shapes-of-you-architecture.png)

---

## Data model 📋

We defined and published a simple schema for our data as an OWL ontology, mainly reusing schema.org concepts.

Check out the OWL ontology in [`website/assets/shapes-of-you-ontology.ttl` đŸĻ‰](/website/assets/shapes-of-you-ontology.ttl)

Here is an overview of the ontology (generated by [gra.fo](https://gra.fo/)):

![Ontology overview](/website/assets/shapes-of-you-ontology.png)

### Prefixes

Just copy/paste this if you are missing some prefixes to query the Shapes of You knowledge graph:

```SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX schema: <https://schema.org/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX shex: <http://www.w3.org/ns/shex#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
# PREFIX sdm: (namespace IRI missing in the source; see the ontology file)
PREFIX r2rml: <http://www.w3.org/ns/r2rml#>
PREFIX rml: <http://semweb.mmlab.be/ns/rml#>
PREFIX nt: <https://w3id.org/np/o/ntemplate/>
PREFIX csvw: <http://www.w3.org/ns/csvw#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
```

### Classes

* "Shape" files: `schema:SoftwareSourceCode`
  * Properties:
    * `dcterms:hasPart`
    * `rdfs:comment`
    * `schema:codeRepository` > `schema:DataCatalog`
  * Subclasses:
    * `sh:Shape` (SHACL shape)
    * `shex:Schema` (ShEx schema)
    * `sh:SPARQLFunction` (SPARQL query), with additional properties `void:sparqlEndpoint` and `schema:query`
    * `owl:Ontology` (OWL ontology)
    * `skos:ConceptScheme` (SKOS vocabulary)
    * `sio:000623` (OBO ontology)
    * `schema:APIReference` (OpenAPI)
    * `rml:LogicalSource` (RML and YARRRML mappings)
    * `r2rml:TriplesMap` (R2RML mappings)
    * `nt:AssertionTemplate` (Nanopublication templates)
    * `dcat:Dataset` (DCAT datasets)
* Git repositories: `schema:DataCatalog`
  * Properties:
    * `rdfs:comment`
* Active SPARQL endpoints: `schema:EntryPoint`
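Putting the prefixes and classes together, a query along these lines should list indexed SHACL shape files with their repository (an untested sketch against the data model above; the exact predicates in the live graph may differ):

```SPARQL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX schema: <https://schema.org/>

# List indexed SHACL shape files and the Git repository they come from
SELECT ?shapeFile ?label ?repository WHERE {
    ?shapeFile a sh:Shape ;
        rdfs:comment ?label ;
        schema:codeRepository ?repository .
} LIMIT 10
```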

---

## Run the web app 🛩ī¸

Requirements: [npm](https://www.npmjs.com/get-npm) and [yarn](https://classic.yarnpkg.com/en/docs/install/#debian-stable) installed.

### In development 🏗

Clone the repository:

```bash
git clone https://github.com/vemonet/shapes-of-you
cd shapes-of-you
```

Install dependencies :inbox_tray:

```bash
yarn
```

Run the web app on http://localhost:19006; it should reload automatically at each change to the code :arrows_clockwise:

```bash
yarn dev
```

Upgrade the package versions in `yarn.lock` 🔒

```bash
yarn upgrade
```

### In production 🌍

This website is automatically deployed by a [GitHub Actions workflow](https://github.com/vemonet/shapes-of-you/actions?query=workflow%3A%22Deploy+to+GitHub+Pages%22) to GitHub Pages, accessible at http://index.semanticscience.org

You can also build the website locally in the `/web-build` folder and serve it on http://localhost:5000 (check out the `Dockerfile`):

```bash
yarn build
yarn serve
```

## Deploy the backend

Deploy the Oxigraph triplestore and ElasticSearch index using [Docker :whale:](https://docs.docker.com/get-docker/) and Docker Compose.

1. Make sure the folder for ElasticSearch has the right permissions

```bash
mkdir -p /data/shapes-of-you/elasticsearch
sudo chown -R 1000:0 /data/shapes-of-you/elasticsearch
```

2. Deploy the stack

```bash
docker-compose up -d
```

> Check out the [docker-compose.yml](/docker-compose.yml) file to see how we run the Docker images.
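As a rough sketch only (the `docker-compose.yml` in the repository is authoritative; the image tags, ports, and service names below are assumptions), the stack looks like:

```yaml
services:
  triplestore:
    image: oxigraph/oxigraph
    ports:
      - "7878:7878"
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    environment:
      - discovery.type=single-node
    volumes:
      # The folder prepared in step 1, owned by UID 1000
      - /data/shapes-of-you/elasticsearch:/usr/share/elasticsearch/data
```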

---

## ⛏ī¸ Index structured and semantic files

Requirements: Python 3.6+, git

### 🗃ī¸ Index files from code repositories

This script is run every day by the mighty [`.github/workflows/index-shapes.yml`](https://github.com/vemonet/shapes-of-you/blob/main/.github/workflows/index-shapes.yml) workflow

The Python script retrieves shapes files from the APIs of popular Git services (GitHub GraphQL API, GitLab API, Gitee API), and generates RDF data. The RDF data is then automatically published to the publicly available triplestore by the GitHub Actions workflow.

You can find the Python scripts and requirements in the [`etl`](https://github.com/vemonet/shapes-of-you/tree/main/etl) folder.

Use these commands to locally define the `API_GITHUB_TOKEN`, `GITLAB_TOKEN`, and `GITEE_TOKEN` **environment variables required** to run the script (you might need to adapt them on Windows, but you should know better than me):

```bash
export API_GITHUB_TOKEN=MYGITHUBTOKEN000
export GITLAB_TOKEN=MYGITLABTOKEN000
export GITEE_TOKEN=MYGITEETOKEN000
```

> Add those commands to your `.zshrc` or `.bashrc` to make them permanent

For GitHub, you can create a new API key (aka personal access token) at https://github.com/settings/tokens
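The script presumably reads these variables at startup; a minimal sketch of such a check (the variable names come from this README, but `check_tokens` and its failure message are illustrative, not the project's actual code):

```python
import os
import sys

REQUIRED_TOKENS = ["API_GITHUB_TOKEN", "GITLAB_TOKEN", "GITEE_TOKEN"]

def check_tokens() -> dict:
    """Fail fast with a clear message if a required token is not defined."""
    missing = [name for name in REQUIRED_TOKENS if not os.environ.get(name)]
    if missing:
        sys.exit(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_TOKENS}
```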

Go to the `etl` folder:

```bash
cd etl
```

Install the requirements:

```bash
pip install -e .
```

Retrieve shapes files by searching the [GitHub GraphQL API](https://developer.github.com/v4/explorer) (you can also search by topic, e.g. `topic:sparql`):

```bash
python3 main.py github vemonet/shapes-of-you
```
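The GraphQL search request sent by such a script might be built like this (a hand-written illustration of the GitHub GraphQL search API, not the project's actual query):

```python
import json

def build_github_search_payload(search: str, first: int = 10) -> str:
    """Build a JSON payload for a GitHub GraphQL repository search.

    `search` can be a repository name like "vemonet/shapes-of-you"
    or a topic filter like "topic:sparql".
    """
    query = """
    query ($search: String!, $first: Int!) {
      search(query: $search, type: REPOSITORY, first: $first) {
        nodes {
          ... on Repository { nameWithOwner description url }
        }
      }
    }
    """
    return json.dumps({"query": query,
                       "variables": {"search": search, "first": first}})

payload = build_github_search_payload("topic:sparql")
```

The payload would then be POSTed to `https://api.github.com/graphql` with the `API_GITHUB_TOKEN` in the `Authorization` header.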

Retrieve shapes files from the [GitLab API](https://docs.gitlab.com/ee/api/) using the [`python-gitlab` package](https://pypi.org/project/python-gitlab/):

```bash
python3 main.py gitlab sparql
```

Retrieve shapes files from the [Gitee API](https://gitee.com/api/v5/swagger#/getV5SearchRepositories):

```bash
python3 main.py gitee ontology
```

### ✨ Generate SPARQL endpoints metadata

This task is performed every day by the swifty [`.github/workflows/analyze-endpoints.yml`](https://github.com/vemonet/shapes-of-you/blob/main/.github/workflows/analyze-endpoints.yml) workflow

We use the [`d2s`](https://github.com/MaastrichtU-IDS/d2s-cli) tool (aka. data2services) to generate [HCLS metadata](https://www.w3.org/TR/hcls-dataset/) for a SPARQL endpoint:

```bash
pip install d2s
d2s metadata analyze https://graphdb.dumontierlab.com/repositories/shapes-registry -o metadata.ttl
```

We commit the generated metadata file to the `metadata` branch, as an experiment in using git to version and track changes to the metadata generated for the SPARQL endpoints over time.

### Enable Virtuoso Linked Data Platform

**Enable WebDAV LDP** on Virtuoso 7 (from the [official Virtuoso documentation](http://vos.openlinksw.com/owiki/wiki/VOS/VirtLDP))

Start the `virtuoso-opensource-7` docker image

```bash
docker-compose up -d
```

The first time you start Virtuoso, or after you reset the database, you will need to run this script to prepare the Linked Data Platform:

```bash
./prepare_virtuoso.sh
```

To prepare for Shapes of You, **create the folders** `github`, `gitlab`, `gitee`, `apis`, and `endpoints`, using the same owner and permissions as the `ldp` folder.

**Test** by uploading a Turtle file to the LDP (set the `$ENDPOINT_PASSWORD` variable first):

```bash
curl -u ldp:$ENDPOINT_PASSWORD --data-binary @shapes-rdf.ttl -H "Accept: text/turtle" -H "Content-type: text/turtle" -H "Slug: test-shapes-rdf" https://data.index.semanticscience.org/DAV/home/ldp/github
```

**Enable CORS** to query the Virtuoso SPARQL endpoint from JavaScript. See the [Virtuoso CORS documentation](http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksCORsEnableSPARQLURLs).

* Go to **Web Application Server** > **Virtual Domains & Directories**
* Expand **Interface** for the **Default Web Site**
* Locate the `/sparql` Logical Path > click **Edit**
* Enter **`*`** in the **Cross-Origin Resource Sharing** input field.

## 👩‍đŸ’ģ Contribute

Contributions are welcome! See the [guidelines to contribute](/CONTRIBUTING.md).

## 🤝 Acknowledgements

RDF data hosted in an [Oxigraph](https://github.com/oxigraph/oxigraph) triplestore (open source)

OpenAPI powered by [grlc.io](http://grlc.io)

SPARQL query UI powered by [Triply's YASGUI](https://yasgui.triply.cc/)

Ontology built with [gra.fo](https://gra.fo)

Data processing workflows run for free using the [GitHub Actions](https://github.com/features/actions) plan for open source projects

Files parsed using the Python libraries `rdflib`, `obonet`, and `prance`