https://github.com/incatools/semantic-sql

SQL and SQLite builds of OWL ontologies
https://github.com/incatools/semantic-sql

linkml oaklib obofoundry ontologies owl relation-graph sparql sql

Last synced: 6 months ago
JSON representation

SQL and SQLite builds of OWL ontologies

Host: GitHub
URL: https://github.com/incatools/semantic-sql
Owner: INCATools
License: bsd-3-clause
Created: 2021-05-04T21:31:08.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2025-02-25T19:18:52.000Z (7 months ago)
Last Synced: 2025-03-29T08:11:14.059Z (6 months ago)
Topics: linkml, oaklib, obofoundry, ontologies, owl, relation-graph, sparql, sql
Language: Python
Homepage: https://incatools.github.io/semantic-sql/
Size: 7.67 MB
Stars: 43
Watchers: 5
Forks: 5
Open Issues: 13
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md

Awesome Lists containing this project

README

          # SemSQL: standard SQL views for RDF/OWL ontologies

[![PyPI version](https://badge.fury.io/py/semsql.svg)](https://badge.fury.io/py/semsql)

![](https://github.com/incatools/semantic-sql/workflows/Build/badge.svg)

This project provides a standard collection of SQL tables/views for ontologies, such that you can make queries like this,

to find all terms starting with `Abnormality` in [HPO](https://obofoundry.org/ontology/hp).

```sql

$ sqlite db/hp.db

sqlite> SELECT * FROM rdfs_label_statement WHERE value LIKE 'Abnormality of %';

```

|stanza|subject|predicate|object|value|datatype|language|

|---|---|---|---|---|---|---|

|HP:0000002|HP:0000002|rdfs:label||Abnormality of body height|xsd:string||

|HP:0000014|HP:0000014|rdfs:label||Abnormality of the bladder|xsd:string||

|HP:0000022|HP:0000022|rdfs:label||Abnormality of male internal genitalia|xsd:string||

|HP:0000032|HP:0000032|rdfs:label||Abnormality of male external genitalia|xsd:string||

Ready-made SQLite3 builds can also be downloaded for any ontology in [OBO](http://obofoundry.org), using URLs such as https://s3.amazonaws.com/bbop-sqlite/hp.db.gz

[relation-graph](https://github.com/balhoff/relation-graph/) is used to pre-generate tables of [entailed edges](https://incatools.github.io/semantic-sql/EntailedEdge/). For example,

all is-a and part-of ancestors of [finger](http://purl.obolibrary.org/obo/UBERON_0002389) in Uberon:

```sql

$ sqlite db/uberon.db

sqlite> SELECT * FROM entailed_edge WHERE subject='UBERON:0002389' and predicate IN ('rdfs:subClassOf', 'BFO:0000050');

```

|subject, predicate, object|

|---|

|UBERON:0002389, BFO:0000050, UBERON:0015212|

|UBERON:0002389, BFO:0000050, UBERON:5002389|

|UBERON:0002389, BFO:0000050, UBERON:5002544|

|UBERON:0002389, rdfs:subClassOf, UBERON:0000061|

|UBERON:0002389, rdfs:subClassOf, UBERON:0000465|

|UBERON:0002389, rdfs:subClassOf, UBERON:0000475|

SQLite provides many advantages

- files can be downloaded and subsequently queried without network latency

- compared to querying a static rdf, owl, or obo file, there is no startup/parse delay

- robust and performant

- excellent support in many languages

Although the focus is on SQLite, this library can also be used for other DBMSs like PostgreSQL, MySQL, Oracle, etc

## Tutorials

- SemSQL: [notebooks/SemanticSQL-Tutorial.ipynb](https://github.com/INCATools/semantic-sql/blob/main/notebooks/SemanticSQL-Tutorial.ipynb)

- Using OAK: [part 7 of OAK tutorial](https://incatools.github.io/ontology-access-kit/intro/tutorial07.html)

## Installation

SemSQL comes with a helper Python library. Use of this is optional. To install:

```bash

pip install semsql

```

## Download ready-made SQLite databases

Pre-generated SQLite database are created weekly for all OBO ontologies and a selection of others (see [ontologies.yaml](https://github.com/INCATools/semantic-sql/blob/main/src/semsql/builder/registry/ontologies.yaml))

To download:

```bash

semsql download obi -o obi.db

```

Or simply download using URL of the form:

- https://s3.amazonaws.com/bbop-sqlite/hp.db.gz

## Attaching databases

If you are using sqlite3, then databases can be attached to facilitate cross-database joins.

For example, many ontologies use ORCID URIs as the object of `dcterms:contributor` and `dcterms:creator` statements, but these are left "dangling". Metadata about these orcids are available in the semsql orcid database instance (derived from [wikidata-orcid-ontology](https://github.com/cthoyt/wikidata-orcid-ontology)), in the [Orcid table](https://incatools.github.io/semantic-sql/Orcid).

You can use [ATTACH DATABASE](https://www.sqlite.org/lang_attach.html) to connect two databases, for example:

```sql

$ sqlite3 db/cl.dl

sqlite> attach 'db/orcid.db' as orcid_db;

sqlite> select * from contributor inner join orcid_db.orcid on (orcid.id=contributor.object) where orcid.label like 'Chris%';

obo:cl.owl|obo:cl.owl|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

CL:0010001|CL:0010001|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

CL:0010002|CL:0010002|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

CL:0010003|CL:0010003|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

CL:0010004|CL:0010004|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000093|UBERON:0000093|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000094|UBERON:0000094|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000095|UBERON:0000095|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000179|UBERON:0000179|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000201|UBERON:0000201|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000202|UBERON:0000202|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000203|UBERON:0000203|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

UBERON:0000204|UBERON:0000204|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall

```

## Creating a SQLite database from an OWL file

There are two protocols for doing this:

1. install build dependencies

2. use Docker

In either case:

- The input MUST be in RDF/XML serialization and have the suffix `.owl`:

- use robot to convert if format is different

We are planning to simplify this process in future.

### 1. Build a SQLite database directly

This requires some basic technical knowledge about how to install things on your machine

and how to put things in your PATH. It does not require Docker.

Requirements:

- [rdftab.rs](https://github.com/ontodev/rdftab.rs)

- [relation-graph](https://github.com/balhoff/relation-graph) `2.3.1` or higher

After installing these and putting both `relation-graph` and `rdftab.rs` in your path:

```bash

semsql make foo.db

```

This assumes `foo.owl` is in the same folder

### 2. Use Docker

There are two docker images that can be used:

- [ODK](https://hub.docker.com/r/obolibrary/odkfull)

- [semantic-sql](https://hub.docker.com/repository/docker/linkml/semantic-sql)

The ODK image may lag behind

```bash

docker run  -v $PWD:/work -w /work -ti linkml/semantic-sql semsql make foo.db

```

## Schema

See [Schema Documentation](https://incatools.github.io/semantic-sql/)

The [source schema](https://github.com/INCATools/semantic-sql/tree/main/src/semsql/linkml) is in [LinkML](https://linkml.io) - this is then compiled down to SQL Tables and Views

The basic idea is as follows:

There are a small number of "base tables":

* [statements](https://incatools.github.io/semantic-sql/Statements/)

* [prefix](https://incatools.github.io/semantic-sql/Prefix/)

* [entailed_edge](https://incatools.github.io/semantic-sql/EntailedEdge/) - populated by relation-graph

All other tables are actually views (derived tables), and are provided for convenience.

## ORM Layer

A SemSQL relational database can be accessed in exactly the same way as any other SQLdb

For convenience, we provide a Python Object-Relational Mapping (ORM) layer using SQL Alchemy.

This allows for code uchlike the following, which joins [RdfsSubclassOfStatement](https://incatools.github.io/semantic-sql/RdfsSubclassOfStatement) and [existential restrictions](https://incatools.github.io/semantic-sql/OwlSomeValuesFrom):

```python

engine = create_engine(f"sqlite:////path/to/go.db")

SessionClass = sessionmaker(bind=engine)

session = SessionClass()

q = session.query(RdfsSubclassOfStatement)

q = q.add_entity(OwlSomeValuesFrom)

q = q.join(OwlSomeValuesFrom, RdfsSubclassOfStatement.object == OwlSomeValuesFrom.id)

lines = []

for ax, ex in q.all():

    line = f'{ax.subject} subClassOf {ex.on_property} SOME {ex.filler}'

    logging.info(line)

    lines.append(line)

```    

(this example is just for illustration - to do the same thing there is a simpler Edge relation)

## Applications

The semsql python library is intentionally low level - we recommend using the [ontology-access-kit](https://github.com/INCATools/ontology-access-kit)

For example:

```bash

runoak -i db/envo.db search t~biome

```

You can also pass in an OWL file and have the sqlite be made on the fly

```bash

runoak -i sqlite:envo.owl search t~biome

```

Even if using OAK, it can be useful to access SQL tables directly to do complex multi-join queries in a performant way.

## Optimization

```bash

poetry run semsql view2table edge --full-index | sqlite3 $db/mydb.db

```

See [indexes](indexes) for some ready-made indexes

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/incatools/semantic-sql

Awesome Lists containing this project

README