# OSTRICH
_Offset-enabled TRIple store for CHangesets_

[![test-on-commit](https://github.com/rdfostrich/ostrich/actions/workflows/ostrich_test.yml/badge.svg)](https://github.com/rdfostrich/ostrich/actions/workflows/ostrich_test.yml)
[![Docker Automated Build](https://img.shields.io/docker/automated/rdfostrich/ostrich.svg)](https://hub.docker.com/r/rdfostrich/ostrich/)
[![DOI](https://zenodo.org/badge/97819866.svg)](https://zenodo.org/badge/latestdoi/97819866)

**OSTRICH** is an _RDF triple store_ that allows _multiple versions_ of a dataset to be stored and queried at the same time.
The store is a hybrid between _snapshot_, _delta_ and _timestamp-based_ storage,
which provides a good trade-off between storage size and query time.
It provides several built-in algorithms to enable efficient iterator-based queries _at_ a certain version (version materialization), _between_ any two versions (delta materialization), and _for_ all versions (version queries). These queries support limits and offsets for any triple pattern.

Insertion is done by first inserting a dataset snapshot, which is encoded in [HDT](https://www.rdfhdt.org/).
After that, deltas can be inserted, which contain additions and deletions relative to the last delta or snapshot.

Learn more about the internals of OSTRICH in the following articles:
- [Triple Storage for Random-Access Versioned Querying of RDF Archives](https://rdfostrich.github.io/article-jws2018-ostrich/)
- [Scaling Large RDF Archives To Very Long Histories](http://luisgalarraga.de/docs/ICSC_2023.pdf)
- [OSTRICH: Versioned Random-Access Triple Store](https://rdfostrich.github.io/article-demo/)
- [GLENDA: Querying RDF Archives with full SPARQL](https://2023.eswc-conferences.org/wp-content/uploads/2023/05/paper_Pelgrin_2023_GLENDA.pdf)

## Building

OSTRICH requires ZLib, Kyoto Cabinet, Boost, Serd, Raptor2, and CMake (compilation only) to be installed.
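On Ubuntu, these dependencies can typically be installed through APT. The package names below are an assumption based on common Ubuntu packaging (they are not taken from this repository), so cross-check them against the CI workflow:

```shell
# Hypothetical Ubuntu package names for the dependencies listed above;
# verify against the CI workflow file before relying on them.
sudo apt-get update
sudo apt-get install -y build-essential cmake zlib1g-dev \
  libkyotocabinet-dev libboost-all-dev libserd-dev libraptor2-dev
```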
Inspect our [CI workflow file](https://github.com/rdfostrich/ostrich/blob/master/.github/workflows/ostrich_test.yml) to see how dependencies are installed on Ubuntu.

Compile:
```bash
$ mkdir build
$ cd build
$ cmake ..
$ make
```

## Running

The OSTRICH dataset is always loaded from the current working directory.
### Tests
```bash
build/ostrich_test
```

### Query
```bash
build/ostrich-query-version-materialized patch_id s p o
build/ostrich-query-delta-materialized patch_id patch_id_end s p o
build/ostrich-query-version s p o
```

### Insert
```bash
build/ostrich-insert [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*
```

Input deltas must be sorted in SPO order.
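For canonical N-Triples (one `subject predicate object .` statement per line), a plain byte-wise sort already yields SPO order. A minimal sketch with a hypothetical delta file `delta.nt`:

```shell
# Create a small, unsorted N-Triples delta (hypothetical example data).
cat > delta.nt <<'EOF'
<http://example.org/s2> <http://example.org/p1> <http://example.org/o1> .
<http://example.org/s1> <http://example.org/p2> <http://example.org/o2> .
<http://example.org/s1> <http://example.org/p1> <http://example.org/o3> .
EOF

# LC_ALL=C sorts lines byte-wise; since each line starts with the subject,
# followed by the predicate and the object, this produces SPO order.
LC_ALL=C sort -o delta.nt delta.nt

cat delta.nt
```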
### Evaluate
Load changesets (without querying) from a path structured as `path_to_patch_directory/patch_id/main.nt.additions.txt` and `path_to_patch_directory/patch_id/main.nt.deletions.txt`:
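For example, a patch directory for two hypothetical patch ids (1 and 2) can be prepared as follows; the file names follow the structure above:

```shell
# Build a minimal patch directory (hypothetical patch ids 1 and 2;
# the files would normally contain N-Triples, here they are empty).
mkdir -p patches/1 patches/2
touch patches/1/main.nt.additions.txt patches/1/main.nt.deletions.txt
touch patches/2/main.nt.additions.txt patches/2/main.nt.deletions.txt

# `patches` would then be passed as path_to_patch_directory.
find patches -type f | LC_ALL=C sort
```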
```bash
build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end
```
CSV-formatted insert data will be emitted: `version,added,durationms,rate,accsize`.

Load changesets AND query with triple patterns from the given file (one pattern per line), with the given number of replications:
```bash
build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end patch_to_queries/queries.txt s|p|o nr_replications
```
CSV-formatted query data will be emitted (times in microseconds) for all versions and all three query types: `patch,offset,limit,count-ms,lookup-mus,results`.

## Docker

Alternatively, OSTRICH can be built and run using Docker.
### Build
```bash
docker build -t ostrich .
```

Instead of building the image yourself, you can use the pre-built image from [DockerHub](https://hub.docker.com/r/rdfostrich/ostrich/):
```bash
docker pull rdfostrich/ostrich
```

### Test
```bash
docker run --rm -it --entrypoint /opt/patchstore/build/ostrich_test ostrich
```

### Query
```bash
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version-materialized ostrich patch_id s p o
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-delta-materialized ostrich patch_id patch_id_end s p o
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version ostrich s p o
```

### Insert
```bash
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-insert ostrich [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*
```

### Evaluate
Load changesets (without querying) from a path structured as `path_to_patch_directory/patch_id/main.nt.additions.txt` and `path_to_patch_directory/patch_id/main.nt.deletions.txt`:
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches ostrich /var/patches patch_id patch_id_end
```

Load changesets AND query with triple patterns from the given file (one pattern per line), with the given number of replications:
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches -v patch_to_queries:/var/queries ostrich /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications
```

Enable debug mode:
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches -v patch_to_queries:/var/queries -v path_to_crash_dir:/crash --privileged=true ostrich --debug /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications
```

## Compiler variables

- `PATCH_INSERT_BUFFER_SIZE`: The size of the triple parser buffer during patch insertion. (default `100`)
- `FLUSH_POSITIONS_COUNT`: The number of triples after which the patch positions should be flushed to disk, to avoid memory issues. (default `500000`)
- `FLUSH_TRIPLES_COUNT`: The number of triples after which the store should be flushed to disk, to avoid memory issues. (default `500000`)
- `KC_MEMORY_MAP_SIZE`: The Kyoto Cabinet memory map size per tree. (default `1LL << 27` = 128 MB)
- `KC_PAGE_CACHE_SIZE`: The Kyoto Cabinet page cache size per tree. (default `1LL << 25` = 32 MB)
- `MIN_ADDITION_COUNT`: The minimum addition triple count for it to be stored in the database. Changing this value only takes effect at insertion time; lookups are compatible with any value. (default `200`)
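These are compile-time constants. Assuming they are plain preprocessor definitions that honor externally supplied values (an assumption, not an official build option of this repository), one way to override them is through the compiler flags passed to CMake:

```shell
# Hypothetical override of two compile-time constants via CXX flags;
# the variable names are taken from the list above.
cmake -DCMAKE_CXX_FLAGS="-DFLUSH_TRIPLES_COUNT=1000000 -DKC_MEMORY_MAP_SIZE=(1LL<<28)" ..
```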
## Cite
If you are using or extending OSTRICH as part of a scientific publication,
we would appreciate a citation of our [article](https://rdfostrich.github.io/article-jws2018-ostrich/).

```bibtex
@article{taelman_jws_ostrich_2018,
  author = {Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben},
  title = {Triple Storage for Random-Access Versioned Querying of RDF Archives},
  journal = {Journal of Web Semantics},
  year = {2018},
  month = aug,
  url = {https://rdfostrich.github.io/article-jws2018-ostrich/}
}
```

## License

This software is written by [Ruben Taelman](http://rubensworks.net/), [Olivier Pelgrin](https://github.com/opelgrin), and colleagues.

This code is copyrighted by [Ghent University – imec](http://idlab.ugent.be/) and [Aalborg University](https://www.cs.aau.dk/research/dkw-data-knowledge-and-web-engineering),
and is released under the [MIT license](http://opensource.org/licenses/MIT).