Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wikidata/triplify-json
Create RDF triples from Wikidata JSON files.
- Host: GitHub
- URL: https://github.com/wikidata/triplify-json
- Owner: Wikidata
- Created: 2021-05-25T15:50:00.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-06-03T21:45:28.000Z (over 3 years ago)
- Last Synced: 2024-04-14T07:40:41.702Z (8 months ago)
- Language: Jupyter Notebook
- Size: 166 KB
- Stars: 6
- Watchers: 4
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# triplify-json
Create RDF triples from Wikidata JSON files.

``` shell
node triplify-json.js examples/rome.json
```

This repository contains both JavaScript and Python scripts to transform the native Wikibase/Wikidata JSON format into [RDF](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format). Although the Wikibase platform also delivers this RDF natively from its own backend, there are use cases where an external script reproducing the same RDF is valuable.
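To illustrate what such a transformation involves, here is a minimal sketch, assuming `examples/rome.json` holds a single Wikidata entity object; it emits only truthy (`wdt:`) triples for item-valued statements, and the function `truthy_triples` is illustrative, not part of this repository's scripts.

``` python
import json

# Minimal sketch (not this repository's implementation): emit truthy
# wdt: triples for item-valued statements from a Wikidata entity JSON.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

def truthy_triples(entity):
    """Yield N-Triples lines for the entity's item-valued main snaks."""
    subject = f"<{WD}{entity['id']}>"
    for prop, claims in entity.get("claims", {}).items():
        for claim in claims:
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip somevalue/novalue snaks
            if snak.get("datatype") != "wikibase-item":
                continue  # other datatypes omitted for brevity
            target = snak["datavalue"]["value"]["id"]
            yield f"{subject} <{WDT}{prop}> <{WD}{target}> ."

with open("examples/rome.json") as f:  # assumed: one entity object
    entity = json.load(f)
print("\n".join(truthy_triples(entity)))
```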
## Use cases
### Pre-ingestion EntitySchema Validation
Bots, such as those maintained by the Gene Wiki project, fetch a Wikidata item through the Wikidata API as a JSON object, update it in memory, and then submit the updated JSON object back to the Wikidata API.
Checking for conformance to an applicable EntitySchema is currently only possible post-submission, since the RDF is derived directly from Wikidata content.
If a bot edit would introduce an inconsistency with the applicable EntitySchema, this can only be picked up after ingestion, whereas such edits should be caught before submission. By transforming the updated JSON object into its RDF equivalent, conformance to an EntitySchema can be validated before submission, as sketched below.
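A minimal sketch of such a pre-ingestion check using PyShEx; here `to_rdf` stands in for this repository's JSON-to-RDF transformation and `shex_schema` for an EntitySchema fetched as ShExC text — both are assumptions for illustration, not parts of this codebase.

``` python
from pyshex import ShExEvaluator

def conforms(updated_json, shex_schema, to_rdf):
    """Validate an in-memory entity against an EntitySchema before submission.

    `to_rdf` is a placeholder for the JSON-to-RDF transformation; it is
    expected to return the entity's RDF as a Turtle/N-Triples string.
    """
    rdf = to_rdf(updated_json)
    focus = f"http://www.wikidata.org/entity/{updated_json['id']}"
    results = ShExEvaluator(rdf=rdf, schema=shex_schema, focus=focus).evaluate()
    return all(r.result for r in results)

# A bot would only submit the edit (e.g. via wbeditentity) when this is True.
```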
### Subset extraction

The RDF representation of a Wikidata item is redundant, since it contains both the full statements and the "truthy" statements. A subset does not always need both, and being able to separate truthy from full statements leads to smaller subsets. Similarly, being able to tailor which parts of a Wikidata item (i.e. labels/descriptions, statements, and sitelinks) are extracted can help toward leaner Wikidata subsets; a sketch of such filtering follows.
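A minimal sketch of such tailoring, filtering the entity JSON before it is triplified; the function name and defaults are illustrative. Feeding the slimmed object to the triplification step then yields only the corresponding triples.

``` python
def subset_entity(entity, keep=("labels", "claims"), languages=("en",)):
    """Return a slimmed copy of a Wikidata entity JSON object.

    `keep` selects which top-level parts survive (e.g. "labels",
    "descriptions", "claims", "sitelinks"); term parts are further
    restricted to the given languages.
    """
    slim = {"id": entity["id"], "type": entity.get("type", "item")}
    for key in keep:
        part = entity.get(key, {})
        if key in ("labels", "descriptions", "aliases"):
            part = {lang: term for lang, term in part.items() if lang in languages}
        slim[key] = part
    return slim
```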