Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dgarijo/kgtk-delta-wikidata
This repository contains the code for calculating deltas between two sorted KGTK files. The effort is aimed at capturing the differences between different Wikidata endpoints.
Last synced: 10 days ago
- Host: GitHub
- URL: https://github.com/dgarijo/kgtk-delta-wikidata
- Owner: dgarijo
- License: apache-2.0
- Created: 2020-11-12T19:17:30.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2020-11-13T00:13:26.000Z (about 4 years ago)
- Last Synced: 2024-04-16T19:32:57.396Z (8 months ago)
- Language: Python
- Size: 11.7 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# kgtk-delta-wikidata
This repository contains the code for calculating deltas between two sorted KGTK files. The effort is aimed at capturing the differences between different Wikidata endpoints.

To use the script, run:
```
python calculate_deltas.py -o path_to_old_dataset -n path_to_new_dataset -d output_directory
```

Here `path_to_old_dataset` is the path to the older dataset in the comparison, `path_to_new_dataset` is the path to the more recent dataset, and `output_directory` is the directory where the outputs are written. Three output files are produced:

- `added.tsv`, which contains the statements newly added in the more recent dataset;
- `deleted.tsv`, which contains the statements deleted in the more recent dataset;
- `modified_qual.tsv`, which contains the statements whose qualifiers have been modified.
For simplicity, only the `id`, `node1`, `label`, and `node2` columns are kept in all output files.
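
As a rough illustration of the idea (not the code in `calculate_deltas.py`), the sketch below merges two id-sorted edge streams to find added and deleted statements, keeping only the four columns listed above. It assumes that statement ids are unique, that both inputs are sorted by `id` under plain string comparison, and that the files are tab-separated with a header row. The function names `read_edges` and `diff_sorted` and the sample ids are hypothetical, and qualifier comparison (`modified_qual.tsv`) is not covered.

```python
import csv
import io

# Columns the README says are kept in the output files.
KEPT_COLUMNS = ("id", "node1", "label", "node2")


def read_edges(handle):
    """Yield (id, node1, label, node2) tuples from a tab-separated edge file."""
    # QUOTE_NONE: treat quote characters in values as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield tuple(row[col] for col in KEPT_COLUMNS)


def diff_sorted(old_edges, new_edges):
    """Merge two id-sorted edge streams and return (added, deleted) lists.

    Assumption: ids are unique within each stream and sorted by string order.
    """
    added, deleted = [], []
    old_iter, new_iter = iter(old_edges), iter(new_edges)
    old, new = next(old_iter, None), next(new_iter, None)
    while old is not None or new is not None:
        if new is None or (old is not None and old[0] < new[0]):
            # Id exists only in the old stream.
            deleted.append(old)
            old = next(old_iter, None)
        elif old is None or new[0] < old[0]:
            # Id exists only in the new stream.
            added.append(new)
            new = next(new_iter, None)
        else:
            # Same id in both streams: advance both.
            old, new = next(old_iter, None), next(new_iter, None)
    return added, deleted


# Hypothetical sample data; the extra "rank" column is dropped on read.
old_tsv = (
    "id\tnode1\tlabel\tnode2\trank\n"
    "Q1-P31-Q5\tQ1\tP31\tQ5\tnormal\n"
    "Q2-P31-Q5\tQ2\tP31\tQ5\tnormal\n"
)
new_tsv = (
    "id\tnode1\tlabel\tnode2\trank\n"
    "Q2-P31-Q5\tQ2\tP31\tQ5\tnormal\n"
    "Q3-P31-Q5\tQ3\tP31\tQ5\tnormal\n"
)

added, deleted = diff_sorted(
    read_edges(io.StringIO(old_tsv)),
    read_edges(io.StringIO(new_tsv)),
)
print("added:", added)
print("deleted:", deleted)
```

Because both inputs are consumed as streams, this kind of merge needs only one row from each file in memory at a time, which is presumably why the script expects sorted input. If the files were sorted with a locale-aware tool, the comparison in `diff_sorted` would have to use the same ordering.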