https://github.com/jvfe/reconciler
Python package to reconcile DataFrames
dataframe linked-data open-data pandas python reconciliation-service wikidata
- Host: GitHub
- URL: https://github.com/jvfe/reconciler
- Owner: jvfe
- License: bsd-2-clause
- Created: 2020-08-26T18:44:48.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-15T18:31:18.000Z (almost 3 years ago)
- Last Synced: 2025-12-17T02:47:22.865Z (about 1 month ago)
- Topics: dataframe, linked-data, open-data, pandas, python, reconciliation-service, wikidata
- Language: Python
- Homepage: https://jvfe.github.io/reconciler/
- Size: 1.51 MB
- Stars: 24
- Watchers: 1
- Forks: 7
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# reconciler
[License](https://github.com/jvfe/reconciler/blob/master/LICENSE) | [GitHub Actions](https://github.com/jvfe/reconciler/actions) | [Documentation](https://jvfe.github.io/reconciler/) | [DOI](https://doi.org/10.5281/zenodo.4088488)
`reconciler` is a Python package for reconciling tabular data against reconciliation services such as
[Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page). It works much like [OpenRefine](https://openrefine.org/),
but entirely within Python, using pandas.
## Quickstart
You can install the latest version of reconciler from PyPI with:
``` bash
pip install reconciler
```
Then to use it:
```python
from reconciler import reconcile
import pandas as pd
# A DataFrame with a column you want to reconcile.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)
# Reconcile against type city (Q515), getting the best match for each item.
reconciled = reconcile(test_df["City"], type_id="Q515")
```
The resulting dataframe would look like this:
| id | match | name | score | type | type_id | input_value |
|:--------|:--------|:---------------|--------:|:-----------------------|:-----------|:---------------|
| Q8678 | True | Rio de Janeiro | 100 | city | Q515 | Rio de Janeiro |
| Q174 | True | São Paulo | 100 | city | Q515 | São Paulo |
| Q131620 | True | Natal | 100 | municipality of Brazil | Q3184121 | Natal |
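Because each result row carries the original string in `input_value`, the matches can be joined back onto the source table with an ordinary pandas merge. The following is a minimal sketch, assuming the output has the columns shown above. The `reconciled` frame is typed in by hand from that table so the snippet runs offline, and `city_qid` is only an illustrative column name:

```python
import pandas as pd

test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)

# Hand-copied from the table above; in practice this would come from reconcile().
reconciled = pd.DataFrame(
    {
        "id": ["Q8678", "Q174", "Q131620"],
        "match": [True, True, True],
        "name": ["Rio de Janeiro", "São Paulo", "Natal"],
        "score": [100, 100, 100],
        "input_value": ["Rio de Janeiro", "São Paulo", "Natal"],
    }
)

# Left join so every input row is kept, including repeated values.
annotated = test_df.merge(
    reconciled[["input_value", "id", "match"]].rename(columns={"id": "city_qid"}),
    left_on="City",
    right_on="input_value",
    how="left",
).drop(columns="input_value")

# Rows without a confident match (False or missing) are candidates for manual review.
needs_review = annotated[annotated["match"].ne(True)]

print(annotated)
```

A left join keeps the row count of the original table, so repeated inputs such as "São Paulo" each receive the same identifier.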
To restrict the results to cities in Brazil, pass the `property_mapping` argument with
a property-value pair:
```python
# Reconcile against type city (Q515), keeping only items whose country (P17) property equals Brazil (Q155).
reconciled = reconcile(test_df["City"], type_id="Q515", property_mapping={"P17": test_df["Country"]})
```
## Options
The `reconcile()` function accepts several options.
* `type_id` - The type of items to reconcile against per the [API specification](https://reconciliation-api.github.io/specs/latest/#structure-of-a-reconciliation-query).
* `top_res` - Either the number of results to return per entry, or the string `"all"` to return all of them.
* `property_mapping` - A dictionary mapping property IDs to the values to filter results on (as in the `{"P17": test_df["Country"]}` example above), per the [API specification](https://reconciliation-api.github.io/specs/latest/#structure-of-a-reconciliation-query).
* `reconciliation_endpoint` - The reconciliation service to connect to. Defaults to `https://wikidata.reconci.link/en/api`.
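For orientation, `type_id` and `property_mapping` correspond to the `type` and `properties` fields of a query in the batch format used by versions 0.1 and 0.2 of the linked specification. The sketch below builds such a batch by hand. `build_queries` is a hypothetical helper written for illustration, not part of `reconciler`, and the package's internals may well differ:

```python
import json


def build_queries(values, type_id=None, property_mapping=None):
    """Build a reconciliation query batch keyed q0, q1, ... (illustrative only)."""
    property_mapping = property_mapping or {}
    queries = {}
    for i, value in enumerate(values):
        query = {"query": value}
        if type_id is not None:
            query["type"] = type_id
        # One {"pid", "v"} entry per property, using the value aligned with this row.
        props = [
            {"pid": pid, "v": column[i]} for pid, column in property_mapping.items()
        ]
        if props:
            query["properties"] = props
        queries[f"q{i}"] = query
    return queries


batch = build_queries(
    ["Rio de Janeiro", "Natal"],
    type_id="Q515",
    property_mapping={"P17": ["Q155", "Q155"]},
)
print(json.dumps(batch, indent=2, ensure_ascii=False))
```

Each entry pairs one input string with its type constraint and per-row property values, which is why the values passed in `property_mapping` need to line up with the column being reconciled.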
## Other very useful packages
Although my opinion may be biased, I think `reconciler` is a pretty nice package.
But the thing is, it probably won't fulfill all your Wikidata-related needs.
Here are other packages that could help with that:
* [WikidataIntegrator](https://github.com/SuLab/WikidataIntegrator) has a lot of very nice low-level functions
for dealing with various Wikidata-related activities, such as item acquisition and programmatic editing.
* [wikidata2df](https://github.com/jvfe/wikidata2df) is a very simple utility package for quickly and easily
turning Wikidata SPARQL queries into pandas DataFrames.