https://github.com/jvfe/reconciler

Python package to reconcile DataFrames
dataframe linked-data open-data pandas python reconciliation-service wikidata

Host: GitHub
URL: https://github.com/jvfe/reconciler
Owner: jvfe
License: bsd-2-clause
Created: 2020-08-26T18:44:48.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2023-02-15T18:31:18.000Z (over 3 years ago)
Last Synced: 2026-01-03T03:56:03.817Z (5 months ago)
Topics: dataframe, linked-data, open-data, pandas, python, reconciliation-service, wikidata
Language: Python
Homepage: https://jvfe.github.io/reconciler/
Size: 1.51 MB
Stars: 24
Watchers: 1
Forks: 7
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# reconciler

[![license](https://img.shields.io/badge/license-BSD%202--Clause-green)](https://github.com/jvfe/reconciler/blob/master/LICENSE)

[![pytest status](https://github.com/jvfe/reconciler/workflows/pytest/badge.svg)](https://github.com/jvfe/reconciler/actions)

[![documentation status](https://github.com/jvfe/reconciler/workflows/docs/badge.svg)](https://jvfe.github.io/reconciler/)

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4088488.svg)](https://doi.org/10.5281/zenodo.4088488)

`reconciler` is a Python package to reconcile tabular data with various reconciliation services, such as [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), working similarly to what [OpenRefine](https://openrefine.org/) does, but entirely within Python, using Pandas.

## Quickstart

You can install the latest version of reconciler from PyPI with:

```bash
pip install reconciler
```

Then to use it:

```python
from reconciler import reconcile
import pandas as pd

# A DataFrame with a column you want to reconcile.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"]
    }
)

# Reconcile against type city (Q515), getting the best match for each item.
reconciled = reconcile(test_df["City"], type_id="Q515")
```

The resulting dataframe would look like this:

| id      | match   | name           |   score | type                   | type_id    | input_value    |
|:--------|:--------|:---------------|--------:|:-----------------------|:-----------|:---------------|
| Q8678   | True    | Rio de Janeiro |     100 | city                   | Q515       | Rio de Janeiro |
| Q174    | True    | São Paulo      |     100 | city                   | Q515       | São Paulo      |
| Q131620 | True    | Natal          |     100 | municipality of Brazil | Q3184121   | Natal          |
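
The example output above has one row per distinct input value (three rows for four inputs), so a common next step is to join the matches back onto the original DataFrame. The following is a minimal sketch of one way to do that with plain pandas, not something `reconciler` does for you. The `reconciled` frame here is a hand-built stand-in copied from the table above so the snippet runs without network access, and `merged` is just an illustrative name.

```python
import pandas as pd

# The original input from the quickstart.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)

# Stand-in for the output of reconcile(), copied from the table above.
# In real use this would come from reconcile(test_df["City"], type_id="Q515").
reconciled = pd.DataFrame(
    {
        "id": ["Q8678", "Q174", "Q131620"],
        "match": [True, True, True],
        "name": ["Rio de Janeiro", "São Paulo", "Natal"],
        "score": [100, 100, 100],
        "type": ["city", "city", "municipality of Brazil"],
        "type_id": ["Q515", "Q515", "Q3184121"],
        "input_value": ["Rio de Janeiro", "São Paulo", "Natal"],
    }
)

# Left-join on the input value so every original row is kept,
# including the duplicated "São Paulo" row.
merged = test_df.merge(
    reconciled[["input_value", "id", "score"]],
    left_on="City",
    right_on="input_value",
    how="left",
).drop(columns="input_value")

print(merged)
```

A left join keeps inputs that received no match as rows with missing `id` values, which makes unmatched entries easy to spot afterwards.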

To ensure the results are cities in Brazil, pass the `property_mapping` argument, which maps a property ID to a column holding the expected value for each row:

```python
# Reconcile against type city (Q515), restricted to items whose
# country (P17) property equals Brazil (Q155).
reconciled = reconcile(test_df["City"], type_id="Q515", property_mapping={"P17": test_df["Country"]})
```

## Options

The `reconcile()` function accepts several options.

* `type_id` - The type of items to reconcile against per the [API specification](https://reconciliation-api.github.io/specs/latest/#structure-of-a-reconciliation-query).

* `top_res` - Either the number of results to return per entry or the string 'all' to return all results.

* `property_mapping` - A dictionary mapping property IDs to columns of values to filter results on (for example `{"P17": test_df["Country"]}`), per the [API specification](https://reconciliation-api.github.io/specs/latest/#structure-of-a-reconciliation-query).

* `reconciliation_endpoint` - The reconciliation service to connect to.  Defaults to `https://wikidata.reconci.link/en/api`.
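
To make the link between these options and the API specification more concrete, here is a rough sketch of the query batch that a reconciliation service expects. It is an illustration of the query structure described in the specification, not `reconciler`'s internal implementation. `build_queries` is a hypothetical helper, and treating `limit` as the counterpart of `top_res` is an assumption.

```python
import json

import pandas as pd


def build_queries(values, type_id=None, limit=1, property_mapping=None):
    """Hypothetical helper (not part of reconciler): build a query batch
    in the shape described by the Reconciliation API specification."""
    queries = {}
    for i, value in enumerate(values):
        query = {"query": value, "limit": limit}
        if type_id is not None:
            query["type"] = type_id
        if property_mapping:
            # One {"pid", "v"} entry per property, taking the value
            # from the same row position as the input value.
            query["properties"] = [
                {"pid": pid, "v": column.iloc[i]}
                for pid, column in property_mapping.items()
            ]
        # Keys such as "q0", "q1" are arbitrary identifiers for each query.
        queries[f"q{i}"] = query
    return queries


df = pd.DataFrame({"City": ["Rio de Janeiro", "Natal"], "Country": ["Q155", "Q155"]})
queries = build_queries(
    df["City"], type_id="Q515", property_mapping={"P17": df["Country"]}
)

# The batch is sent to the service as a JSON string.
payload = json.dumps(queries, ensure_ascii=False)
print(payload)
```

The sketch assumes string-valued columns; other dtypes may need converting to plain Python types before `json.dumps` can serialize them.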

## Other very useful packages

Although my opinion may be biased, I think `reconciler` is a pretty nice package. But the thing is, it probably won't fulfill all your Wikidata-related needs. Here are other packages that could help with that:

* [WikidataIntegrator](https://github.com/SuLab/WikidataIntegrator) has many useful low-level functions for Wikidata-related tasks, such as item acquisition and programmatic editing.

* [wikidata2df](https://github.com/jvfe/wikidata2df) is a very simple utility package for quickly and easily turning Wikidata SPARQL queries into Pandas DataFrames.
