https://github.com/tteofili/certa

CERTA - Computing Entity Resolution explanations with TriAngles
data-integration entity-matching entity-resolution explainable-ai machine-learning python record-linkage xai

Host: GitHub
URL: https://github.com/tteofili/certa
Owner: tteofili
License: apache-2.0
Created: 2021-01-28T07:07:48.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2025-01-03T06:54:18.000Z (4 months ago)
Last Synced: 2025-04-12T02:04:46.390Z (15 days ago)
Topics: data-integration, entity-matching, entity-resolution, explainable-ai, machine-learning, python, record-linkage, xai
Language: Python
Homepage:
Size: 26.8 MB
Stars: 5
Watchers: 2
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

CERTA
=======

Code for _CERTA_ (Computing ER explanations with TriAngles), an algorithm for computing saliency and counterfactual explanations for Entity Resolution models.

# Installation

To install _CERTA_ locally, run:

```shell
pip install .
```

# Usage

Wrap the model whose predictions need to be explained using the [ERModel](models/ermodel.py) interface.

The _get_model_ utility method will load an existing model, if available, or train a new one using the data in the provided dataset.

For example, for a _DeepMatcher_ model, use:

```python
from certa.models.utils import get_model

model = get_model('dm', '/path/where/to/save', '/path/to/dataset', 'modelname')
```

Define a prediction function wrapping the _model.predict()_ method.

```python
def predict_fn(x, **kwargs):
    return model.predict(x, **kwargs)
```
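The wrapper only relies on the model exposing a `predict()` method. The sketch below illustrates this with a stand-in model so that it runs without a trained matcher. `DummyModel`, the `ltable_name`/`rtable_name` columns, and the `nomatch_score`/`match_score` output columns are assumptions made for illustration, not taken from the CERTA code base.

```python
import pandas as pd


class DummyModel:
    """Hypothetical stand-in for a wrapped ER model; not part of CERTA."""

    def predict(self, x, **kwargs):
        # Assumption: one row of class scores per input pair.
        # A pair "matches" here only when the two name values are identical.
        match = (x['ltable_name'] == x['rtable_name']).astype(float)
        return pd.DataFrame(
            {'nomatch_score': 1.0 - match, 'match_score': match},
            index=x.index,
        )


model = DummyModel()


def predict_fn(x, **kwargs):
    # Forward the pairs and any keyword arguments unchanged to the model.
    return model.predict(x, **kwargs)


# Two hypothetical record pairs: the first is identical, the second is not.
pairs = pd.DataFrame({'ltable_name': ['a', 'b'], 'rtable_name': ['a', 'c']})
scores = predict_fn(pairs)
print(scores)
```

Keeping the prediction function separate from the model means the same explainer code can be reused with any model that offers a compatible `predict()`.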

Create a [CertaExplainer](certa/explain.py). 

_CERTA_ needs access to the data sources _lsource_ and _rsource_. 

```python
import pandas as pd

from certa.explain import CertaExplainer

lsource = pd.read_csv('/path/to/dataset/tableA.csv')
rsource = pd.read_csv('/path/to/dataset/tableB.csv')

certa_explainer = CertaExplainer(lsource, rsource)
```

To generate the prediction for the pair formed by the first record of each data source, do the following:

```python
import numpy as np

from certa.local_explain import get_original_prediction

l_tuple = lsource.iloc[0]
r_tuple = rsource.iloc[0]

prediction = get_original_prediction(l_tuple, r_tuple, predict_fn)
class_to_explain = np.argmax(prediction)
```
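`np.argmax` returns the index of the highest score, so `class_to_explain` is the class the model considers most likely. The sketch below uses made-up scores and assumes, for illustration only, that the prediction holds two class scores ordered as (non-match, match); check the ordering your own model uses.

```python
import numpy as np

# Hypothetical prediction: assumed order is (non-match score, match score).
prediction = np.array([0.2, 0.8])

# Index of the highest-scoring class.
class_to_explain = int(np.argmax(prediction))

# Readable label under the assumed ordering.
label = 'match' if class_to_explain == 1 else 'non-match'
print(class_to_explain, label)
```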

To explain the prediction using _CERTA_:

```python
saliency, summary, cfs, triangles, lattices = certa_explainer.explain(l_tuple, r_tuple, predict_fn)
```

_CERTA_ returns:

* the saliency explanation within the _saliency_ pd.DataFrame
* a _summary_ containing the set of attributes with the highest probability of being sufficient to flip the original prediction
* the generated counterfactual explanations within the _cfs_ pd.DataFrame
* the list of open _triangles_ (in the form of tuples of record ids) used to generate the explanations
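A common next step is to rank attributes by their saliency score. The sketch below works on a hand-written stand-in DataFrame and assumes a layout of one row with one column per attribute; the attribute names and scores are made up, and the layout actually returned by _CERTA_ may differ.

```python
import pandas as pd

# Hypothetical stand-in for the `saliency` DataFrame: a single row with one
# score per attribute. This layout is an assumption, not CERTA's documented one.
saliency = pd.DataFrame([{
    'ltable_name': 0.7,
    'ltable_price': 0.1,
    'rtable_name': 0.6,
    'rtable_price': 0.2,
}])

# Rank attributes from most to least salient.
ranked = saliency.iloc[0].sort_values(ascending=False)

# Keep the two attributes with the highest scores.
top_attributes = list(ranked.index[:2])
print(ranked)
print(top_attributes)
```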

# Examples

Examples of using _CERTA_ can be found in the following notebooks:

* [Explain DeepMatcher predictions](notebooks/sample.ipynb)

* [Explain Ditto predictions](https://gist.github.com/tteofili/b4c81a3de6aef40e8dfa27eaf22f116d)

# Citing CERTA

If you extend or use this work, please cite the [paper](https://arxiv.org/abs/2203.12978):

```
@article{teofili2022effective,
  title={Effective Explanations for Entity Resolution Models},
  author={Teofili, Tommaso and Firmani, Donatella and Koudas, Nick and Martello, Vincenzo and Merialdo, Paolo and Srivastava, Divesh},
  journal={arXiv preprint arXiv:2203.12978},
  year={2022}
}
```
