https://github.com/tteofili/cheaper

Low Cost Entity Resolution with Transformers

data-management entity-resolution record-linkage

CheapER
=======

`CheapER` is a tool for performing Entity Resolution tasks with few labeled training samples.

`CheapER` uses large language models within a _noisy training_ framework, combined with _adaptive fine-tuning_, _consistency training_, _adaptive softmax_ and _Monte Carlo dropout_.

![*CheapER pipeline*](cheaper.png)
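
As an illustration of one of the techniques named above, the following is a minimal sketch of _Monte Carlo dropout_ on a toy logistic matcher. It is not taken from the `CheapER` code base: the function names, the similarity features and the weights are all hypothetical, and the real pipeline applies dropout inside a transformer rather than on hand-made features. The idea shown is only that dropout stays active at prediction time, and the spread over several stochastic passes serves as an uncertainty estimate.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass of a toy logistic matcher with dropout kept active."""
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        # Keep each input with probability `keep`; rescale survivors
        # (inverted dropout) so the expected activation is unchanged.
        if rng.random() < keep:
            total += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features, weights, bias, drop_prob=0.2, passes=100, seed=0):
    """Average several stochastic passes; return (mean probability, std)."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.fmean(probs), statistics.pstdev(probs)


# Hypothetical similarity features for one candidate record pair.
features = [0.9, 0.8, 0.1]
weights = [2.0, 1.5, -1.0]
mean_prob, spread = mc_dropout_predict(features, weights, bias=-1.0)
print(f"match probability ~ {mean_prob:.3f}, uncertainty (std) ~ {spread:.3f}")
```

A pair with a large spread is one the model is unsure about, which is the kind of signal a noisy training loop can use when deciding how much to trust an automatically assigned label.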

# Experiments

`CheapER` requires less labeled training data than state-of-the-art (SotA) systems (as of early 2023) to reach the same _F1_ score.

![*CheapER cost on DM datasets*](dm_results.png)
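
For reference, the _F1_ score used in this comparison is the harmonic mean of precision and recall on the match class. The snippet below is a small self-contained sketch of that computation; the helper name and the example labels are made up for illustration and are not part of `eval.py`.

```python
def f1_score(gold, predicted):
    """F1 on the positive (match) class for two equal-length 0/1 label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true matches found
    fp = sum(1 for g, p in pairs if not g and p)    # non-matches flagged as matches
    fn = sum(1 for g, p in pairs if g and not p)    # true matches missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five candidate record pairs (1 = match, 0 = non-match).
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
print(f1_score(gold, predicted))
```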

Experiments on the _DeepMatcher_ datasets can be reproduced using the `eval.py` script.

# Notebooks

* [Effectiveness](https://colab.research.google.com/drive/1G0PMnt4xtrwvztjmOBTPbJmwV51ajysN#scrollTo=3sonS3GiFaE1) of [adaptive fine-tuning](https://ruder.io/recent-advances-lm-fine-tuning/) for the ER task.
* [CheapER training using 5% of the BeerAdvo-RateBeer dataset](example.ipynb) (using a DistilBERT model).

# Citing CheapER

If you extend or use this work, please cite:

```
@article{teofili2023cheaper,
title={CheapER: Low Cost Entity Resolution},
author={Teofili, Tommaso and Firmani, Donatella and Merialdo, Paolo},
year={2023}
}
```