Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Entity resolution

Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., a database key, URI, or national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
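A minimal sketch of the problem, assuming two hypothetical sources that share no identifier: records are linked by comparing a normalized name key instead of a database key. The record IDs, names, and the normalize and link_on_key helpers are illustrative and are not taken from any project listed here.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Build a comparison key: strip accents, lowercase, drop punctuation and extra spaces."""
    # Decompose accented characters and drop the combining marks (e.g. "é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a lowercase letter, digit or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", stripped.lower())
    return " ".join(cleaned.split())


def link_on_key(left: dict, right: dict) -> list:
    """Pair record IDs from two sources whose normalized names are identical."""
    index = {}
    for rid, name in right.items():
        index.setdefault(normalize(name), []).append(rid)
    pairs = []
    for lid, name in left.items():
        for rid in index.get(normalize(name), []):
            pairs.append((lid, rid))
    return pairs


# Hypothetical records: the two sources use different IDs and formatting.
crm = {"c1": "José Álvarez", "c2": "ACME, Inc."}
billing = {"b7": "jose alvarez", "b9": "Acme Inc", "b10": "Globex"}
print(link_on_key(crm, billing))  # [('c1', 'b7'), ('c2', 'b9')]
```

Exact matching on a normalized key still misses typos and variant spellings (for example "Jon" versus "John"), which is the gap the fuzzy and probabilistic tools in this list address.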

https://github.com/dedupeio/dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

clustering datamade de-duplicating dedupe dedupe-library entity-resolution python python-library record-linkage

Last synced: 20 Jan 2025

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 21 Jan 2025
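Splink is based on the Fellegi-Sunter model of probabilistic linkage, and its tags mention the EM algorithm used to estimate that model's parameters. The sketch below does not use Splink's API: it only shows how a Fellegi-Sunter match weight is computed, assuming hypothetical m and u probabilities per field rather than EM-estimated ones. Each agreeing field adds log2(m/u) and each disagreeing field adds log2((1-m)/(1-u)).

```python
import math

# Hypothetical per-field parameters:
#   m = P(field agrees | records match), u = P(field agrees | records do not match).
# A tool like Splink would estimate these (e.g. with EM); here they are made up.
PARAMS = {
    "surname": (0.95, 0.01),
    "dob": (0.90, 0.005),
    "city": (0.80, 0.10),
}


def match_weight(agreements: dict) -> float:
    """Sum of per-field log2 Bayes factors (the Fellegi-Sunter match weight)."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        if agreements[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight: float, prior: float) -> float:
    """Turn a match weight into a posterior probability, given the prior match rate."""
    odds = prior / (1 - prior) * 2 ** weight
    return odds / (1 + odds)


# Surname and date of birth agree, city disagrees.
w = match_weight({"surname": True, "dob": True, "city": False})
print(round(w, 2), round(match_probability(w, prior=0.001), 3))
```

The prior matters: the same weight gives a much lower probability when true matches are rare among the compared pairs.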

https://github.com/dedupeio/dedupe-examples

🆔 Examples for using the dedupe library

dedupe entity-resolution python record-linkage

Last synced: 25 Jan 2025

https://github.com/dedupeio/csvdedupe

🆔 Command line tool for deduplicating CSV files

cli csv-files dedupe entity-resolution record-linkage

Last synced: 22 Jan 2025

https://github.com/microsoft/vert-papers

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

bertel can-ner cross-lingual-ner entity-disambiguation entity-extraction entity-linking entity-resolution grn language-understanding linkingpark ml named-entity-recognition ner nlp nlp-resources unitrans xl-ner

Last synced: 20 Jan 2025

https://github.com/scify/jedaitoolkit

An open-source, highly scalable Java toolkit for entity resolution.

blocking entity-matching entity-resolution scalability

Last synced: 21 Jan 2025
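JedAI's tags highlight blocking, which cuts the quadratic number of pairwise comparisons by comparing only records that share a block key. A minimal sketch of schema-agnostic token blocking in Python (not JedAI's Java API), assuming toy records whose attribute names differ between sources; the token_blocks and candidate_pairs helpers are hypothetical.

```python
from itertools import combinations


def token_blocks(records: dict) -> dict:
    """Schema-agnostic token blocking: each distinct token of any attribute value is a block key."""
    blocks = {}
    for rid, attrs in records.items():
        tokens = {tok for value in attrs.values() for tok in str(value).lower().split()}
        for tok in tokens:
            blocks.setdefault(tok, set()).add(rid)
    # Blocks holding a single record produce no comparisons, so drop them.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict) -> set:
    """Distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Attribute names differ ("name" vs "full_name"), which token blocking ignores.
records = {
    "r1": {"name": "John Smith", "city": "Leeds"},
    "r2": {"full_name": "Smith John", "town": "Leeds"},
    "r3": {"name": "Mary Jones", "city": "York"},
    "r4": {"name": "M. Jones", "city": "York"},
}
pairs = candidate_pairs(token_blocks(records))
print(sorted(pairs))  # 2 candidate pairs instead of the 6 brute-force comparisons
```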

https://github.com/amazon-science/refined

ReFinED is an efficient and accurate entity linking (EL) system.

entity-extraction entity-linking entity-resolution nlp pytorch

Last synced: 22 Jan 2025

https://github.com/maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

csv data-matching entity-resolution fuzzy-matching record-linkage

Last synced: 25 Jan 2025
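csvmatch (and textmatch further down) find fuzzy matches between datasets. The sketch below is not their implementation: it uses the standard library's difflib.SequenceMatcher, which is only one of many possible similarity measures, with a hypothetical threshold of 0.8 and made-up names.

```python
from difflib import SequenceMatcher


def fuzzy_matches(left: list, right: list, threshold: float = 0.8) -> list:
    """Return (left, right, score) for every pair whose similarity ratio meets the threshold."""
    results = []
    for a in left:
        for b in right:
            # ratio() is 2*M/T: M matched characters, T total characters in both strings.
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                results.append((a, b, round(score, 2)))
    return results


print(fuzzy_matches(["Jonathan Smith", "Acme Ltd"], ["Jonathon Smith", "ACME Ltd.", "Zeta Corp"]))
```

The nested loop compares every pair, so for large files this is usually combined with blocking or indexing.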

https://github.com/vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

approximate-nearest-neighbors data-matching deduplication deep-learning embeddings entity-matching entity-resolution python pytorch record-linkage representation-learning

Last synced: 23 Jan 2025
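entity-embed learns vector representations of entities and searches them with approximate nearest neighbors. As a rough, hypothetical stand-in, the sketch below uses hand-made character-trigram count vectors instead of learned embeddings, and an exact brute-force cosine search instead of an ANN index; it is not entity-embed's API.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts: a hand-made stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest(query: str, corpus: list, k: int = 1) -> list:
    """Exact top-k search by cosine similarity; an ANN index would approximate this at scale."""
    q = trigram_vector(query)
    scored = [(cosine(q, trigram_vector(doc)), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]


companies = ["Microsoft Corporation", "Micron Technology", "Apple Inc."]
print(nearest("Microsoft Corp", companies))
```

The brute-force scan costs one comparison per stored entity per query, which is the cost an approximate index is meant to avoid.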

https://github.com/codeforkjeff/conciliator

OpenRefine reconciliation services for VIAF, ORCID, and Open Library, plus a framework for creating more.

entity-resolution openlibrary openrefine orcid reconciliation-service solr viaf

Last synced: 05 Nov 2024

https://github.com/wcmc-its/reciter

ReCiter: an enterprise open source author disambiguation system for academic institutions

algorithm aws awscodebuild awscodepipeline clustering dynamodb elasticbeanstalk entity-resolution java machine-learning-algorithms maven pubmed reciter s3 scopus spring-boot

Last synced: 21 Jan 2025

https://github.com/iesl/learned-string-alignments

Learning String Alignments for Entity Aliases

entity-resolution learned-string-edit-distance string-similarity

Last synced: 23 Dec 2024
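This repository learns string alignments for entity aliases. For contrast, the sketch below is the classic, unlearned Levenshtein edit distance with fixed unit costs; it is not the repository's method, and the alias example is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("IBM", "International Business Machines"))
```

Fixed costs treat an abbreviation like "IBM" as almost entirely different from its expansion, which is the kind of alias a learned similarity aims to handle.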

https://github.com/iesl/stance

Learned string similarity for entity names using optimal transport.

aliases entity-resolution optimal-transport record-linkage stance string-distance string-matching string-similarity

Last synced: 23 Dec 2024

https://github.com/ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

deduplication entity-resolution record-linkage spark

Last synced: 08 Nov 2024

https://github.com/dobraczka/kiez

🏘️ Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings

approximate-nearest-neighbor-search embedding entity-alignment entity-resolution hubness knowledge-graph knowledge-graph-embedding nearest-neighbors

Last synced: 08 Nov 2024
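kiez reduces hubness in nearest neighbor search, where a few "hub" vectors end up as the nearest neighbor of many queries. One well-known hubness-reduction method is cross-domain similarity local scaling (CSLS): CSLS(x, y) = 2·cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine of x to its k nearest targets and r_S(y) the mean cosine of y to its k nearest sources. The plain-Python sketch below uses toy 2-D vectors; it is not kiez's API, and no claim is made about which method kiez uses by default.

```python
import math


def unit(degrees):
    """2-D unit vector at the given angle, so cosine similarity is cos(angle difference)."""
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))


def cos(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def mean_topk_sim(x, others, k):
    """Average cosine similarity of x to its k most similar vectors in `others`."""
    sims = sorted((cos(x, o) for o in others), reverse=True)[:k]
    return sum(sims) / len(sims)


def cosine_neighbors(source, target):
    """Plain nearest neighbor: index of the target with the highest cosine."""
    return [max(range(len(target)), key=lambda j: cos(s, target[j])) for s in source]


def csls_neighbors(source, target, k=2):
    """Nearest neighbor under CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y)."""
    # r_S(y): how close each target already is to its k nearest sources (its hubness penalty).
    r_target = [mean_topk_sim(t, source, k) for t in target]
    best = []
    for s in source:
        r_s = mean_topk_sim(s, target, k)  # r_T(x) for this source vector
        scores = [2 * cos(s, t) - r_s - r_target[j] for j, t in enumerate(target)]
        best.append(max(range(len(target)), key=scores.__getitem__))
    return best


# Toy embeddings: target 0 (at 15 degrees) sits between both sources and acts as a hub.
src = [unit(0), unit(30)]
tgt = [unit(15), unit(-20), unit(50)]
print(cosine_neighbors(src, tgt))  # [0, 0]: both sources collapse onto the hub
print(csls_neighbors(src, tgt))    # [1, 2]: CSLS penalizes the hub and spreads them out
```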

https://github.com/j535d165/febrl-fork-v0.4.2

Fork of the Freely Extensible Biomedical Record Linkage program

deduplication entity-resolution matching python-library record-linkage

Last synced: 22 Nov 2024

https://github.com/neo4j-graph-examples/entity-resolution

Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same or a similar real-world entity within one data set or across several. Record linking is necessary when joining data about similar entities that may or may not share common identifiers. Neo4j offers various advantages for performing entity resolution and record linking. This repository covers one such use case: linking similar user accounts for analytics and better recommendations.

entity-resolution graph-datatabase neo4j neo4j-approved

Last synced: 12 Nov 2024

https://github.com/molybdenum-99/whatis

WhatIs.this: simple entity resolution through Wikipedia

entity-resolution wikipedia

Last synced: 20 Nov 2024

https://github.com/snipsco/snips-nlu-parsers

Rust crate for entity parsing

entity-recognition entity-resolution nlp nlu rust

Last synced: 08 Nov 2024

https://github.com/tilotech/langchain-tilores

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

agentic-rag entity-resolution langchain

Last synced: 01 Nov 2024

https://github.com/nickcrews/mismo

The SQL/Ibis-powered sklearn of record linkage

deduplication duckdb entity-resolution ibis python record-linkage sql

Last synced: 18 Nov 2024

https://github.com/maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

data-matching entity-resolution fuzzy-matching record-linkage

Last synced: 23 Nov 2024

https://github.com/dobraczka/forayer

forayer is a library of first-aid utilities for knowledge graph exploration with an entity-centric approach.

data-integration entity-resolution knowledge-graph

Last synced: 08 Nov 2024

https://github.com/dobraczka/sylloge

πŸ—ƒοΈ Small library to simplify collecting and loading of entity alignment benchmark datasets

datasets entity-alignment entity-resolution knowledge-graph

Last synced: 08 Nov 2024

https://github.com/fgregg/smered

Mirror of https://bitbucket.org/resteorts/smered

deduplication entity-resolution record-linkage

Last synced: 15 Oct 2024

https://github.com/tilotech/tilores-langchain

This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.

agentic-rag entity-resolution identity-resolution langchain

Last synced: 25 Jan 2025

https://github.com/harpin-ai/toolkit-examples

Examples for trying out the harpin AI identity resolution and data quality toolkit

data-engineering data-quality dedupe deduplication entity-resolution identity identity-resolution spark

Last synced: 01 Nov 2024

https://github.com/tilotech/python-tilores-sdk

The tilores-sdk Python package is a small SDK for developing with the Tilores entity resolution system.

entity-resolution python tilores

Last synced: 01 Dec 2024

https://github.com/dobraczka/eche

πŸ•ΈοΈ Little helper for handling entity clusters

clustering connected-components deduplication entity-resolution record-linkage transitive-closure

Last synced: 08 Nov 2024
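eche's tags mention connected components and transitive closure: if a matcher says a~b and b~c, then a, b and c are treated as one entity. A minimal union-find sketch of that step, assuming pairwise matches are already available; clusters_from_pairs is a hypothetical helper, not eche's API.

```python
def clusters_from_pairs(pairs):
    """Group record IDs into entity clusters: connected components of the match graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


# Pairwise matches from some matcher; a~b and b~c imply a, b, c are one entity.
print(clusters_from_pairs([("a", "b"), ("b", "c"), ("x", "y")]))  # [['a', 'b', 'c'], ['x', 'y']]
```

Transitive closure can also chain together records that were never directly compared, so a single false match may merge two large clusters.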

https://github.com/ilias-ant/entity-resolution-with-monetdb

A proof-of-concept entity resolution approach using TensorFlow inside MonetDB.

entity-resolution monetdb product-matching sql tensorflow udfs

Last synced: 22 Jan 2025

https://github.com/gust4vosales/proxcluster-deduplicator

ProxCluster is a modular framework for incremental entity resolution that uses concepts similar to the k-means algorithm to cluster the duplicates it finds. This work was developed as the final paper for my bachelor's degree in Computer Science.

clustering data-integration data-matching data-science database deduplication entity-resolution k-means pandas polars python

Last synced: 28 Nov 2024

https://github.com/imvladikon/deduplicator

Simple entity deduplication package

deduplication entity-resolution

Last synced: 02 Jan 2025

https://github.com/samir55/sigmod2021_teamaaa

ACM SIGMOD Programming Contest 2021. Team AAA submission (Top 10).

databases deduplication entity-resolution

Last synced: 31 Dec 2024

https://github.com/gaglia88/gsm_repro

Reproducibility experiments for Generalized Supervised Meta-blocking

blocking entity-matching entity-resolution meta-blocking reproducibility reproducible-paper reproducible-research

Last synced: 24 Dec 2024

https://github.com/lefteris-souflas/entity-resolution

Addresses entity resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, meta-blocking graph construction, and Jaccard similarity computation. Deliverables include Python source code, reports, and reproducibility guidelines.

edge-pruning entity-resolution graph jaccard-similarity meta-blocking token-blocking

Last synced: 12 Jan 2025
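The last entry combines token blocking, meta-blocking graph construction, edge pruning and Jaccard similarity. A minimal sketch of those steps, in the spirit of meta-blocking's Jaccard weighting scheme and weighted edge pruning (keep only edges at or above the mean weight); it is not that repository's code, and the product records are made up. With plain token blocking, a record's set of blocks equals its set of tokens, so the edge weight is also the Jaccard similarity of the two token sets.

```python
from itertools import combinations


def token_blocks(records):
    """Schema-agnostic token blocking: block key = any token in the record's text."""
    blocks = {}
    for rid, text in records.items():
        for tok in set(text.lower().split()):
            blocks.setdefault(tok, set()).add(rid)
    return blocks


def meta_block(records):
    """Weight co-occurrence edges by Jaccard similarity of block sets, keep edges >= mean weight."""
    blocks = token_blocks(records)
    # block_sets[r] = set of block keys that contain record r.
    block_sets = {rid: set() for rid in records}
    for key, ids in blocks.items():
        for rid in ids:
            block_sets[rid].add(key)
    # Blocking graph: one edge per pair of records sharing at least one block.
    edges = {}
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            if (a, b) not in edges:
                shared = block_sets[a] & block_sets[b]
                edges[(a, b)] = len(shared) / len(block_sets[a] | block_sets[b])
    if not edges:
        return {}
    mean = sum(edges.values()) / len(edges)
    return {pair: w for pair, w in edges.items() if w >= mean}


products = {
    "r1": "apple iphone 12 black",
    "r2": "iphone 12 apple black 64gb",
    "r3": "samsung galaxy s21 black",
    "r4": "galaxy s21 samsung",
}
print(sorted(meta_block(products)))  # [('r1', 'r2'), ('r3', 'r4')]: weak "black" edges pruned
```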