Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Entity resolution
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different datasets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
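As a rough illustration of joining records that share no common identifier, the sketch below links two small sources by normalized name similarity. It uses only the Python standard library; the sample records, the `link_records` helper, and the 0.85 threshold are assumptions made for illustration and are not taken from any project listed on this page.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """Pair each left record with its best right match at or above the threshold.

    `threshold` is a hypothetical cut-off; real systems tune or learn it.
    """
    links = []
    for left_id, left_name in left.items():
        scored = [(similarity(left_name, name), rid) for rid, name in right.items()]
        if not scored:
            continue
        score, right_id = max(scored)
        if score >= threshold:
            links.append((left_id, right_id, round(score, 2)))
    return links


# Two hypothetical sources with different keys and formatting.
crm = {"c1": "Acme Corporation", "c2": "Globex, Inc."}
billing = {"b7": "ACME Corporation.", "b9": "Initech LLC"}

links = link_records(crm, billing)
print(links)
```

Comparing every left record with every right record is quadratic, which is why many of the tools listed below add a blocking step to limit the candidate pairs.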
- GitHub: https://github.com/topics/entity-resolution
- Wikipedia: https://en.wikipedia.org/wiki/Record_linkage
- Repo: https://github.com/entity-resolution
- Created by: Halbert L. Dunn
- Released: 1946
- Related Topics: artificial-intelligence, nlp
- Aliases: entity-matching, entity-linking, link-discovery, deduplication, de-duplication, data-matching, record-linkage, data-disambigation
- Last updated: 2024-12-15 00:09:39 UTC
https://github.com/dedupeio/dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
clustering datamade de-duplicating dedupe dedupe-library entity-resolution python python-library record-linkage
Last synced: 20 Jan 2025
https://github.com/moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science
Last synced: 21 Jan 2025
https://github.com/j535d165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 25 Jan 2025
https://github.com/zinggai/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
analytics analytics-engineering data-science data-transformation data-transformations dataengineering datalake dataquality dedupe deduplication entity-resolution etl fuzzy-matching fuzzymatch identity identity-resolution masterdata ml modern-data-stack spark
Last synced: 23 Jan 2025
https://github.com/johnsnowlabs/nlu
1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
bert-embedding dependency-parsing entity-resolution language-detection lemmatizer named-entity-recognition natural-language-understanding nlu pandas sentence-embeddings sentiment-analysis sentiment-classifier seq2seq spell-checker streamlit t5 text-classification text-summarization text-translation transformers
Last synced: 22 Jan 2025
https://github.com/picovoice/rhino
On-device Speech-to-Intent engine powered by deep learning
entity-resolution intent-inference natural-language-understanding nlu on-device slot-filling slu speech-recognition spoken-language-understanding voice-assistant voice-command voice-command-control voice-commands voice-control voice-recognition voice-ui voice-user-interface vui
Last synced: 25 Jan 2025
https://github.com/dedupeio/dedupe-examples
Examples for using the dedupe library
dedupe entity-resolution python record-linkage
Last synced: 25 Jan 2025
https://github.com/dedupeio/csvdedupe
Command line tool for deduplicating CSV files
cli csv-files dedupe entity-resolution record-linkage
Last synced: 22 Jan 2025
https://github.com/izuna385/entity-linking-recent-trends
Recent trends of Entity Linking, Disambiguation, and Representation.
bert entity-disambiguation entity-language-model entity-linking entity-representation entity-resolution natural-language-processing nlp
Last synced: 06 Jan 2025
https://github.com/microsoft/vert-papers
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
bertel can-ner cross-lingual-ner entity-disambiguation entity-extraction entity-linking entity-resolution grn language-understanding linkingpark ml named-entity-recognition ner nlp nlp-resources unitrans xl-ner
Last synced: 20 Jan 2025
https://github.com/scify/jedaitoolkit
An open source, high scalability toolkit in Java for Entity Resolution.
blocking entity-matching entity-resolution scalability
Last synced: 21 Jan 2025
https://github.com/amazon-science/refined
ReFinED is an efficient and accurate entity linking (EL) system.
entity-extraction entity-linking entity-resolution nlp pytorch
Last synced: 22 Jan 2025
https://github.com/maxharlow/csvmatch
Finds fuzzy matches between CSV files
csv data-matching entity-resolution fuzzy-matching record-linkage
Last synced: 25 Jan 2025
https://github.com/vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
approximate-nearest-neighbors data-matching deduplication deep-learning embeddings entity-matching entity-resolution python pytorch record-linkage representation-learning
Last synced: 23 Jan 2025
https://github.com/codeforkjeff/conciliator
OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.
entity-resolution openlibrary openrefine orcid reconciliation-service solr viaf
Last synced: 05 Nov 2024
https://github.com/Wikidata/soweego
Link Wikidata items to large catalogs
data-matching entity-linking entity-resolution identifiers knowledge-graph record-linkage wikidata wikimedia
Last synced: 05 Nov 2024
https://github.com/gaglia88/sparker
SparkER: an Entity Resolution framework for Apache Spark
apache apache-spark entity entity-resolution meta-blocking python python-library python27 python3 resolution scala spark
Last synced: 24 Jan 2025
https://github.com/wcmc-its/reciter
ReCiter: an enterprise open source author disambiguation system for academic institutions
algorithm aws awscodebuild awscodepipeline clustering dynamodb elasticbeanstalk entity-resolution java machine-learning-algorithms maven pubmed reciter s3 scopus spring-boot
Last synced: 21 Jan 2025
https://github.com/j535d165/recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
annotation-tool data-matching deduplication entity-resolution labeling-tool machine-learning record-linkage
Last synced: 22 Nov 2024
https://github.com/iesl/learned-string-alignments
Learning String Alignments for Entity Aliases
entity-resolution learned-string-edit-distance string-similarity
Last synced: 23 Dec 2024
https://github.com/entrepreneur-interet-general/merge-machine
Merge Dirty Data with Clean Reference Tables
elasticsearch entity-resolution entrepreneur-interet-general python record-linkage
Last synced: 31 Oct 2024
https://github.com/iesl/stance
Learned string similarity for entity names using optimal transport.
aliases entity-resolution optimal-transport record-linkage stance string-distance string-matching string-similarity
Last synced: 23 Dec 2024
https://github.com/ing-bank/spark-matcher
Record matching and entity resolution at scale in Spark
deduplication entity-resolution record-linkage spark
Last synced: 08 Nov 2024
https://github.com/dobraczka/kiez
Hubness reduced nearest neighbor search for entity alignment with knowledge graph embeddings
approximate-nearest-neighbor-search embedding entity-alignment entity-resolution hubness knowledge-graph knowledge-graph-embedding nearest-neighbors
Last synced: 08 Nov 2024
https://github.com/j535d165/febrl-fork-v0.4.2
Fork of the Freely Extensible Biomedical Record Linkage program
deduplication entity-resolution matching python-library record-linkage
Last synced: 22 Nov 2024
https://github.com/neo4j-graph-examples/entity-resolution
Entity resolution, also known as data matching or record linkage, is the task of finding records that refer to the same or similar real-world entity, within one data set or across several. Record linking is necessary when joining entities that are similar and may or may not share common identifiers. Neo4j offers various advantages for performing entity resolution and record linking. This repository covers one such use case: linking similar user accounts for analytics and better recommendations.
entity-resolution graph-datatabase neo4j neo4j-approved
Last synced: 12 Nov 2024
https://github.com/molybdenum-99/whatis
WhatIs.this: simple entity resolution through Wikipedia
Last synced: 20 Nov 2024
https://github.com/scify/jedai-ui
UI for JedAI Toolkit
data-integration entity-resolution javafx jedai toolkit user-interface
Last synced: 07 Nov 2024
https://github.com/snipsco/snips-nlu-parsers
Rust crate for entity parsing
entity-recognition entity-resolution nlp nlu rust
Last synced: 08 Nov 2024
https://github.com/tilotech/langchain-tilores
This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.
agentic-rag entity-resolution langchain
Last synced: 01 Nov 2024
https://github.com/nickcrews/mismo
The SQL/Ibis powered sklearn of record linkage
deduplication duckdb entity-resolution ibis python record-linkage sql
Last synced: 18 Nov 2024
https://github.com/maxharlow/textmatch
Finds fuzzy matches between datasets
data-matching entity-resolution fuzzy-matching record-linkage
Last synced: 23 Nov 2024
https://github.com/dobraczka/forayer
forayer is a library of first aid utilities for knowledge graph exploration with an entity centric approach.
data-integration entity-resolution knowledge-graph
Last synced: 08 Nov 2024
https://github.com/gaglia88/ruler
Scalable record-level matching rules
distributed-computing entity-matching entity-resolution similarity-join
Last synced: 06 Nov 2024
https://github.com/dobraczka/sylloge
Small library to simplify collecting and loading of entity alignment benchmark datasets
datasets entity-alignment entity-resolution knowledge-graph
Last synced: 08 Nov 2024
https://github.com/fgregg/smered
Mirror of https://bitbucket.org/resteorts/smered
deduplication entity-resolution record-linkage
Last synced: 15 Oct 2024
https://github.com/dobraczka/klinker
𧱠blocking methods for entity resolution
blocking data-integration deduplication entity-alignment entity-resolution link-discovery record-linkage
Last synced: 08 Nov 2024
https://github.com/tilotech/tilores-langchain
This repository provides the building blocks for integrating LangChain, LangGraph, and the Tilores entity resolution system.
agentic-rag entity-resolution identity-resolution langchain
Last synced: 25 Jan 2025
https://github.com/harpin-ai/toolkit-examples
Examples for trying out the harpin AI identity resolution and data quality toolkit
data-engineering data-quality dedupe deduplication entity-resolution identity identity-resolution spark
Last synced: 01 Nov 2024
https://github.com/tilotech/python-tilores-sdk
The tilores-sdk Python package is a small SDK to develop with the Tilores entity resolution system.
entity-resolution python tilores
Last synced: 01 Dec 2024
https://github.com/dobraczka/eche
Little helper for handling entity clusters
clustering connected-components deduplication entity-resolution record-linkage transitive-closure
Last synced: 08 Nov 2024
https://github.com/ilias-ant/entity-resolution-with-monetdb
A proof-of-concept entity resolution approach, with Tensorflow, inside a MonetDB.
entity-resolution monetdb product-matching sql tensorflow udfs
Last synced: 22 Jan 2025
https://github.com/rosette-api/ruby-script
Contains Ruby scripts for accessing Babel Street Analytics
api categorization entity-extraction entity-relationship entity-resolution lemmatization machine-learning natural-language-processing nlp relation-extraction ruby ruby-script sentiment-analysis text-analytics text-mining tokenization
Last synced: 11 Jan 2025
https://github.com/rosette-api/mock-data
Mock data that is used for unit testing of the Babel Street Analytics bindings
data entity-extraction entity-level-sentiment entity-linking entity-relationship entity-resolution language-detection machine-learning mock-data morphology natural-language-processing nlp relation-extraction sentiment-analysis test-framework testing text-mining text-processing tokenization
Last synced: 11 Jan 2025
https://github.com/gust4vosales/proxcluster-deduplicator
ProxCluster is a modularized framework for Incremental Entity Resolution that leverages concepts similar to the K-Means algorithm to cluster the duplicates found. This work was developed as the final paper for my Bachelor's degree in Computer Science.
clustering data-integration data-matching data-science database deduplication entity-resolution k-means pandas polars python
Last synced: 28 Nov 2024
https://github.com/imvladikon/deduplicator
Simple entity deduplication package
deduplication entity-resolution
Last synced: 02 Jan 2025
https://github.com/samir55/sigmod2021_teamaaa
ACM SIGMOD Programming Contest 2021. Team AAA submission (Top 10).
databases deduplication entity-resolution
Last synced: 31 Dec 2024
https://github.com/gaglia88/gsm_repro
Reproducibility experiments for Generalized Supervised Meta-blocking
blocking entity-matching entity-resolution meta-blocking reproducibility reproducible-paper reproducible-research
Last synced: 24 Dec 2024
https://github.com/lefteris-souflas/entity-resolution
Addresses Entity Resolution challenges. Tasks include schema-agnostic blocking, pairwise comparisons, Meta-Blocking graph construction, and Jaccard similarity computation. Deliverables include source code, reports, and reproducibility guidelines in Python.
edge-pruning entity-resolution graph jaccard-similarity meta-blocking token-blocking
Last synced: 12 Jan 2025
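Several entries above mention schema-agnostic token blocking and Jaccard similarity. The sketch below shows one simple reading of those two ideas using only the Python standard library. The sample records and all function names are hypothetical, and this is not the code of any repository listed here.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record: dict) -> set:
    """Schema-agnostic tokens: every lowercase word from every attribute value."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def token_blocking(records: dict) -> dict:
    """Group record ids by shared token, keeping blocks with at least two records."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        for tok in tokens(record):
            blocks[tok].add(rid)
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict) -> set:
    """Distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical records with differing attribute names (schemas).
records = {
    "r1": {"name": "Apple iPhone 13", "color": "blue"},
    "r2": {"title": "iphone 13 apple"},
    "r3": {"name": "Samsung Galaxy S21", "colour": "blue"},
}

blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
scores = {p: jaccard(tokens(records[p[0]]), tokens(records[p[1]])) for p in pairs}
for pair in sorted(scores):
    print(pair, round(scores[pair], 2))
```

Only pairs that share a token are compared, so `r2` and `r3` are never scored. Meta-blocking approaches go further by pruning weak candidate pairs from the resulting graph.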