Projects in Awesome Lists tagged with data-matching
A curated list of projects in awesome lists tagged with data-matching.
https://github.com/moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science
Last synced: 13 May 2025
https://github.com/j535d165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 14 May 2025
https://github.com/J535D165/recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
data-matching dedupe deduplication entity-resolution machine-learning privacy python python-library record-linkage similarity string-distance utrecht-university
Last synced: 26 Mar 2025
https://github.com/robinl/fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
data-matching fuzzy-matching probabalistic-matching pypi
Last synced: 04 Apr 2025
https://github.com/RobinL/fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
data-matching fuzzy-matching probabalistic-matching pypi
Last synced: 02 Apr 2025
https://github.com/maxharlow/csvmatch
Finds fuzzy matches between CSV files
csv data-matching entity-resolution fuzzy-matching record-linkage
Last synced: 08 Apr 2025
https://github.com/vintasoftware/entity-embed
PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.
approximate-nearest-neighbors data-matching deduplication deep-learning embeddings entity-matching entity-resolution python pytorch record-linkage representation-learning
Last synced: 08 Oct 2025
https://github.com/Wikidata/soweego
Link Wikidata items to large catalogs
data-matching entity-linking entity-resolution identifiers knowledge-graph record-linkage wikidata wikimedia
Last synced: 04 Apr 2025
https://github.com/wikidata/soweego
Link Wikidata items to large catalogs
data-matching entity-linking entity-resolution identifiers knowledge-graph record-linkage wikidata wikimedia
Last synced: 07 Apr 2025
https://github.com/AI-team-UoA/pyJedAI
An open-source library that leverages Python's data science ecosystem to build powerful end-to-end Entity Resolution workflows.
data-disambigation data-matching deduplication duplicate-detection entity-matching entity-resolution fuzzy-matching link-discovery machine-learning python
Last synced: 01 Mar 2026
https://github.com/j535d165/recordlinkage-annotator
A browser user interface for manual labeling of record pairs.
annotation-tool data-matching deduplication entity-resolution labeling-tool machine-learning record-linkage
Last synced: 14 Jul 2025
https://github.com/lewinfox/levitate
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
data-matching fuzzy-matching r similarity-measures string-similarity thefuzz
Last synced: 18 Jan 2026
https://github.com/maxharlow/textmatch
Finds fuzzy matches between datasets
data-matching entity-resolution fuzzy-matching record-linkage
Last synced: 26 Jun 2025
https://github.com/princeton-ddss/lsh
DuckDB community extension for locality-sensitive hashing (LSH)
approximate-nearest-neighbor-search data-matching deduplication duckdb duckdb-community duckdb-extension entity-resolution fuzzy-matching locality-sensitive-hashing lsh record-linkage
Last synced: 19 Nov 2025
https://github.com/ihmeuw/person_linkage_case_study
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
census-bureau dask data-matching data-science entity-resolution fuzzy-matching record-linkage spark splink
Last synced: 04 Apr 2026
https://github.com/gust4vosales/proxcluster-deduplicator
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.
clustering data-integration data-matching data-science database deduplication entity-resolution k-means pandas polars python
Last synced: 09 Apr 2026
https://github.com/kefilweditse/awesome-matchem-datasets
Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.
awesome awesome-dataset awesome-dataset-collection awesome-match-data awesome-matchem data-analysis data-matching dataset dataset-collection dataset-research dataset-samples match match-data match-dataset-analysis match-examples
Last synced: 07 Apr 2025