An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-matching

A curated list of projects in awesome lists tagged with data-matching.

https://github.com/moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-matching data-science deduplicate-data deduplication duckdb em-algorithm entity-resolution fuzzy-matching record-linkage spark uk-gov-data-science

Last synced: 13 May 2025

https://github.com/robinl/fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

data-matching fuzzy-matching probabalistic-matching pypi

Last synced: 04 Apr 2025

https://github.com/RobinL/fuzzymatcher

Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4

data-matching fuzzy-matching probabalistic-matching pypi

Last synced: 02 Apr 2025

https://github.com/maxharlow/csvmatch

🔎 Finds fuzzy matches between CSV files

csv data-matching entity-resolution fuzzy-matching record-linkage

Last synced: 08 Apr 2025

https://github.com/vintasoftware/entity-embed

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

approximate-nearest-neighbors data-matching deduplication deep-learning embeddings entity-matching entity-resolution python pytorch record-linkage representation-learning

Last synced: 08 Oct 2025

https://github.com/AI-team-UoA/pyJedAI

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

data-disambigation data-matching deduplication duplicate-detection entity-matching entity-resolution fuzzy-matching link-discovery machine-learning python

Last synced: 01 Mar 2026

https://github.com/lewinfox/levitate

Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).

data-matching fuzzy-matching r similarity-measures string-similarity thefuzz

Last synced: 18 Jan 2026

https://github.com/maxharlow/textmatch

🔎 Finds fuzzy matches between datasets

data-matching entity-resolution fuzzy-matching record-linkage

Last synced: 26 Jun 2025

https://github.com/ihmeuw/person_linkage_case_study

Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).

census-bureau dask data-matching data-science entity-resolution fuzzy-matching record-linkage spark splink

Last synced: 04 Apr 2026

https://github.com/gust4vosales/proxcluster-deduplicator

ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my Bachelor's degree in Computer Science.

clustering data-integration data-matching data-science database deduplication entity-resolution k-means pandas polars python

Last synced: 09 Apr 2026

https://github.com/kefilweditse/awesome-matchem-datasets

Awesome-matchem-datasets is a curated collection of high-quality datasets for machine learning and data analysis in the field of chemistry. This repository includes various datasets, ranging from molecular structures to experimental results, suitable for both research and educational purposes.

awesome awesome-dataset awesome-dataset-collection awesome-match-data awesome-matchem data-analysis data-matching dataset dataset-collection dataset-research dataset-samples match match-data match-dataset-analysis match-examples

Last synced: 07 Apr 2025