Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
data-matching-software
A list of free data matching and record linkage software.
https://github.com/J535D165/data-matching-software
Software
- dirty-cat - An open-source Python package that facilitates machine learning with dirty data: it is robust to morphological variants such as typos. Currently supported features include fuzzy joining of tables on dirty numerical, string, or mixed-type columns, and deduplicating and encoding dirty categorical variables for ML. [This example](https://dirty-cat.github.io/stable/auto_examples/01_dirty_categories.html) illustrates why to use dirty-cat encoders rather than OneHotEncoder on dirty data, and [this one](https://dirty-cat.github.io/stable/auto_examples/04_fuzzy_joining_and_FeatureAugmenter.html) shows how to join multiple dirty tables for ML. TableVectorizer and [FeatureAugmenter](https://dirty-cat.github.io/stable/generated/dirty_cat.FeatureAugmenter.html) are scikit-learn compatible and easily introduced into ML pipelines.
- fastLink
- FEBRL
- FRIL
- FuzzyMatcher
- hlink
- JedAI
- PIRL_RecordLinkageSoftware
- RecordLinkage (R)
- reclin2
- RELAIS
- ReMaDDer
- rltk
- splink
- Zingg - An open-source ML-based tool for entity resolution with which analytics engineers and data scientists can quickly integrate data silos and build unified views at scale. Zingg can connect to disparate data sources, local and cloud file systems in any format, enterprise applications, and relational, NoSQL, and cloud databases and warehouses. It scales to large volumes of data, and you can define domain-specific functions to improve matching.
- AtyImo
- Dedupe
- csvdedupe
- dirty-cat
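The fuzzy joining mentioned in the dirty-cat entry can be illustrated with a minimal, standard-library-only sketch. This is not dirty-cat's API or its matching method: the `similarity` and `fuzzy_join` helpers, the sample rows, and the 0.8 threshold are all hypothetical, and the similarity score comes from `difflib.SequenceMatcher` purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, key, threshold=0.8):
    """Pair each left row with its most similar right row on `key`.

    Hypothetical helper: rows whose best score is below `threshold`
    are paired with None instead of a right-hand row.
    """
    joined = []
    for lrow in left:
        best_row, best_score = None, 0.0
        for rrow in right:
            score = similarity(lrow[key], rrow[key])
            if score > best_score:
                best_row, best_score = rrow, score
        if best_score < threshold:
            best_row = None
        joined.append((lrow, best_row, round(best_score, 2)))
    return joined


# Made-up sample data with a typo and a casing/whitespace variant.
customers = [{"name": "Jon Smith"}, {"name": "Maria Garcia"}, {"name": "Wei Chen"}]
accounts = [{"name": "John Smith", "id": 1}, {"name": "maria garcia ", "id": 2}]

matches = fuzzy_join(customers, accounts, key="name")
for left_row, right_row, score in matches:
    print(left_row["name"], "->", right_row["name"] if right_row else None, score)
```

The nested loop compares every pair of rows, which is fine for a toy example but quadratic in table size; the packages in this list generally rely on blocking, indexing, or vectorized similarity to avoid that cost.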
Outdated/ no longer available