{"id":13469179,"url":"https://github.com/J535D165/data-matching-software","last_synced_at":"2025-03-26T06:31:57.732Z","repository":{"id":37432150,"uuid":"115945560","full_name":"J535D165/data-matching-software","owner":"J535D165","description":"A list of free data matching and record linkage software. ","archived":false,"fork":false,"pushed_at":"2024-02-21T15:30:56.000Z","size":96,"stargazers_count":360,"open_issues_count":9,"forks_count":42,"subscribers_count":24,"default_branch":"master","last_synced_at":"2024-10-29T23:54:37.435Z","etag":null,"topics":["awesome","awesome-list","data-matching","deduplication","entity-resolution","fuzzy-matching","machine-learning","open-source","record-linkage","software"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/J535D165.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-01-01T20:27:24.000Z","updated_at":"2024-10-28T21:49:59.000Z","dependencies_parsed_at":"2022-08-08T20:15:53.280Z","dependency_job_id":"d8dcffb2-608a-4a05-9ab7-349e25cefb38","html_url":"https://github.com/J535D165/data-matching-software","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdata-matching-software","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdata-matching-software/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdata-matching-software/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/J535D165%2Fdata-matching-software/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/J535D165","download_url":"https://codeload.github.com/J535D165/data-matching-software/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245603977,"owners_count":20642917,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","awesome-list","data-matching","deduplication","entity-resolution","fuzzy-matching","machine-learning","open-source","record-linkage","software"],"created_at":"2024-07-31T15:01:28.563Z","updated_at":"2025-03-26T06:31:57.279Z","avatar_url":"https://github.com/J535D165.png","language":null,"funding_links":[],"categories":["Misc","Other Lists",":pushpin: Miscellaneous"],"sub_categories":["TeX Lists","Clustering /"],"readme":"# Data Matching software\n\n- [Overview](#overview)\n- [Software](#software)\n- [Outdated](#outdated-no-longer-available)\n- [Contributing](#contributing)\n\nThis is a list of (Fuzzy) Data Matching software. The software in this list is\nFOSS (Free and open-source software).\n\nThe term data matching is used to indicate the procedure of bringing together\ninformation from two or more records that are believed to belong to the same\nentity. 
Data matching has two applications: (1) to match data across multiple\ndatasets (linkage) and (2) to match data within a dataset (deduplication).\nSee the [Wikipedia page](https://en.wikipedia.org/wiki/Record_linkage) about\ndata matching for more information.\n\n*Similar terms:* record linkage, data matching, deduplication, fuzzy matching,\n    entity resolution\n\n## Overview\n\nThe table below gives a dense overview of data matching software properties.\nThe properties evaluated are [Application Programming Interface\n(API)](https://en.wikipedia.org/wiki/Application_programming_interface),\n[Graphical User Interface\n(GUI)](https://en.wikipedia.org/wiki/Graphical_user_interface), Linking,\nDeduplication, [Supervised\nLearning](https://en.wikipedia.org/wiki/Supervised_learning), [Unsupervised\nLearning](https://en.wikipedia.org/wiki/Unsupervised_learning) and [Active\nLearning](https://en.wikipedia.org/wiki/Active_learning_(machine_learning)).\n\n| Software                                                        | API    |        GUI         |        Link        |       Dedup        | Supervised \u003cbr/\u003e Learning | Unsupervised \u003cbr/\u003e Learning | Active \u003cbr/\u003e Learning |\n|:----------------------------------------------------------------|:-------|:------------------:|:------------------:|:------------------:|:-------------------------:|:---------------------------:|:---------------------:|\n| [AtyImo](#atyimo)\t\t                                  | PySpark|         :x:\t| :white_check_mark: | :white_check_mark: |            :x:            |             :x:               |          :x:          |\n| [Dedupe](#dedupe)                                               | Python |        :x:         | :white_check_mark: | :white_check_mark: |    :white_check_mark:     |             :x:             |  :white_check_mark:   |\n| [dirty-cat](#dirty-cat)                                               | Python|        :x:         | :white_check_mark: | 
:white_check_mark: |            :white_check_mark:            |      :white_check_mark:     |          :x:          |\n| [fastLink](#fastlink)                                           | R      |        :x:         | :white_check_mark: |  :grey_question:   |            :x:            |     :white_check_mark:      |          :x:          |\n| [FEBRL](#febrl)                                                 | Python | :white_check_mark: | :white_check_mark: | :white_check_mark: |            :x:            |             :x:             |          :x:          |\n| [FRIL](#fril)                                                   | Java   | :white_check_mark: | :white_check_mark: |        :x:         |      :grey_question:      |     :white_check_mark:      |          :x:          |\n| [FuzzyMatcher](#fuzzymatcher)                                   | Python |        :x:         | :white_check_mark: |        :x:         |            :x:            |     :white_check_mark:      |          :x:          |\n| [hlink](#hlink)                                                 | PySpark|        :x:         | :white_check_mark: |  :grey_question:   |            :x:            |            :x:              |          :x:          |\n| [JedAI](#jedai)                                                 | Java   | :white_check_mark: | :white_check_mark: |  :grey_question:   |    :white_check_mark:     |       :grey_question:       |    :grey_question:    |\n| [PRIL](#pril)                                                   | SQL    |        :x:         | :white_check_mark: |  :grey_question:   |      :grey_question:      |       :grey_question:       |    :grey_question:    |\n| [Python Record Linkage Toolkit](#python-record-linkage-toolkit) | Python |        :x:         | :white_check_mark: | :white_check_mark: |    :white_check_mark:     |     :white_check_mark:      |          :x:          |\n| [RecordLinkage (R)](#recordlinkage-r)                           | R      |        :x:         
| :white_check_mark: | :white_check_mark: |    :white_check_mark:     |     :white_check_mark:      |          :x:          |\n| [Reclin2](#reclin2)                           | R      |        :x:         | :white_check_mark: | :white_check_mark: |    :white_check_mark:     |     :x:      |          :x:          |\n| [RELAIS](#relais)                                               | :x:    | :white_check_mark: | :white_check_mark: |  :grey_question:   |      :grey_question:      |     :white_check_mark:      |          :x:          |\n| [ReMaDDer](#remadder)                                           | :x:    | :white_check_mark: | :white_check_mark: | :white_check_mark: |            :x:            |     :white_check_mark:      |          :x:          |\n| [RLTK](#rltk) | Python |        :x:         | :white_check_mark: | :white_check_mark: |    :white_check_mark:     |     :x:      |          :x:          |\n| [Splink](#splink)                                               | Python |        :x:         | :white_check_mark: | :white_check_mark: |    :white_check_mark:     |     :white_check_mark:      |          :x:          |\n| [Zingg](#zingg)                                               | Python|        :x:         | :white_check_mark: | :white_check_mark: |            :white_check_mark:            |      :x:     |          :x:          |\n\n:white_check_mark: Yes/Implemented\n:x: No/Not implemented\n:grey_question: Unknown\n\n## Software\n\nThis section describes **data matching** software. The software is\nalphabetically ordered.\n\n#### [AtyImo](https://github.com/pierrepita/atyimo)\nAtyImo implements a mixture of deterministic and probabilistic routines for data \nlinkage. Initially developed in 2013 to serve as a linkage tool supporting a joint \nBrazil–U.K. 
project aiming at building a large population-based cohort with data \nfrom more than 100 million participants and producing disease-specific data to facilitate \ndiverse epidemiological research studies. \n\n|  |  |\n|---|---| \n| License | ![GitHub](https://img.shields.io/github/license/pierrepita/atyimo) |\n| Language | `Python` `Spark` | \n| Latest release | NA |\n| Downloads per month |  |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/pierrepita/atyimo.svg?style=social\u0026label=Star)](https://github.com/pierrepita/atyimo) |\n\n#### [Dedupe](https://github.com/dedupeio/dedupe)\n\nDedupe is a Python library for fuzzy matching, deduplication and entity\nresolution on structured data. The library makes use of active learning to\nmatch record pairs. Active learning is useful in cases without training data.\nA companion tool,\n[csvdedupe](https://github.com/dedupeio/csvdedupe), deduplicates CSV files from the command line.\nDedupeio also offers commercial products for data matching.  \n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/dedupe) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dedupe) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/dedupe.svg)](https://pypi.python.org/pypi/dedupe/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/dedupe) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/dedupeio/dedupe.svg?style=social\u0026label=Star)](https://github.com/dedupeio/dedupe) |\n\n#### [dirty-cat](https://github.com/dirty-cat/dirty_cat)\n\n[dirty-cat](https://dirty-cat.github.io/) is an open-source Python package that facilitates machine learning on dirty data and is robust to morphological variants such as typos. 
Some of the currently supported features are: fuzzy joining of tables on dirty numerical, string or mixed-type columns, and deduplicating and encoding dirty categorical variables for ML. [This example](https://dirty-cat.github.io/stable/auto_examples/01_dirty_categories.html) illustrates why dirty-cat encoders are preferable to OneHotEncoder on dirty data and [this one](https://dirty-cat.github.io/stable/auto_examples/04_fuzzy_joining_and_FeatureAugmenter.html) shows how to join multiple dirty tables for ML.\nThe transformers ([TableVectorizer](https://dirty-cat.github.io/stable/generated/dirty_cat.TableVectorizer.html), [FeatureAugmenter](https://dirty-cat.github.io/stable/generated/dirty_cat.FeatureAugmenter.html)) are scikit-learn compatible and easily introduced into ML pipelines.\n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/dirty_cat) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dirty_cat) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/dirty_cat.svg)](https://pypi.python.org/pypi/dirty_cat/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/dirty_cat) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/dirty-cat/dirty_cat.svg?style=social\u0026label=Star)](https://github.com/dirty-cat/dirty_cat) |\n\n#### [fastLink](https://cran.r-project.org/web/packages/fastLink/index.html)\n\nImplements a Fellegi-Sunter probabilistic record linkage model that allows for\nmissing data and the inclusion of auxiliary information. This includes\nfunctionalities to conduct a merge of two datasets under the Fellegi-Sunter\nmodel using the Expectation-Maximization algorithm. fastLink is a programming\nAPI written in R. 
([Enamorado, Fifield \u0026 Imai,\n2017](http://imai.princeton.edu/research/files/linkage.pdf))  [[source\ncode]](https://github.com/kosukeimai/fastLink) \n\n|  |  |\n|---|---| \n| License | ![CRAN/METACRAN](https://img.shields.io/cran/l/fastLink) |\n| Language | `R`  | \n| Latest release | [![CRAN](https://img.shields.io/cran/v/fastLink.svg)](https://cran.r-project.org/web/packages/fastLink/index.html) |\n| Downloads per month | [![metacran downloads](https://cranlogs.r-pkg.org/badges/last-month/fastLink)](https://cran.r-project.org/package=fastLink) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/kosukeimai/fastLink.svg?style=social\u0026label=Star)](https://github.com/kosukeimai/fastLink) |\n\n#### [FEBRL](https://sourceforge.net/projects/febrl/)\n\nFebrl (Freely Extensible Biomedical Record Linkage) is a training tool\nsuitable for users to learn and experiment with record linkage techniques, as\nwell as for practitioners to conduct linkages with data sets containing up to\nseveral hundred thousand records. Febrl is a data matching tool with a large\nnumber of algorithms implemented and offers a Python programming interface as\nwell as a simple GUI. Febrl doesn't offer unsupervised or active learning\nalgorithms. The software is no longer actively maintained. ([Christen,\n2008](http://crpit.com/confpapers/CRPITV80Christen.pdf)) [[source\ncode]](https://sourceforge.net/projects/febrl/)\n\n|  |  |\n|---|---| \n| License | Custom |\n| Language | `Python` | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars |  |\n\n#### [FRIL](http://fril.sourceforge.net/)\n\nFRIL (Fine-grained Records Integration and Linkage tool) is a free tool that\nenables record linkage through a GUI. The tool implements automatic weight\nestimation through the EM algorithm and offers several techniques to make\nrecord pairs. FRIL was developed at Emory University and is no longer\nmaintained. 
[[source code]](http://fril.sourceforge.net/download.html)\n\n|  |  |\n|---|---| \n| License | Custom |\n| Language | `Java` | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars |  |\n\n#### [FuzzyMatcher](https://pypi.python.org/pypi/fuzzymatcher) \n\nA Python package that allows the user to fuzzy match two pandas dataframes\nbased on one or more fields in common. The functionality is limited at the\nmoment. [[source code]](https://github.com/RobinL/fuzzymatcher) \n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/fuzzymatcher) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/fuzzymatcher) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/fuzzymatcher.svg)](https://pypi.python.org/pypi/fuzzymatcher/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/fuzzymatcher) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/RobinL/fuzzymatcher.svg?style=social\u0026label=Star)](https://github.com/RobinL/fuzzymatcher) |\n\n\n#### [hlink](https://pypi.python.org/pypi/hlink) \n\nA Python package designed to link two datasets. The primary use case was for linking demographics in the Household -\u003e Person hierarchical structure, however it can be used to link generic datasets as well by skipping household linking tasks. It allows for probabilistic and deterministic record linkage. 
[[source code]](https://github.com/ipums/hlink)\n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/hlink) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/hlink) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/hlink.svg)](https://pypi.python.org/pypi/hlink/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/hlink) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/ipums/hlink?style=social\u0026label=Star)](https://github.com/ipums/hlink) |\n\n\n#### [JedAI](http://jedai.scify.org/) \n\nThe Java gEneric DAta Integration (JedAI) Toolkit is an entity resolution tool\ndeveloped by a group of universities. JedAI offers a graphical user interface.\n[[source code]](https://github.com/scify/JedAIToolkit) \n\n|  |  |\n|---|---| \n| License | ![GitHub](https://img.shields.io/github/license/scify/JedAIToolkit) |\n| Language | `Java` | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/scify/JedAIToolkit.svg?style=social\u0026label=Star)](https://github.com/scify/JedAIToolkit) |\n\n#### [PRIL](https://github.com/LSHTM-ALPHAnetwork/PIRL_RecordLinkageSoftware) \n\nPRIL (Point-of-contact Interactive Record Linkage) is a record linkage program\nwith a GUI. PRIL can be used to link datasets about individuals. 
([Rentsch CT,\nKabudula CW, Catlett J et al.,\n2017](https://gatesopenresearch.org/articles/1-8/v1)) [[source\ncode]](https://github.com/LSHTM-ALPHAnetwork/PIRL_RecordLinkageSoftware)\n\n|  |  |\n|---|---| \n| License | ![GitHub](https://img.shields.io/github/license/LSHTM-ALPHAnetwork/PIRL_RecordLinkageSoftware) |\n| Language | `SQLPL` | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/LSHTM-ALPHAnetwork/PIRL_RecordLinkageSoftware.svg?style=social\u0026label=Star)](https://github.com/LSHTM-ALPHAnetwork/PIRL_RecordLinkageSoftware) |\n\n#### [Python Record Linkage Toolkit](https://github.com/J535D165/recordlinkage) \n\nThe Python Record Linkage Toolkit is a library to link records in or between\ndata sources. The toolkit provides most of the tools needed for record linkage\nand deduplication. The package is developed for research and the linking of\nsmall or medium sized files. \n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/recordlinkage) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/recordlinkage) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/recordlinkage.svg)](https://pypi.python.org/pypi/recordlinkage/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/recordlinkage) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/J535D165/recordlinkage.svg?style=social\u0026label=Star)](https://github.com/J535D165/recordlinkage) |\n\n#### [RecordLinkage (R)](https://cran.r-project.org/web/packages/RecordLinkage/index.html) \n\nPackage written in R that provides functions for linking and de-duplicating\ndata sets. Both supervised and unsupervised classification algorithms are\navailable. Record pairs can be compared with a limited set of algorithms. The\npackage is published on CRAN. 
\n\n|  |  |\n|---|---| \n| License | ![CRAN/METACRAN](https://img.shields.io/cran/l/RecordLinkage) |\n| Language | `R` | \n| Latest release | [![CRAN](https://img.shields.io/cran/v/RecordLinkage.svg)](https://cran.r-project.org/web/packages/RecordLinkage/index.html) |\n| Downloads per month | [![metacran downloads](https://cranlogs.r-pkg.org/badges/last-month/RecordLinkage)](https://cran.r-project.org/package=RecordLinkage) |\n| GitHub stars |  |\n\n\n#### [Reclin2](https://github.com/djvanderlaan/reclin2)\n\nPackage written in R that provides functions for linking data sets. The framework offers\nthe option to compute the weights of the Fellegi-Sunter model. It doesn't implement an\nunsupervised algorithm to predict the cutoff. The\npackage is published on CRAN. Formerly https://github.com/djvanderlaan/reclin. \n\n|  |  |\n|---|---| \n| License | ![CRAN/METACRAN](https://img.shields.io/cran/l/reclin2) |\n| Language | `R` | \n| Latest release | [![CRAN](https://img.shields.io/cran/v/reclin2.svg)](https://cran.r-project.org/web/packages/reclin2/index.html) |\n| Downloads per month | [![metacran downloads](https://cranlogs.r-pkg.org/badges/last-month/reclin2)](https://cran.r-project.org/package=reclin2) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/djvanderlaan/reclin2.svg?style=social\u0026label=Star)](https://github.com/djvanderlaan/reclin2) |\n\n#### [RELAIS](https://www.istat.it/en/methods-and-tools/methods-and-it-tools/process/processing-tools/relais)\n\nRELAIS (REcord Linkage At IStat) is a toolkit providing a set of techniques\nfor dealing with record linkage projects. 
IStat is the main producer of\nofficial statistics in Italy.\n\n|  |  |\n|---|---| \n| License | `EUPL-1.1` |\n| Language | `R/Java` | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars |  |\n\n#### [ReMaDDer](http://remadder.findmysoft.com/)\n\nReMaDDer is free, unsupervised fuzzy data matching software with a GUI.\nReMaDDer can perform fully automatic fuzzy record matching without\nhuman expert intervention, while attaining the accuracy of human clerical review.\nNOTE: The software is free, but it is not open source and requires an internet\nconnection to work.\n\n|  |  |\n|---|---| \n| License |  |\n| Language |  | \n| Latest release |  |\n| Downloads per month |  |\n| GitHub stars |  |\n\n#### [RLTK](https://github.com/usc-isi-i2/rltk)\n\nThe Record Linkage ToolKit (RLTK) is a general-purpose open-source record\nlinkage package. The toolkit provides the full pipeline needed for record linkage\nand deduplication. \n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/rltk) |\n| Language | `Python` | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/rltk.svg)](https://pypi.python.org/pypi/rltk/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/rltk) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/usc-isi-i2/rltk.svg?style=social\u0026label=Star)](https://github.com/usc-isi-i2/rltk) |\n\n#### [Splink](https://github.com/moj-analytical-services/splink)\n\nSplink is a Python package for probabilistic record linkage at scale.\nIt supports multiple backends to execute linkage jobs, including DuckDB,\nApache Spark and AWS Athena. It can link and deduplicate very large datasets\nof tens of millions of records with runtimes of less than an hour, including \nthe clustering of results using connected components. 
It includes interactive tools\nto support the lifecycle of a linking project, from exploratory analysis through to\ndiagnostics and quality assurance. [[source\ncode]](https://github.com/moj-analytical-services/splink)\n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/splink) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/splink) | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/splink.svg)](https://pypi.python.org/pypi/splink/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/splink) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/moj-analytical-services/splink.svg?style=social\u0026label=Star)](https://github.com/moj-analytical-services/splink) |\n\n#### [Zingg](https://github.com/zinggAI/zingg)\n\n[Zingg](https://zingg.ai) is an open-source ML-based tool for entity resolution with which analytics engineers and data scientists can quickly integrate data silos and build unified views at scale. Zingg can connect to disparate data sources: local and cloud file systems in any format, enterprise applications, and relational, NoSQL and cloud databases and warehouses. 
It scales to large volumes of data, and you can define domain-specific functions to improve matching.\nZingg supports English as well as Chinese, Thai, Japanese, Hindi and other languages, and it has an active [Slack community](https://join.slack.com/t/zinggai/shared_invite/zt-w7zlcnol-vEuqU9m~Q56kLLUVxRgpOA) where people around the globe help each other and share their views.\n\n|  |  |\n|---|---| \n| License | ![PyPI - License](https://img.shields.io/pypi/l/zingg) |\n| Language | ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/zingg) `Spark` | \n| Latest release | [![PyPI](https://img.shields.io/pypi/v/zingg.svg)](https://pypi.python.org/pypi/zingg/) |\n| Downloads per month | ![PyPI - Downloads](https://img.shields.io/pypi/dm/zingg) |\n| GitHub stars | [![GitHub stars](https://img.shields.io/github/stars/zinggAI/zingg.svg?style=social\u0026label=Star)](https://github.com/zinggAI) |\n\n## Outdated/ no longer available\n\n#### BigMatch (by USA census)\n\nA record linkage tool, developed by the U.S. Census Bureau, for matching a very\nlarge file against a moderate-size file. Several papers about this program are\navailable [(BigMatch,\n2007)](https://www.census.gov/srd/papers/pdf/rrc2007-01.pdf).\n\n#### [The Link King](http://the-link-king.party/) \n\nThe Link King’s graphical user interface (GUI) makes record linkage and\nunduplication easy for beginning and advanced users. The software requires a\nSAS license. `SAS`\n\n## Contributing\n\nDo you know an open source and/or free data matching tool? Please open an\nissue or a pull request. The same holds for missing or incomplete\ninformation.\n\nThis project was initiated by the author of the [Python Record Linkage\nToolkit](https://github.com/J535D165/recordlinkage) @J535D165. The aim is to\ncompile a list and comparison of data matching software. \n\nThis list is licensed under [CC-BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0/). 
\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJ535D165%2Fdata-matching-software","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJ535D165%2Fdata-matching-software","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJ535D165%2Fdata-matching-software/lists"}