## Packages needed

Python 3. If you need a Python 2 version, go back to [this commit](https://github.com/RobinL/fuzzy_data_matcher/tree/b4bb9115ce3cbe08036c5efc294799a128174f51).

The Anaconda distribution of Python (which includes numpy and pandas), plus:

    pip install metaphone
    pip install python-levenshtein

Your SQLite library (`sqlite.dll` on Windows) must have FTS4 enabled. If it does not, you may need to replace the `sqlite.dll` in your Anaconda installation directory with one downloaded from [sqlite.org](http://www.sqlite.org/download.html).

See also [this Stack Overflow question](http://stackoverflow.com/questions/3823659/how-to-setup-fts3-fts4-with-python2-7-on-windows).

## Basic usage instructions

This repo provides code that lets you fuzzy match two datasets: for each record in the candidate dataset, it probabilistically identifies the row in the target dataset that is most likely to be its match. The matching tolerates missing information, misspellings and similar errors.

See [this example](https://github.com/RobinL/fuzzy_data_matcher/blob/master/Simple%20demo.ipynb) for basic usage instructions. You should be able to adapt this example to your own purposes.

You start with the paths to the CSV files containing the two datasets you want to match.

By default the algorithm attempts a fuzzy match on all fields of these datasets. You can specify `candidate_drop_columns` and `target_drop_columns` if you want to change this behaviour.

You can also optionally specify `candidate_dmetaphone_cols` and `target_dmetaphone_cols` for columns containing text that may be misspelled. This may improve match rates. It works on columns such as first name, surname and address, but not on columns such as an ID or other code.
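Phonetic algorithms such as double metaphone map words that sound alike to the same key, so a misspelled surname can still agree with the correctly spelled one. The sketch below uses the simpler American Soundex algorithm rather than double metaphone, purely so that it runs on the standard library; `soundex` is a hypothetical helper for illustration, not part of this repo or of the `metaphone` package. The last pair in the usage example shows why phonetic encoding suits names and addresses but not IDs: digits are ignored, so different codes can collapse to the same key.

```python
def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    digit_for = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            digit_for[ch] = digit

    # Only letters are encoded; digits and punctuation are dropped entirely.
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    key = [letters[0].upper()]
    previous = digit_for.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a digit
        digit = digit_for.get(ch, "")  # vowels and y have no digit...
        if digit and digit != previous:
            key.append(digit)
        previous = digit  # ...and reset `previous`, so a repeat after a vowel counts
    return ("".join(key) + "000")[:4]  # pad with zeros / truncate to 4 characters


if __name__ == "__main__":
    for a, b in [("Smith", "Smyth"), ("Robert", "Rupert"), ("AB123", "AB999")]:
        # The first two pairs share a key because they sound alike;
        # the ID codes share a key only because their digits are ignored.
        print(a, b, soundex(a), soundex(b))
```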
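The README does not spell out how matches are scored, so the following is a minimal sketch of the general idea, not this repo's algorithm: for each candidate record, score every target row by the average normalised Levenshtein similarity over the shared columns, skipping fields that are blank on either side (so missing information is not counted as a mismatch), and keep the best-scoring row. `python-levenshtein` provides fast C implementations such as `Levenshtein.distance`; a pure-Python distance is used here so the example runs without it, and the normalisation `1 - distance / longer length` is an assumption. `best_match` and its `drop_columns` argument are hypothetical, loosely mirroring `candidate_drop_columns`.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions and substitutions
    turning a into b (dynamic programming keeping two rows)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; rows track the shorter one
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    a, b = a.strip().lower(), b.strip().lower()
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def record_score(candidate: Record, target: Record, columns: List[str]) -> float:
    """Average similarity over columns where both records have a value."""
    scores = [similarity(candidate[c], target[c])
              for c in columns
              if candidate.get(c, "").strip() and target.get(c, "").strip()]
    return sum(scores) / len(scores) if scores else 0.0


def best_match(candidate: Record, targets: List[Record],
               drop_columns: Iterable[str] = ()) -> Tuple[Optional[int], float]:
    """Brute-force scan: return (index of best-scoring target, its score)."""
    dropped = set(drop_columns)
    columns = [c for c in candidate if c not in dropped]
    best_index: Optional[int] = None
    best_score = 0.0
    for i, target in enumerate(targets):
        score = record_score(candidate, target, columns)
        if best_index is None or score > best_score:  # ties keep the first row
            best_index, best_score = i, score
    return best_index, best_score


if __name__ == "__main__":
    targets = [
        {"id": "T1", "first_name": "Jonathan", "surname": "Smith", "city": "Leeds"},
        {"id": "T2", "first_name": "Joanna", "surname": "Smythe", "city": "York"},
    ]
    # Misspelled first name and a missing city; the ID column is dropped.
    candidate = {"id": "C9", "first_name": "Jonathon", "surname": "Smith", "city": ""}
    print(best_match(candidate, targets, drop_columns=["id"]))  # (0, 0.9375)
```

Scanning every target row for every candidate is quadratic overall, so for larger datasets some form of indexing or blocking is needed to shortlist plausible targets before scoring.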
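To check whether the SQLite library used by your Python installation has FTS4 enabled before running the matcher, you can try to create an FTS4 virtual table in an in-memory database. SQLite raises `sqlite3.OperationalError` (typically "no such module: fts4") when the module is not compiled in. `sqlite_has_fts4` is a hypothetical helper, not part of this repo.

```python
import sqlite3


def sqlite_has_fts4() -> bool:
    """Return True if the SQLite library linked into this Python supports FTS4."""
    conn = sqlite3.connect(":memory:")
    try:
        # Fails with OperationalError when the fts4 module is unavailable.
        conn.execute("CREATE VIRTUAL TABLE fts_check USING fts4(body)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite version:", sqlite3.sqlite_version)
    print("FTS4 available:", sqlite_has_fts4())
```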