{"id":13384505,"url":"https://github.com/yougov/Fuzzy","last_synced_at":"2025-03-13T10:31:19.606Z","repository":{"id":53685371,"uuid":"93347990","full_name":"yougov/fuzzy","owner":"yougov","description":null,"archived":false,"fork":false,"pushed_at":"2023-07-24T16:55:00.000Z","size":109,"stargazers_count":50,"open_issues_count":14,"forks_count":13,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-10T06:35:18.434Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yougov.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-06-04T23:34:38.000Z","updated_at":"2024-01-03T14:14:52.000Z","dependencies_parsed_at":"2024-01-18T02:38:13.461Z","dependency_job_id":"2626153a-6b71-4821-a73a-5efc0739770d","html_url":"https://github.com/yougov/fuzzy","commit_stats":{"total_commits":137,"total_committers":7,"mean_commits":"19.571428571428573","dds":0.3211678832116789,"last_synced_commit":"e15b195467223a684a26fadb53997bf6f36be2c4"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yougov%2Ffuzzy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yougov%2Ffuzzy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yougov%2Ffuzzy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yougov%2Ffuzzy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yougov","download_url":"https:/
/codeload.github.com/yougov/fuzzy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243386051,"owners_count":20282679,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T11:00:50.440Z","updated_at":"2025-03-13T10:31:19.240Z","avatar_url":"https://github.com/yougov.png","language":"C","readme":".. image:: https://img.shields.io/pypi/v/Fuzzy.svg\n   :target: https://pypi.org/project/Fuzzy\n\n.. image:: https://img.shields.io/pypi/pyversions/Fuzzy.svg\n\n.. image:: https://img.shields.io/travis/yougov/fuzzy/master.svg\n   :target: http://travis-ci.org/yougov/fuzzy\n\n\nFuzzy is a Python library that provides fast implementations of common phonetic algorithms.\nThey are typically used for string similarity tasks, but they are quite versatile.\n\nIt uses C extensions (via Cython) for speed.\n\nThe algorithms are:\n\n* `Soundex \u003chttp://en.wikipedia.org/wiki/Soundex\u003e`_\n* `NYSIIS \u003chttp://en.wikipedia.org/wiki/NYSIIS\u003e`_\n* `Double Metaphone \u003chttp://en.wikipedia.org/wiki/Metaphone\u003e`_, based on Maurice\n  Aubrey's C code from his Perl implementation.\n\nUsage\n=====\n\nThe functions are quite easy to use!\n\n\u003e\u003e\u003e import fuzzy\n\u003e\u003e\u003e soundex = fuzzy.Soundex(4)\n\u003e\u003e\u003e soundex('fuzzy')\n'F200'\n\u003e\u003e\u003e dmeta = fuzzy.DMetaphone()\n\u003e\u003e\u003e dmeta('fuzzy')\n['FS', None]\n\u003e\u003e\u003e fuzzy.nysiis('fuzzy')\n'FASY'\n\nPerformance\n===========\n\nFuzzy's Double Metaphone was ~10 times faster than the pure Python\nimplementation by 
`Andrew Collins \u003chttp://www.atomodo.com/code/double-metaphone\u003e`_\nin some recent `testing \u003chttp://chmullig.com/2011/03/pypy-testing/\u003e`_.\nSoundex and NYSIIS should show similar speedups. Timings using IPython's ``timeit``::\n\n  In [3]: timeit soundex('fuzzy')\n  1000000 loops, best of 3: 326 ns per loop\n\n  In [4]: timeit dmeta('fuzzy')\n  100000 loops, best of 3: 2.18 us per loop\n\n  In [5]: timeit fuzzy.nysiis('fuzzy')\n  100000 loops, best of 3: 13.7 us per loop\n\n\nDistance Metrics\n================\n\nWe recommend the `Python-Levenshtein \u003chttp://code.google.com/p/pylevenshtein/\u003e`_\nmodule for fast, C-based string distance/similarity metrics. Among other\nfunctions, it includes:\n\n * `Levenshtein \u003chttp://en.wikipedia.org/wiki/Levenshtein_distance\u003e`_ edit distance\n * `Jaro \u003chttp://en.wikipedia.org/wiki/Jaro_distance\u003e`_ distance\n * `Jaro-Winkler \u003chttp://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance\u003e`_ distance\n * `Hamming distance \u003chttp://en.wikipedia.org/wiki/Hamming_distance\u003e`_\n\nIn testing, it has been several times faster than comparable pure Python\nimplementations of those algorithms.\n","funding_links":[],"categories":["Feature Extraction"],"sub_categories":["Text/NLP"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyougov%2FFuzzy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyougov%2FFuzzy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyougov%2FFuzzy/lists"}