{"id":20653485,"url":"https://github.com/t-ski/string-similarity-algorithms","last_synced_at":"2026-04-21T19:34:50.528Z","repository":{"id":241361831,"uuid":"805141700","full_name":"t-ski/string-similarity-algorithms","owner":"t-ski","description":"Common string similarity algorithm implementations.","archived":false,"fork":false,"pushed_at":"2024-06-03T22:54:03.000Z","size":7,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-09T04:21:47.226Z","etag":null,"topics":["nlp","python","string-distance","string-similarity"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/t-ski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-24T00:59:59.000Z","updated_at":"2024-11-24T19:16:24.000Z","dependencies_parsed_at":"2024-06-04T01:06:58.759Z","dependency_job_id":"178df25d-66b7-4cd6-bee2-6019eee54d6e","html_url":"https://github.com/t-ski/string-similarity-algorithms","commit_stats":null,"previous_names":["t-ski/string-similarity-algorithms"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/t-ski/string-similarity-algorithms","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t-ski%2Fstring-similarity-algorithms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t-ski%2Fstring-similarity-algorithms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t-ski%2Fstring-similarity-algorithms/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t-ski%2Fstring-similarity-algorithms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/t-ski","download_url":"https://codeload.github.com/t-ski/string-similarity-algorithms/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/t-ski%2Fstring-similarity-algorithms/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32106746,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T11:25:29.218Z","status":"ssl_error","status_checked_at":"2026-04-21T11:25:28.499Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","python","string-distance","string-similarity"],"created_at":"2024-11-16T17:44:30.492Z","updated_at":"2026-04-21T19:34:50.509Z","avatar_url":"https://github.com/t-ski.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# String Similarity Algorithms\n\nCommon string similarity algorithms `sim: (Σ* × Σ*) → [0, 1]`.\n\n\u003e Open value range algorithms (like Hamming) are normalized.\n\n## Hamming \n\n**Complexity:** `O(n)`\n\n``` python\ndef hamming_distance(str1: str, str2: str) -\u003e int\n```\n\n``` python\ndef hamming(str1: str, str2: str) -\u003e float\n```\n\n\u003e The shorter 
string is padded with blank symbols to apply the algorithm.\n\n## Levenshtein\n\n**Complexity:** `O(n²)`\n\n``` python\ndef levenshtein_distance(str1: str, str2: str) -\u003e int\n```\n\n``` python\ndef levenshtein(str1: str, str2: str) -\u003e float\n```\n\n## Damerau-Levenshtein\n\n**Complexity:** `O(n²)`\n\n``` python\ndef damerau_levenshtein_distance(str1: str, str2: str) -\u003e int\n```\n\n``` python\ndef damerau_levenshtein(str1: str, str2: str) -\u003e float\n```\n\n## Jaro\n\n**Complexity:** `O(n²)`\n\n``` python\ndef jaro(str1: str, str2: str) -\u003e float\n```\n\n## Jaro-Winkler\n\n**Complexity:** `O(n²)`\n\n``` python\ndef jaro_winkler(str1: str, str2: str, p: float = 0.1) -\u003e float\n```\n\n## Jaccard\n\n**Complexity:** `O(n)`\n\n``` python\ndef jaccard(str1: str, str2: str) -\u003e float\n```\n\n\u003e The set-based similarity algorithms treat each character together with its index as a set element, to mimic set element identity (`{ (character, index) ∀ c ∈ S₁, S₂ }`).\n\n## Sørensen-Dice\n\n**Complexity:** `O(n)`\n\n``` python\ndef sorensen_dice(str1: str, str2: str) -\u003e float\n```\n\n## Szymkiewicz-Simpson\n\n**Complexity:** `O(n)`\n\n``` python\ndef szymkiewicz_simpson(str1: str, str2: str) -\u003e float\n```\n\n\u003e Szymkiewicz-Simpson is also known simply as “overlap”.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ft-ski%2Fstring-similarity-algorithms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ft-ski%2Fstring-similarity-algorithms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ft-ski%2Fstring-similarity-algorithms/lists"}