{"id":17011253,"url":"https://github.com/poke1024/simtrie","last_synced_at":"2025-10-26T10:11:54.842Z","repository":{"id":93584789,"uuid":"184074920","full_name":"poke1024/simtrie","owner":"poke1024","description":"An efficient data structure for fast string similarity searches","archived":false,"fork":false,"pushed_at":"2021-02-08T14:14:19.000Z","size":38,"stargazers_count":22,"open_issues_count":2,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-12T08:41:17.486Z","etag":null,"topics":["damerau-levenshtein-distance","edit-distance","fuzzy-matching","levenshtein-distance","prefix-tree","python","spell-check","spelling-correction","trie"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/poke1024.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-04-29T13:19:07.000Z","updated_at":"2024-11-28T16:34:20.000Z","dependencies_parsed_at":"2023-08-26T19:48:50.144Z","dependency_job_id":null,"html_url":"https://github.com/poke1024/simtrie","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/poke1024/simtrie","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/poke1024%2Fsimtrie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/poke1024%2Fsimtrie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/poke1024%2Fsimtrie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/poke1024%2Fsimtrie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/poke1024","download_url":"https://codeload.github.com/poke1024/simtrie/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/poke1024%2Fsimtrie/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281087727,"owners_count":26441636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-26T02:00:06.575Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["damerau-levenshtein-distance","edit-distance","fuzzy-matching","levenshtein-distance","prefix-tree","python","spell-check","spelling-correction","trie"],"created_at":"2024-10-14T06:06:35.768Z","updated_at":"2025-10-26T10:11:54.836Z","avatar_url":"https://github.com/poke1024.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# simtrie\n\n`simtrie` is a library for fast, highly\nconfigurable approximate string similarity\nsearches.\n\nHere's a simple example:\n\n```\nimport timeit\n\nimport simtrie\nfrom nltk.corpus import wordnet as wn\nlemmas = list(set(i for i in wn.words()))\n\nprint('creating set of %d words' % len(lemmas))\n# \u003e\u003e creating set of 147306 words\ns = simtrie.Set(lemmas)\n\ndef search():\n\treturn list(s.similar(\"bookish\", 2))\n\nprint(timeit.timeit(stmt=search, 
number=1))\n# \u003e\u003e 0.00041486499958409695\n\nprint(search())\n# \u003e\u003e [('blockish', 2.0), ('bookie', 2.0), ('booking', 2.0), ('bookish', 0.0), ('boorish', 1.0), ('boxfish', 2.0), ('boyish', 2.0), ('foolish', 2.0), ('goodish', 2.0), ('monkish', 2.0), ('moorish', 2.0)]\n\n```\n\n`simtrie` allows you to fine-tune searches using custom\nweighted metrics:\n\n```\nmetric = simtrie.Metric(\n    (('c', None), 1.9),  # deletion cost\n    (('ab', 'ba'), 1.5)  # transpose cost\n)\n\ns.similar(\"bookish\", 2, metric, allow_transpose=True)\n```\n\nSome of simtrie's features:\n\n* Stores string sets and dicts in RAM using a prefix tree\n* Fast, configurable similarity searches over large sets\n* Pythonic API similar to regular `set` and `dict`\n* Supports transpose, split and union weights\n\nNote: binary data files are not portable between machine\narchitectures (they are either little- or big-endian).\n\n# Credits\n\n`simtrie` is a fork of https://github.com/pytries/DAWG. Its internal\ndata structure is a very clever C++ implementation of a DAFSA by Susumu Yata.\n\nVarious test cases and ideas were taken from the super clean\nimplementation at https://github.com/infoscout/weighted-levenshtein/.\n\n# Similar Projects\n\n* https://github.com/wolfgarbe/SymSpell\n\n# License\n\nThe Python code is licensed under the MIT License.\n\nThe bundled `dawgdic` C++ library and the C++ extensions\nfor simtrie are licensed under the BSD license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpoke1024%2Fsimtrie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpoke1024%2Fsimtrie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpoke1024%2Fsimtrie/lists"}