{"id":34250188,"url":"https://github.com/yutanagano/symscan","last_synced_at":"2025-12-16T09:14:25.747Z","repository":{"id":266570731,"uuid":"882763863","full_name":"yutanagano/symscan","owner":"yutanagano","description":"Fast discovery of similar strings in bulk","archived":false,"fork":false,"pushed_at":"2025-12-12T23:29:49.000Z","size":10963,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-12-13T00:08:34.735Z","etag":null,"topics":["edit-distance","levenshtein-distance","string-matching","string-search","string-similarity"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yutanagano.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-03T17:42:28.000Z","updated_at":"2025-12-12T23:28:26.000Z","dependencies_parsed_at":"2025-12-13T02:03:11.019Z","dependency_job_id":null,"html_url":"https://github.com/yutanagano/symscan","commit_stats":null,"previous_names":["yutanagano/nearust","yutanagano/symscan"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/yutanagano/symscan","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yutanagano%2Fsymscan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yutanagano%2Fsymscan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yutanagano%2Fsymscan/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yutanagano%2Fsymscan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yutanagano","download_url":"https://codeload.github.com/yutanagano/symscan/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yutanagano%2Fsymscan/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27761772,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-16T02:00:10.477Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["edit-distance","levenshtein-distance","string-matching","string-search","string-similarity"],"created_at":"2025-12-16T09:14:25.093Z","updated_at":"2025-12-16T09:14:25.733Z","avatar_url":"https://github.com/yutanagano.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SymScan\n\n### Check out the [documentation page](https://symscan.readthedocs.io).\n\n**SymScan** enables extremely fast discovery of pairs of similar strings within\nand across large collections.\n\nSymScan is a variation on the [symmetric deletion\n](https://seekstorm.com/blog/1000x-spelling-correction/) algorithm that is\noptimised for bulk-searching similar strings within one or across two large\nstring collections at once (e.g. 
searching for similar protein sequences among\na collection of 10 million). The key algorithmic difference between SymScan and\ntraditional symmetric deletion is the use of a [sort-merge\njoin](https://en.wikipedia.org/wiki/Sort-merge_join) approach in place of hash\nmaps to discover input strings that share common deletion variants. This\nsort-and-scan approach adds a factor of O(log N) (where N is the total number\nof strings being compared) to the expected time complexity in exchange for\nbetter cache locality and effective parallelization, and ends up being much\nfaster for the above use case.\n\n## Installing\n\n### CLI\n\n```sh\nbrew install yutanagano/tap/symscan-cli\n```\n\n### Rust library\n\n```sh\ncargo add symscan\n```\n\n### Python package\n\n```sh\npip install symscan\n```\n\n## Licensing\n\nSymScan is dual-licensed under the MIT and Apache 2.0 licenses. Unless\nexplicitly stated otherwise, any contribution submitted by you, as defined in\nthe Apache license, shall be dual-licensed as above, without any additional\nterms and conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyutanagano%2Fsymscan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyutanagano%2Fsymscan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyutanagano%2Fsymscan/lists"}