{"id":50256635,"url":"https://github.com/G-Research/ahocorasick_rs","last_synced_at":"2026-06-12T22:00:50.605Z","repository":{"id":37089790,"uuid":"358014052","full_name":"G-Research/ahocorasick_rs","owner":"G-Research","description":"Check for multiple patterns in a single string at the same time: a fast Aho-Corasick algorithm for Python","archived":false,"fork":false,"pushed_at":"2026-05-27T12:29:42.000Z","size":341,"stargazers_count":228,"open_issues_count":9,"forks_count":18,"subscribers_count":16,"default_branch":"main","last_synced_at":"2026-06-11T18:25:49.536Z","etag":null,"topics":["aho-corasick","pattern-matching","python","rust"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/G-Research.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-04-14T19:06:11.000Z","updated_at":"2026-06-10T03:26:52.000Z","dependencies_parsed_at":"2025-12-18T01:13:37.191Z","dependency_job_id":null,"html_url":"https://github.com/G-Research/ahocorasick_rs","commit_stats":{"total_commits":147,"total_committers":3,"mean_commits":49.0,"dds":0.3945578231292517,"last_synced_commit":"839a84f828b0caa24f2b19f7ee202d21cf501ff6"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/G-Research/ahocorasick_rs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2Fahocorasick_rs
","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2Fahocorasick_rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2Fahocorasick_rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2Fahocorasick_rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/G-Research","download_url":"https://codeload.github.com/G-Research/ahocorasick_rs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/G-Research%2Fahocorasick_rs/sbom","scorecard":{"id":53702,"data":{"date":"2025-08-11","repo":{"name":"github.com/G-Research/ahocorasick_rs","commit":"fdf9f5d87ecade281a960553d11599665888399d"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.6,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/6 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":5,"reason":"4 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 5","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub 
workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/main.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/main.yml:87: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/main.yml:98: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/main.yml:32: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/main.yml:36: update your workflow using 
https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/main.yml:39: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/main.yml:56: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/main.yml:68: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/main.yml:74: update your workflow using https://app.stepsecurity.io/secureworkflow/G-Research/ahocorasick_rs/main.yml/main?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/main.yml:42","Warn: pipCommand not pinned by hash: .github/workflows/main.yml:43","Warn: pipCommand not pinned by hash: .github/workflows/main.yml:65","Info:   0 out of   5 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 third-party GitHubAction dependencies pinned","Info:   0 out of   3 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized 
license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Packaging","score":10,"reason":"packaging workflow detected","details":["Info: Project packages its releases by way of GitHub Actions.: .github/workflows/main.yml:79"],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Vulnerabilities","score":9,"reason":"1 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: 
PYSEC-2024-48 / GHSA-fj7x-q9j7-g6q6"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 30 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-15T00:13:40.034Z","repository_id":37089790,"created_at":"2025-08-15T00:13:40.035Z","updated_at":"2025-08-15T00:13:40.035Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34263874,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aho-corasick","pattern-matching","python","rust"],"created_at":"2026-05-27T06:31:27.811Z","updated_at":"2026-06-12T22:00:50.597Z","avatar_url":"https://github.com/G-Research.png","language":"Python","funding_links":[],"categories":["Search \u0026 Indexing"],"sub_categories":[],"readme":"# ahocorasick_rs: Quickly search for multiple substrings at once\n\n`ahocorasick_rs` allows you to search for 
multiple substrings (\"patterns\") in a given string (\"haystack\") using variations of the [Aho-Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm).\n\nIn particular, it's implemented as a wrapper of the Rust [`aho-corasick`](https://docs.rs/aho-corasick/) library, and provides a faster alternative to the [`pyahocorasick`](https://pyahocorasick.readthedocs.io/) library.\n\nFound any problems or have any questions? [File an issue on the GitHub project](https://github.com/G-Research/ahocorasick_rs).\n\n* [Quickstart](#quickstart)\n* [Choosing the matching algorithm](#matching)\n* [Additional configuration: speed and memory usage tradeoffs](#configuration2)\n* [Implementation details](#implementation)\n* [Benchmarks](#benchmarks)\n\n## Quickstart \u003ca name=\"quickstart\"\u003e\u003c/a\u003e\n\nThe `ahocorasick_rs` library allows you to search for multiple strings (\"patterns\") within a haystack, or alternatively to search for multiple `bytes` patterns.\nTo get started, install the library:\n\n```shell-session\n$ pip install ahocorasick-rs\n```\n\n### Searching strings\n\nWe can construct an `AhoCorasick` object:\n\n```python\n\u003e\u003e\u003e import ahocorasick_rs\n\u003e\u003e\u003e patterns = [\"hello\", \"world\", \"fish\"]\n\u003e\u003e\u003e haystack = \"this is my first hello world. hello!\"\n\u003e\u003e\u003e ac = ahocorasick_rs.AhoCorasick(patterns)\n```\n\nYou can construct an `AhoCorasick` object from any iterable (including generators), not just lists:\n\n```python\n\u003e\u003e\u003e ac = ahocorasick_rs.AhoCorasick((p.lower() for p in patterns))\n```\n\n`AhoCorasick.find_matches_as_indexes()` returns a list of tuples, each tuple being:\n\n1. The index of the found pattern inside the list of patterns.\n2. The start index of the pattern inside the haystack.\n3. 
The end index of the pattern inside the haystack.\n\n```python\n\u003e\u003e\u003e ac.find_matches_as_indexes(haystack)\n[(0, 17, 22), (1, 23, 28), (0, 30, 35)]\n\u003e\u003e\u003e patterns[0], patterns[1], patterns[0]\n('hello', 'world', 'hello')\n\u003e\u003e\u003e haystack[17:22], haystack[23:28], haystack[30:35]\n('hello', 'world', 'hello')\n```\n\n`find_matches_as_strings()` returns a list of found patterns:\n\n```python\n\u003e\u003e\u003e ac.find_matches_as_strings(haystack)\n['hello', 'world', 'hello']\n```\n\n### Searching `bytes` and other similar objects\n\nYou can also search `bytes`, `bytearray`, `memoryview`, and other objects supporting the Python buffer API.\n\n\u003e **IMPORTANT:** If you are searching a mutable buffer, you **must not mutate it in another thread** while `find_matches_as_indexes()` is running.\n\u003e Similarly, the patterns must not be mutated while the `BytesAhoCorasick` object is being constructed.\n\n```python\n\u003e\u003e\u003e patterns = [b\"hello\", b\"world\"]\n\u003e\u003e\u003e ac = ahocorasick_rs.BytesAhoCorasick(patterns)\n\u003e\u003e\u003e haystack = b\"hello world\"\n\u003e\u003e\u003e ac.find_matches_as_indexes(haystack)\n[(0, 0, 5), (1, 6, 11)]\n\u003e\u003e\u003e patterns[0], patterns[1]\n(b'hello', b'world')\n\u003e\u003e\u003e haystack[0:5], haystack[6:11]\n(b'hello', b'world')\n```\n\nThe `find_matches_as_strings()` API is not supported by `BytesAhoCorasick`.\n\n## Choosing the matching algorithm \u003ca name=\"matching\"\u003e\u003c/a\u003e\n\n### Match kind\n\nThere are three ways you can configure matching in cases where multiple patterns overlap, supported by both `AhoCorasick` and `BytesAhoCorasick` objects.\nFor a more in-depth explanation, see the [underlying Rust library's documentation of matching](https://docs.rs/aho-corasick/latest/aho_corasick/enum.MatchKind.html).\n\nAssume we have this starting point:\n\n```python\n\u003e\u003e\u003e from ahocorasick_rs import AhoCorasick, 
MatchKind\n```\n\n#### `Standard` (the default)\n\nThis returns the pattern whose match is detected first as the haystack is scanned, that is, the match that ends earliest.\nThis is the default match kind.\n\n```python\n\u003e\u003e\u003e ac = AhoCorasick([\"disco\", \"disc\", \"discontent\"])\n\u003e\u003e\u003e ac.find_matches_as_strings(\"discontent\")\n['disc']\n```\n\nIn this case `disc` will match before `disco` or `discontent`.\n\nSimilarly, `b` will match before `abcd` because it ends earlier in the haystack than `abcd` does:\n\n```python\n\u003e\u003e\u003e ac = AhoCorasick([\"b\", \"abcd\"])\n\u003e\u003e\u003e ac.find_matches_as_strings(\"abcdef\")\n['b']\n```\n\n#### `LeftmostFirst`\n\nThis returns the leftmost-in-the-haystack matching pattern that appears first in _the list of given patterns_.\nThat means the order of patterns makes a difference:\n\n```python\n\u003e\u003e\u003e ac = AhoCorasick([\"disco\", \"disc\"], matchkind=MatchKind.LeftmostFirst)\n\u003e\u003e\u003e ac.find_matches_as_strings(\"discontent\")\n['disco']\n\u003e\u003e\u003e ac = AhoCorasick([\"disc\", \"disco\"], matchkind=MatchKind.LeftmostFirst)\n\u003e\u003e\u003e ac.find_matches_as_strings(\"discontent\")\n['disc']\n```\n\nHere we see `abcd` matched first, because it starts before `b`:\n\n```python\n\u003e\u003e\u003e ac = AhoCorasick([\"b\", \"abcd\"], matchkind=MatchKind.LeftmostFirst)\n\u003e\u003e\u003e ac.find_matches_as_strings(\"abcdef\")\n['abcd']\n```\n\n#### `LeftmostLongest`\n\nThis returns the leftmost-in-the-haystack matching pattern that is longest:\n\n```python\n\u003e\u003e\u003e ac = AhoCorasick([\"disco\", \"disc\", \"discontent\"], matchkind=MatchKind.LeftmostLongest)\n\u003e\u003e\u003e ac.find_matches_as_strings(\"discontent\")\n['discontent']\n```\n\n### Overlapping matches\n\nYou can get all overlapping matches, instead of just one of them, but only if you stick to the default match kind, `MatchKind.Standard`.\nAgain, this is supported by both 
`AhoCorasick` and `BytesAhoCorasick`.\n\n```python\n\u003e\u003e\u003e from ahocorasick_rs import AhoCorasick\n\u003e\u003e\u003e patterns = [\"winter\", \"onte\", \"disco\", \"discontent\"]\n\u003e\u003e\u003e ac = AhoCorasick(patterns)\n\u003e\u003e\u003e ac.find_matches_as_strings(\"discontent\", overlapping=True)\n['disco', 'onte', 'discontent']\n```\n\n## Additional configuration: speed and memory usage tradeoffs \u003ca name=\"configuration2\"\u003e\u003c/a\u003e\n\n### Algorithm implementations: trading construction speed, memory, and performance (`AhoCorasick` and `BytesAhoCorasick`)\n\nYou can choose the type of underlying automaton to use, with different performance tradeoffs.\nThe short version: if you want maximum matching speed, and you don't have too many patterns, try the `Implementation.DFA` implementation and see if it helps.\n\nThe underlying Rust library supports [four choices](https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasickBuilder.html#method.kind), which are exposed as follows:\n\n* `None` uses a heuristic to choose the \"best\" Aho-Corasick implementation for the given patterns, balancing construction time, memory usage, and matching speed.\n  This is the default.\n* `Implementation.NoncontiguousNFA`: A noncontiguous NFA is the fastest to be built, has moderate memory usage and is typically the slowest to execute a search.\n* `Implementation.ContiguousNFA`: A contiguous NFA is a little slower to build than a noncontiguous NFA, has excellent memory usage and is typically a little slower than a DFA for a search.\n* `Implementation.DFA`: A DFA is very slow to build, uses exorbitant amounts of memory, but will typically execute searches the fastest.\n\n```python\n\u003e\u003e\u003e from ahocorasick_rs import AhoCorasick, Implementation\n\u003e\u003e\u003e ac = AhoCorasick([\"disco\", \"disc\"], implementation=Implementation.DFA)\n```\n\n### Trading memory for speed (`AhoCorasick` only)\n\nIf you use 
``find_matches_as_strings()``, there are two ways strings can be constructed: from the haystack, or by caching the patterns on the object.\nThe former takes more work, the latter uses more memory if the patterns would otherwise have been garbage-collected.\nYou can control the behavior by using the `store_patterns` keyword argument to `AhoCorasick()`.\n\n* ``AhoCorasick(..., store_patterns=None)``: The default.\n  Use a heuristic (currently, whether the total of pattern string lengths is less than 4096 characters) to decide whether to store patterns or not.\n* ``AhoCorasick(..., store_patterns=True)``: Keep references to the patterns, potentially speeding up ``find_matches_as_strings()`` at the cost of using more memory.\n  If this uses large amounts of memory this might actually slow things down due to pressure on the CPU memory cache, and/or the performance benefit might be overwhelmed by the algorithm's search time.\n* ``AhoCorasick(..., store_patterns=False)``: Don't keep references to the patterns, saving some memory but potentially slowing down ``find_matches_as_strings()``, especially when there are only a small number of patterns and you are searching a small haystack.\n\n## Implementation details \u003ca name=\"implementation\"\u003e\u003c/a\u003e\n\n* Matching on strings releases the GIL, to enable concurrency.\n  Matching on bytes does not currently release the GIL for memory-safety reasons, unless the haystack type is `bytes`.\n* Not all features from the underlying library are exposed; if you would like additional features, please [file an issue](https://github.com/g-research/ahocorasick_rs/issues/new) or submit a PR.\n\n## Benchmarks \u003ca name=\"benchmarks\"\u003e\u003c/a\u003e\n\nAs with any benchmark, real-world results will differ based on your particular situation.\nIf performance is important to your application, measure the alternatives yourself!\n\nThat being said, I've seen `ahocorasick_rs` run 1.5× to 7× as fast as `pyahocorasick`, 
depending on the options used.\nYou can run the included benchmarks, if you want, to see some comparative results locally.\nClone the repository, then:\n\n```\npip install pytest-benchmark ahocorasick_rs pyahocorasick\npytest benchmarks/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FG-Research%2Fahocorasick_rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FG-Research%2Fahocorasick_rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FG-Research%2Fahocorasick_rs/lists"}