{"id":21961746,"url":"https://github.com/daac-tools/python-daachorse","last_synced_at":"2026-03-15T02:22:35.103Z","repository":{"id":64820336,"uuid":"500005138","full_name":"daac-tools/python-daachorse","owner":"daac-tools","description":"🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)","archived":false,"fork":false,"pushed_at":"2025-03-15T09:12:55.000Z","size":3376,"stargazers_count":16,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T03:11:10.603Z","etag":null,"topics":["aho-corasick","double-array","finite-state-machine","python","search","substring-matching","text-processing"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daac-tools.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-05T05:05:27.000Z","updated_at":"2025-03-15T08:49:14.000Z","dependencies_parsed_at":"2023-02-17T16:46:01.928Z","dependency_job_id":null,"html_url":"https://github.com/daac-tools/python-daachorse","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daac-tools%2Fpython-daachorse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daac-tools%2Fpython-daachorse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daac-tools%2Fpython-daachorse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daac-tools%2Fpython-daachorse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daac-tools","download_url":"https://codeload.github.com/daac-tools/python-daachorse/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250506541,"owners_count":21441795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aho-corasick","double-array","finite-state-machine","python","search","substring-matching","text-processing"],"created_at":"2024-11-29T10:17:47.443Z","updated_at":"2026-03-15T02:22:35.051Z","avatar_url":"https://github.com/daac-tools.png","language":"Rust","readme":"# python-daachorse\n\n[daachorse](https://github.com/daac-tools/daachorse) is a fast implementation of the Aho-Corasick algorithm using the compact double-array data structure.\nThis package provides a Python wrapper for it.\n\n[![PyPI](https://img.shields.io/pypi/v/daachorse)](https://pypi.org/project/daachorse/)\n[![Build Status](https://github.com/vbkaisetsu/python-daachorse/actions/workflows/CI.yml/badge.svg)](https://github.com/vbkaisetsu/python-daachorse/actions)\n[![Documentation Status](https://readthedocs.org/projects/python-daachorse/badge/?version=latest)](https://python-daachorse.readthedocs.io/en/latest/?badge=latest)\n\n## Installation\n\n### Install pre-built package from PyPI\n\nRun the following command:\n\n```\n$ pip install daachorse\n```\n\n### Build from source\n\nYou need to install the Rust compiler beforehand by following [the documentation](https://www.rust-lang.org/tools/install).\ndaachorse uses `pyproject.toml`, so you also need to upgrade pip to version 19 or later.\n\n```\n$ pip install --upgrade pip\n```\n\nAfter setting up the environment, you can install daachorse as follows:\n\n```\n$ pip install git+https://github.com/daac-tools/python-daachorse\n```\n\n## Example usage\n\nDaachorse provides several search options,\nranging from standard Aho-Corasick matching to leftmost-longest and leftmost-first matching.\nAll of them run fast on the compact double-array data structure and\ncan be easily plugged into your application, as shown below.\n\n### Finding overlapped occurrences\n\nTo search for all occurrences of registered patterns in the input text,\nincluding occurrences that overlap each other,\nuse `find_overlapping()`. When you instantiate a new automaton,\neach pattern is assigned a unique identifier in input order, starting from 0.\nEach match is reported as a tuple `(start, end, id)`,\nwhere `start` and `end` are the character positions of the occurrence (`end` is exclusive)\nand `id` is the identifier of the matched pattern.\n\n```python\n\u003e\u003e\u003e import daachorse\n\u003e\u003e\u003e patterns = ['bcd', 'ab', 'a']\n\u003e\u003e\u003e pma = daachorse.Automaton(patterns)\n\u003e\u003e\u003e pma.find_overlapping('abcd')\n[(0, 1, 2), (0, 2, 1), (1, 4, 0)]\n```\n\n### Finding non-overlapped occurrences with standard matching\n\nIf you do not want to allow positional overlap, use `find()` instead.\nIt performs standard matching on the Aho-Corasick automaton:\nin each iteration, it reports the first occurrence found\nand resumes the search from the end of that occurrence.\n\n```python\n\u003e\u003e\u003e import daachorse\n\u003e\u003e\u003e patterns = ['bcd', 'ab', 'a']\n\u003e\u003e\u003e pma = daachorse.Automaton(patterns)\n\u003e\u003e\u003e pma.find('abcd')\n[(0, 1, 2), (1, 4, 0)]\n```\n\n### Finding non-overlapped occurrences with longest matching\n\nIf you want to find the longest pattern\namong those starting at the leftmost matching position, without positional overlap,\npass `MATCH_KIND_LEFTMOST_LONGEST` when constructing the automaton.\n\n```python\n\u003e\u003e\u003e import daachorse\n\u003e\u003e\u003e patterns = ['ab', 'a', 'abcd']\n\u003e\u003e\u003e pma = daachorse.Automaton(patterns, daachorse.MATCH_KIND_LEFTMOST_LONGEST)\n\u003e\u003e\u003e pma.find('abcd')\n[(0, 4, 2)]\n```\n\n### Finding non-overlapped occurrences with leftmost-first matching\n\nIf you want to find the earliest registered pattern\namong those starting at the leftmost matching position,\nuse `MATCH_KIND_LEFTMOST_FIRST`.\n\nThis is the so-called *leftmost-first match*, a somewhat tricky search option.\nFor example, in the following code,\n`ab` is reported rather than the longer `abcd`\nbecause it is the earliest registered of the patterns matching at position 0.\n\n```python\n\u003e\u003e\u003e import daachorse\n\u003e\u003e\u003e patterns = ['ab', 'a', 'abcd']\n\u003e\u003e\u003e pma = daachorse.Automaton(patterns, daachorse.MATCH_KIND_LEFTMOST_FIRST)\n\u003e\u003e\u003e pma.find('abcd')\n[(0, 2, 0)]\n```\n\n## License\n\nLicensed under either of\n\n * Apache License, Version 2.0\n   ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)\n * MIT license\n   ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)\n\nat your option.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaac-tools%2Fpython-daachorse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaac-tools%2Fpython-daachorse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaac-tools%2Fpython-daachorse/lists"}