{"id":19787447,"url":"https://github.com/codeplea/ahocorasickphp","last_synced_at":"2025-04-30T23:33:35.755Z","repository":{"id":44387846,"uuid":"141845294","full_name":"codeplea/ahocorasickphp","owner":"codeplea","description":"Aho-Corasick multi-keyword string searching library in PHP.","archived":false,"fork":false,"pushed_at":"2018-10-01T16:29:59.000Z","size":240,"stargazers_count":185,"open_issues_count":2,"forks_count":16,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-22T11:18:02.256Z","etag":null,"topics":["aho-corasick","ahocorasick","algorithm","php","search-algorithm","string-search"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"zlib","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codeplea.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-21T20:53:40.000Z","updated_at":"2025-03-23T09:00:13.000Z","dependencies_parsed_at":"2022-07-14T17:17:57.304Z","dependency_job_id":null,"html_url":"https://github.com/codeplea/ahocorasickphp","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeplea%2Fahocorasickphp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeplea%2Fahocorasickphp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeplea%2Fahocorasickphp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codeplea%2Fahocorasickphp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codeplea","download_url":"https://codeload.github.com/codeplea/a
hocorasickphp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251800761,"owners_count":21645964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aho-corasick","ahocorasick","algorithm","php","search-algorithm","string-search"],"created_at":"2024-11-12T06:23:09.637Z","updated_at":"2025-04-30T23:33:35.390Z","avatar_url":"https://github.com/codeplea.png","language":"PHP","readme":"\n# Aho Corasick in PHP\n\nThis is a small library which implements the [Aho-Corasick string\nsearch\nalgorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm).\n\nIt's coded in pure PHP and self-contained in a single file, `ahocorasick.php`.\n\nIt's useful when you want to search for many keywords all at once. It's faster\nthan simply calling `strpos` many times, and it's much faster than calling\n`preg_match_all` with something like `/keyword1|keyword2|...|keywordn/`.\n\nI originally wrote this to use with [F5Bot](https://f5bot.com), since it's\nsearching for the same set of a few thousand keywords over and over again.\n\n# Usage\n\nIt's designed to be really easy to use. You create the `ahocorasick` object,\nadd your keywords, call `finalize()` to finish setup, and then search your\ntext. 
It'll return an array of the keywords found and their positions in the\nsearch text.\n\nCreate, add keywords, and `finalize()`:\n\n```php\nrequire('ahocorasick.php');\n\n$ac = new ahocorasick();\n\n$ac-\u003eadd_needle('art');\n$ac-\u003eadd_needle('cart');\n$ac-\u003eadd_needle('ted');\n\n$ac-\u003efinalize();\n\n```\n\nCall `search()` to perform the actual search. It'll return an array of matches.\n\n```php\n$found = $ac-\u003esearch('a carted mart lot one blue ted');\nprint_r($found);\n```\n\n`$found` will be an array with these elements:\n\n```\n[0] =\u003e Array\n    (\n        [0] =\u003e cart\n        [1] =\u003e 2\n    )\n[1] =\u003e Array\n    (\n        [0] =\u003e art\n        [1] =\u003e 3\n    )\n[2] =\u003e Array\n    (\n        [0] =\u003e ted\n        [1] =\u003e 5\n    )\n[3] =\u003e Array\n    (\n        [0] =\u003e art\n        [1] =\u003e 10\n    )\n[4] =\u003e Array\n    (\n        [0] =\u003e ted\n        [1] =\u003e 27\n    )\n```\n\nSee `example.php` for a complete example.\n\n# Speed\n\nA simple benchmarking program is included which compares various alternatives.\n\n```\n$ php benchmark.php\nLoaded 3000 keywords to search on a text of 19377 characters.\n\nSearching with strpos...\ntime: 0.38440799713135\n\nSearching with preg_match...\ntime: 5.6817619800568\n\nSearching with preg_match_all...\ntime: 5.0735609531403\n\nSearching with aho corasick...\ntime: 0.054709911346436\n\n```\n\nNote: the regex solutions are actually slightly broken. They won't work if you\nhave a keyword that is a prefix or suffix of another. But hey, who really uses\nregex when it's not slightly broken?\n\nAlso keep in mind that building the search tree (the `add_needle()` and\n`finalize()` calls) takes time. 
So you'll get the best speed-up if you're\nreusing the same keywords and calling `search()` many times.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeplea%2Fahocorasickphp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodeplea%2Fahocorasickphp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodeplea%2Fahocorasickphp/lists"}