{"id":13700144,"url":"https://github.com/ph4r05/php_aho_corasick","last_synced_at":"2025-04-09T19:51:14.039Z","repository":{"id":6075199,"uuid":"7301367","full_name":"ph4r05/php_aho_corasick","owner":"ph4r05","description":"Aho-Corasick string search algorithm PHP extension implementation.","archived":false,"fork":false,"pushed_at":"2020-10-08T14:51:25.000Z","size":169,"stargazers_count":49,"open_issues_count":1,"forks_count":15,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-23T21:45:56.156Z","etag":null,"topics":["aho-corasick","algorithm","automata","hacktoberfest","pecl","php","php-extension","php5","php7","string-matching"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"timkay/solo","license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ph4r05.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-12-24T02:52:11.000Z","updated_at":"2024-03-16T20:49:22.000Z","dependencies_parsed_at":"2022-09-06T18:51:35.938Z","dependency_job_id":null,"html_url":"https://github.com/ph4r05/php_aho_corasick","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ph4r05%2Fphp_aho_corasick","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ph4r05%2Fphp_aho_corasick/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ph4r05%2Fphp_aho_corasick/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ph4r05%2Fphp_aho_corasick/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ph4r05","download_url"
:"https://codeload.github.com/ph4r05/php_aho_corasick/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248103907,"owners_count":21048244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aho-corasick","algorithm","automata","hacktoberfest","pecl","php","php-extension","php5","php7","string-matching"],"created_at":"2024-08-02T20:00:49.032Z","updated_at":"2025-04-09T19:51:14.015Z","avatar_url":"https://github.com/ph4r05.png","language":"C","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026business=XK6RLD768RGGJ\u0026lc=SK\u0026item_name=ph4r05\u0026item_number=php_aho_corasick%2egit\u0026currency_code=EUR\u0026bn=PP%2dDonationsBF%3abtn_donateCC_LG%2egif%3aNonHosted"],"categories":["字符串处理"],"sub_categories":[],"readme":"# php_aho_corasick\n[![Build Status](https://travis-ci.org/ph4r05/php_aho_corasick.svg?branch=master)](https://travis-ci.org/ph4r05/php_aho_corasick)\n[![Coverity Status](https://scan.coverity.com/projects/7177/badge.svg)](https://scan.coverity.com/projects/ph4r05-php_aho_corasick)\n\nPHP extension implementing Aho-Corasick pattern matching algorithm (more on [wiki]).\n\nIs especially effective if there is a large database of needles (=strings to be searched, for example virus signatures). 
\nAnother advantage is that the search structure is built in a separate call before searching, so it can be reused\nacross many haystacks, saving time.\n\nComputing Aho-Corasick in native code (a PHP extension) rather than in pure PHP gives this implementation \na significant performance boost.\n\n## Dependencies\nThis project is a simple PHP wrapper of (or interface to) another project: [MultiFast]. The sources include the MultiFast library v2.0.\nNo extra dependencies are required. The [MultiFast] library is wrapped as an extension loadable into PHP.\n\nThe source of inspiration for this project was a great [tutorial].\n\nCompatible with PHP 5.3+ and PHP 7.0+.\n\n## PECL \u0026 Licensing\nThe original project [MultiFast] is licensed under LGPLv3, so this PHP wrapper is also licensed under LGPLv3.\nThanks to the [author] of [MultiFast], Kamiar Kanani, who gave me [permission] to license the code under PHP License 3.01 for the purpose\nof adding this extension to the PECL repository. 
\n\nhttps://pecl.php.net/package/ahocorasick\n\nPECL installation:\n\n```bash\npecl install channel://pecl.php.net/ahocorasick-0.0.7\n```\n\nNote that the `php-dev` package (or `php-devel`, depending on your distribution) is required for the PECL package to compile.\n\n## Build\n```bash\nphpize\n./configure --enable-ahocorasick\nmake\n```\n\n## Docker build\n\n```bash\n$ docker build -t=\"ahoc\" .\n$ docker run -i -t ahoc\n$ php -d extension=modules/ahocorasick.so -f examples/test.php\n```\n\nInstall debugging tools, remote debugging:\n\n```bash\n$ docker build -t=\"ahoc\" --build-arg DEVEL_TOOLS=1 .\n$ docker run -i --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t ahoc\n$ docker run -i --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --mount type=bind,src=`pwd`,dst=/aho -t ahoc\n```\n\nRecompiling, or cache busting:\n\n```bash\n$ docker build -t=\"ahoc\" --build-arg DIR_BUSTER=`date`.\n```\n\n## Usage\nThis extension is case sensitive. For case-insensitive matching, convert every string passed to the extension (patterns and haystacks) to \nlowercase (use mb_strtolower() for example).\n\nFor more usage examples, see the provided testing examples.\n\n`examples/test.php`:\n```php\n$data = array(\n\t\tarray('key'=\u003e'ab', 'value'=\u003e'alfa'),\n\t\tarray('key'=\u003e'ac', 'value'=\u003e'beta'),\n\t\tarray('key'=\u003e'ad', 'value'=\u003e'gamma', 'aux'=\u003earray(1)),\n\t\tarray('key'=\u003e'ae', 'value'=\u003e'delta'),\n\t\tarray('id'=\u003e0, 'value'=\u003e'zeta'),\n\t\tarray('key'=\u003e'ag', 'value'=\u003e'omega'),\n\t\tarray('value'=\u003e'lfa')\n\t     );\n\n// initialize search; returns a resource ID for the search structure\n$c = ahocorasick_init($data);\n\n// perform search\n$d1 = ahocorasick_match(\"alFABETA gamma zetaomegaalfa!\", $c);\n\n// deinitialize search structure (will free memory)\nahocorasick_deinit($c);\n\nvar_dump($d1);\n```\n\nCall with:\n```bash\nphp -d extension=modules/ahocorasick.so -f examples/test.php\n```\n\nResults with:\n```\narray(5) {\n  
  [\"pos\"]=\u003e\n    int(14)\n    [\"key\"]=\u003e\n    string(2) \"ad\"\n    [\"aux\"]=\u003e\n    array(1) {\n      [0]=\u003e\n      int(1)\n    }\n    [\"start_postion\"]=\u003e\n    int(9)\n    [\"value\"]=\u003e\n    string(5) \"gamma\"\n  }\n  [1]=\u003e\n  array(4) {\n    [\"pos\"]=\u003e\n    int(19)\n    [\"keyIdx\"]=\u003e\n    int(0)\n    [\"start_postion\"]=\u003e\n    int(15)\n    [\"value\"]=\u003e\n    string(4) \"zeta\"\n  }\n  [2]=\u003e\n  array(4) {\n    [\"pos\"]=\u003e\n    int(24)\n    [\"key\"]=\u003e\n    string(2) \"ag\"\n    [\"start_postion\"]=\u003e\n    int(19)\n    [\"value\"]=\u003e\n    string(5) \"omega\"\n  }\n  [3]=\u003e\n  array(4) {\n    [\"pos\"]=\u003e\n    int(28)\n    [\"key\"]=\u003e\n    string(2) \"ab\"\n    [\"start_postion\"]=\u003e\n    int(24)\n    [\"value\"]=\u003e\n    string(4) \"alfa\"\n  }\n  [4]=\u003e\n  array(3) {\n    [\"pos\"]=\u003e\n    int(28)\n    [\"start_postion\"]=\u003e\n    int(25)\n    [\"value\"]=\u003e\n    string(3) \"lfa\"\n  }\n}\n```\n\n## Benchmark\nIn this repo you can find `examples/benchmark.php` file, with this you can perform your own benchmark and measure speed up.\n\nMy setup generates random haystacks and needles from alphabet=\"abcdef\". 
Time spent searching is measured 5 times and the average is computed.\nSearch structure construction is included in the measured time.\n\nThe script generates:\n  * 256 random haystacks of 8192 characters each\n  * 2048 needles of 16 characters each.\n\nPrinciple:\n  * The naive approach simply iterates over haystacks and needles; the search is performed with strpos().\n  * The Aho-Corasick approach constructs the search structure, then all haystacks are searched for needles.\n\nResults:\n```\n$\u003e php -d extension=modules/ahocorasick.so -f examples/benchmark.php\nClassic search; sampleCount: 10; keySize: 2048; timeAvg: 13.060877\nAhoCorasick search; sampleCount: 10; keySize: 2048; timeAvg: 0.174326 s, totalTime: 1.743264 s, memory increase: 272 B\nAhoCorasick pattern matching is 74.921962 times faster than naive approach\n```\n\nSpeedup: 74x compared to the naive approach.\n\n## API\nDocumentation writing is in progress.\n\nBasic ideas of the API:\n* The AhoCorasick pattern matching engine has to be initialized (`ahocorasick_init()`) before use and deinitialized (`ahocorasick_deinit()`) \nafter use so that memory is handled properly.\n* The engine has to be fed with pattern matching rules, given as an array of rules, either to the initialization function (`ahocorasick_init()`)\nor later (`ahocorasick_add_patterns()`).\n* After the engine is finalized (`ahocorasick_finalize()`) or the first match is performed (`ahocorasick_match()`), no further patterns can be\nadded, as the underlying search trie is finalized.\n* When matching finishes, it returns an array of matched results. Each entry gives the position of the found occurrence and the pattern \nthat was matched. \n* Modifications made during the PHP 7 migration: \n  * 'value' is the default key when adding patterns.\n  * 'start_postion' field added to the results. 
The original algorithm returns the end position of the matched patterns.\n\nRules:\n* The simplest pattern looks like: \n```php\narray('lorem')\n```\n* A pattern can be given an identifier, which makes it easier to process the results of a match call. The identifier is either a string\n```php\narray('key'=\u003e'ae', 'value'=\u003e'delta')\n```\nor an integer\n```php\narray('id'=\u003e0, 'value'=\u003e'zeta')\n```\n* A pattern can carry an arbitrary object\n```php\narray('key'=\u003e'ad', 'value'=\u003e'gamma', 'aux'=\u003earray(1))\n``` \n\n## Development\n\n```\n# Create package for distribution\npear package\n```\n\n### OSX Mojave\n\nSince Mojave, the PHP module needs to be codesigned in order to be loaded into the process.\n\nhttps://developer.apple.com/library/archive/technotes/tn2206/_index.html\n\n\nDonating\n========\n\nThis implementation is open source. If you like the code or find it useful, please feel free to donate to the\nauthor whatever amount you would like by clicking on the PayPal button below.\nAnd if you don't feel like donating, that's OK too.\n\n[![](https://www.paypalobjects.com/en_US/i/btn/btn_donateCC_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_donations\u0026business=XK6RLD768RGGJ\u0026lc=SK\u0026item_name=ph4r05\u0026item_number=php_aho_corasick%2egit\u0026currency_code=EUR\u0026bn=PP%2dDonationsBF%3abtn_donateCC_LG%2egif%3aNonHosted)\n\nBitcoin:\n\n![1Gh3TC55L4FjCyS2y5WKc4EGMYBYa6qvDw](https://deadcode.me/btc-aho.png)\u003cbr /\u003e`1Gh3TC55L4FjCyS2y5WKc4EGMYBYa6qvDw`\n\nMonero:\n```\n89bF7TFrhdyczkz6JmUzXx57yQa3fb28tbyT8nXLpj3bVFfQEE6cpjxec1gAJVSWHBBG7ex2XBj7u6BLrgKBaEmuSzWgdcn\n```\n\n[wiki]: http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm\n[MultiFast]: http://sourceforge.net/projects/multifast/?source=dlp\n[tutorial]: http://devzone.zend.com/446/extension-writing-part-iii-resources/\n[permission]: https://sourceforge.net/p/multifast/discussion/1317362/thread/dc5b4a1e/#a0a2\n[author]: 
https://sourceforge.net/u/kamiark/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fph4r05%2Fphp_aho_corasick","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fph4r05%2Fphp_aho_corasick","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fph4r05%2Fphp_aho_corasick/lists"}