Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cloudflare/lua-aho-corasick
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/cloudflare/lua-aho-corasick
- Owner: cloudflare
- License: bsd-3-clause
- Created: 2014-05-28T20:49:34.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2023-03-21T13:33:48.000Z (over 1 year ago)
- Last Synced: 2024-04-14T22:17:10.921Z (7 months ago)
- Language: C++
- Size: 89.8 KB
- Stars: 145
- Watchers: 28
- Forks: 52
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-resty - lua-aho-corasick - Aho-Corasick (AC) string matching algorithm (Libraries)
README
aho-corasick-lua
================

C++ and Lua implementation of the Aho-Corasick (AC) string matching algorithm
(http://dl.acm.org/citation.cfm?id=360855).

We began with a pure Lua implementation and found its performance
unsatisfactory, so we switched to a C/C++ implementation.

This package provides two shared objects: libac.so and ahocorasick.so.
The former is a regular shared object that can be used directly by C/C++
applications, or by Lua via FFI; the latter is a Lua module. An example usage
is shown below:

```lua
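local ac = require "ahocorasick"
-- the dictionary of strings to search for
local dict = {"string1", "string", "etc"}
-- build the AC automaton from the dictionary
local acinst = ac.create(dict)
-- match the input string against the dictionary
local r = ac.match(acinst, "mystring")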
```

For efficiency reasons, the implementation differs slightly from the
standard AC algorithm: instead of returning the set of dictionary strings
that match the given string, it returns only one of them when the string
matches. The functionality of our implementation can be (precisely) described
by the following C-style snippet:

```C
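#include <string.h>

/* Return a pointer to the earliest (leftmost-starting) occurrence of any
 * dictionary string in input, or to the terminating NUL if none occurs. */
const char *foo(const char *input, const char *const dict[], size_t n) {
    const char *ret = input + strlen(input); /* the end of the input string */
    for (size_t i = 0; i < n; i++) {
        /* find the first occurrence of dict[i] as a substring of input */
        const char *p = strstr(input, dict[i]);
        if (p != NULL && p < ret)
            ret = p;
    }
    return ret;
}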
```

This limitation is easy to lift: associate each state with a sparse
bit-vector recording the set of dictionary strings recognized by that state.
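The bit-vector extension mentioned above can be sketched in a few dozen lines of C. This is a minimal, hypothetical sketch, not the libac.so or ahocorasick.so API: the names `ac_sketch_t`, `ac_sketch_build` and `ac_sketch_match_all` are made up for illustration, a dense `uint64_t` stands in for the sparse bit-vector (so the dictionary is limited to 64 strings), and the automaton has an assumed fixed capacity of 256 states. Each state's mask also absorbs the mask of its failure state, so a single left-to-right scan reports every dictionary string occurring in the input rather than just one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not libac's API: an Aho-Corasick automaton over bytes
 * in which every state carries a bit-vector of the dictionary strings it
 * recognizes. A dense uint64_t stands in for the bit-vector, so the
 * dictionary is limited to 64 strings. */
#define AC_SKETCH_MAX_STATES 256 /* assumed fixed capacity */

typedef struct {
    int delta[AC_SKETCH_MAX_STATES][256]; /* transitions, -1 = no edge yet */
    int fail[AC_SKETCH_MAX_STATES];       /* failure (longest proper suffix) links */
    uint64_t out[AC_SKETCH_MAX_STATES];   /* bit i set: dict[i] recognized here */
    int nstates;
} ac_sketch_t;

/* Allocate a fresh state with no outgoing edges; -1 if capacity is exhausted. */
static int ac_sketch_new_state(ac_sketch_t *ac) {
    if (ac->nstates == AC_SKETCH_MAX_STATES)
        return -1;
    int s = ac->nstates++;
    memset(ac->delta[s], -1, sizeof ac->delta[s]);
    ac->fail[s] = 0;
    ac->out[s] = 0;
    return s;
}

/* Build the automaton for n <= 64 strings; returns 0 on success, -1 if the
 * trie does not fit in AC_SKETCH_MAX_STATES states. */
int ac_sketch_build(ac_sketch_t *ac, const char *const dict[], int n) {
    assert(n >= 0 && n <= 64);
    ac->nstates = 0;
    ac_sketch_new_state(ac); /* state 0 is the root */

    /* Phase 1: insert every dictionary string into a trie and mark the
     * state where it ends. */
    for (int i = 0; i < n; i++) {
        int s = 0;
        for (const unsigned char *p = (const unsigned char *)dict[i]; *p; p++) {
            if (ac->delta[s][*p] < 0) {
                int t = ac_sketch_new_state(ac);
                if (t < 0)
                    return -1;
                ac->delta[s][*p] = t;
            }
            s = ac->delta[s][*p];
        }
        ac->out[s] |= (uint64_t)1 << i;
    }

    /* Phase 2: breadth-first pass that computes failure links, fills in the
     * missing edges, and lets each state inherit the bit-vector of its
     * failure state (every string recognized there is a suffix of ours). */
    int queue[AC_SKETCH_MAX_STATES];
    int head = 0, tail = 0;
    for (int c = 0; c < 256; c++) {
        int t = ac->delta[0][c];
        if (t < 0) {
            ac->delta[0][c] = 0; /* unmatched bytes stay at the root */
        } else {
            ac->fail[t] = 0;
            queue[tail++] = t;
        }
    }
    while (head < tail) {
        int s = queue[head++];
        ac->out[s] |= ac->out[ac->fail[s]];
        for (int c = 0; c < 256; c++) {
            int t = ac->delta[s][c];
            if (t < 0) {
                ac->delta[s][c] = ac->delta[ac->fail[s]][c];
            } else {
                ac->fail[t] = ac->delta[ac->fail[s]][c];
                queue[tail++] = t;
            }
        }
    }
    return 0;
}

/* Scan input once; bit i of the result is set iff dict[i] occurs in input. */
uint64_t ac_sketch_match_all(const ac_sketch_t *ac, const char *input) {
    int s = 0;
    uint64_t found = ac->out[0]; /* an empty dictionary string always matches */
    for (const unsigned char *p = (const unsigned char *)input; *p; p++) {
        s = ac->delta[s][*p];
        found |= ac->out[s];
    }
    return found;
}
```

For example, with the dictionary `{"string1", "string", "etc"}` from the Lua example, matching `"mystring"` yields a mask with only bit 1 set (`"string"`). A sparse representation becomes worthwhile for large dictionaries, where most states recognize only a handful of strings and a dense mask per state would waste memory.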