Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bnoordhuis/libsm
A fast string matcher library.
- Host: GitHub
- URL: https://github.com/bnoordhuis/libsm
- Owner: bnoordhuis
- Created: 2010-12-19T22:25:59.000Z (almost 14 years ago)
- Default Branch: master
- Last Pushed: 2010-12-25T16:36:42.000Z (almost 14 years ago)
- Last Synced: 2024-04-15T02:37:09.818Z (7 months ago)
- Language: C
- Homepage:
- Size: 97.7 KB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# libsm
A fast string matcher library.
## Goals
1. Should be able to match one or more strings against the input text.
2. Should be able to find one or more strings in the input text.
This is different from e.g. `strcmp` and `strstr` because those functions
will only let you match against or search for a single string. Now imagine you want to match against or search for a million strings...
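A minimal sketch of goal 1, assuming a plain byte-wise trie rather than whatever structure libsm actually uses. Every stored string becomes a path from the root, so checking whether the input equals any of a million strings costs one walk over the input instead of a million `strcmp` calls. The names `trie_node`, `trie_new`, `trie_add`, `trie_match` and `trie_free` are hypothetical and are not libsm's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical trie node: one child pointer per possible byte value. */
typedef struct trie_node {
    struct trie_node *child[256];
    bool terminal; /* true if some stored string ends exactly here */
} trie_node;

trie_node *trie_new(void) {
    return calloc(1, sizeof(trie_node)); /* all children NULL, terminal false */
}

/* Store a NUL-terminated string. Returns false if allocation fails. */
bool trie_add(trie_node *root, const char *s) {
    assert(root != NULL);
    trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (n->child[*p] == NULL) {
            n->child[*p] = trie_new();
            if (n->child[*p] == NULL) return false;
        }
        n = n->child[*p];
    }
    n->terminal = true;
    return true;
}

/* Exact match: does `text` equal any stored string? Takes at most
   strlen(text) steps, however many strings are stored. */
bool trie_match(const trie_node *root, const char *text) {
    assert(root != NULL);
    const trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        n = n->child[*p];
        if (n == NULL) return false;
    }
    return n->terminal;
}

/* Free a whole trie (recursive; depth is bounded by the longest string). */
void trie_free(trie_node *n) {
    if (n == NULL) return;
    for (int i = 0; i < 256; i++) trie_free(n->child[i]);
    free(n);
}
```

The 256-pointer fan-out per node keeps lookups trivial but wastes a lot of memory; a serious implementation would likely compress nodes or edges.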
## Status
Pre-alpha work-in-progress.
## To do
Lots.
* Suffix tree. The current implementation is O(n^2) in space and O(n^3) in time. Switch to Ukkonen's algorithm, which is O(n) in both.
* Pattern matcher. Model it as a DFA. Use longest substring finding to keep the number of states down (this is where the suffix
tree comes into play).
* Pattern matcher. Should be able both to pre-generate the matcher (serialized, or as C code) and to generate it dynamically
at run-time. Fairly trivial, just a SMOP (simple matter of programming).
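The README does not say how the DFA will be built. One well-known way to model multi-string search (goal 2) as a DFA is the Aho-Corasick automaton: a trie of the patterns whose missing edges are filled in via failure links, so the text is scanned once with one table lookup per byte. The sketch below uses that technique as a stand-in; it does not do the suffix-tree-based state reduction described in the to-do list. The `ac_*` names, the fixed `AC_MAX_STATES` capacity and the 64-pattern bitmask limit are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AC_MAX_STATES 1024 /* hypothetical fixed capacity */
#define AC_ALPHA 256       /* one column per byte value */

typedef struct {
    int next[AC_MAX_STATES][AC_ALPHA]; /* transitions; state 0 is the root */
    int fail[AC_MAX_STATES];           /* failure links (longest proper suffix state) */
    uint64_t out[AC_MAX_STATES];       /* bit i set: pattern i ends at this state */
    int nstates;
} ac_dfa;

void ac_init(ac_dfa *d) {
    memset(d, 0, sizeof *d);
    d->nstates = 1; /* just the root */
}

/* Add pattern `pat` with id 0..63. Returns 0 on success, -1 if out of states. */
int ac_add(ac_dfa *d, const char *pat, int id) {
    assert(id >= 0 && id < 64);
    int s = 0;
    for (const unsigned char *p = (const unsigned char *)pat; *p; p++) {
        if (d->next[s][*p] == 0) { /* 0 means "no edge yet" while building */
            if (d->nstates == AC_MAX_STATES) return -1;
            d->next[s][*p] = d->nstates++;
        }
        s = d->next[s][*p];
    }
    d->out[s] |= (uint64_t)1 << id;
    return 0;
}

/* Turn the trie into a complete DFA with a breadth-first pass. */
void ac_build(ac_dfa *d) {
    int queue[AC_MAX_STATES];
    int head = 0, tail = 0;

    /* Depth-1 states fail to the root; missing root edges already loop to 0. */
    for (int c = 0; c < AC_ALPHA; c++) {
        int t = d->next[0][c];
        if (t) { d->fail[t] = 0; queue[tail++] = t; }
    }
    while (head < tail) {
        int s = queue[head++];
        /* fail[s] is shallower, so its row and output set are already final. */
        d->out[s] |= d->out[d->fail[s]];
        for (int c = 0; c < AC_ALPHA; c++) {
            int t = d->next[s][c];
            if (t) {
                d->fail[t] = d->next[d->fail[s]][c];
                queue[tail++] = t;
            } else {
                d->next[s][c] = d->next[d->fail[s]][c]; /* borrow the fallback edge */
            }
        }
    }
}

/* One pass over `text`; returns the set of pattern ids occurring anywhere in it. */
uint64_t ac_search(const ac_dfa *d, const char *text) {
    uint64_t found = 0;
    int s = 0;
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        s = d->next[s][*p];
        found |= d->out[s];
    }
    return found;
}
```

Because `ac_dfa` holds a full 256-column transition table per state (about 1 MB here), declare it `static` or allocate it on the heap rather than on the stack. A flat table like `next` is also the kind of data that could be serialized or emitted as a C array, which is one possible reading of the pre-generation item.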