https://github.com/mmkhattab/multimatcher
A convenient implementation of the Aho-Corasick algorithm to find multiple search patterns
- Host: GitHub
- URL: https://github.com/mmkhattab/multimatcher
- Owner: mmkhattab
- License: mit
- Created: 2023-04-24T20:26:48.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-04T21:01:28.000Z (over 2 years ago)
- Last Synced: 2025-09-03T06:59:41.392Z (10 months ago)
- Topics: aho-corasick, string-search
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Introduction
Multimatcher is an implementation of the Aho-Corasick search algorithm (Aho & Corasick, 1975).
It finds multiple keywords in an input string in a single pass, rather than scanning
the input once per keyword.
The rationale behind Multimatcher is that we usually want to do something with the matches once they are found.
To that end, it provides a flexible "replace" method that supports use cases such as:
- find and delete
- find and replace
- tag with a global label (i.e. all matches get the same label)
- tag with custom label (i.e. each match gets its own label)
- count matches
When possible, it's recommended to set whole_words_only to True, which makes matching significantly faster.
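For readers unfamiliar with Aho-Corasick, the single-pass idea from the start of this introduction can be sketched in plain Python. This is a minimal, generic illustration of the textbook algorithm, not Multimatcher's own code: the names `build_automaton` and `find_all` are hypothetical, and it reports every occurrence, including overlapping ones, without any separator or whole-word handling.

```
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links, and output lists (hypothetical helper)."""
    goto = [{}]  # goto[state][char] -> next state; state 0 is the root
    fail = [0]   # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]   # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Phase 2: breadth-first traversal to compute the failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every match, reading text once."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # On a mismatch, follow failure links instead of rewinding the input.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The failure links are what remove the need to loop over the input once per keyword: when the current character does not extend the current match, the automaton falls back to the longest suffix that is still a prefix of some keyword and continues from the same input position.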
# Examples
## Find and delete matches
```
from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("") # matches will be deleted
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x y z"
```
## Find and transform matches
```
from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_method(lambda x: x.capitalize()) # matches will be capitalized
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x A y B z C"
```
## Find and replace matches with the same label
```
from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("0") # all matches will be replaced with 0
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 0 y 0 z 0"
```
## Find and replace matches with custom labels
```
from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_map({"a": "1", "b": "2", "c": "3"}) # replaces a > 1, b > 2, c > 3
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 1 y 2 z 3"
```
## Count matches
```
from multimatcher import Multimatcher
mm = Multimatcher(separator='')
mm.set_search_patterns(['a', 'b', 'c'])
mm.count("aa xx bb yy cc zz") # produces {'a': 2, 'b': 2, 'c': 2}
```
# References
Aho, A. V., & Corasick, M. J. (1975). Efficient string matching: an aid to bibliographic search.
Communications of the ACM, 18(6), 333-340.