https://github.com/lofcz/ahocorasick


# AhoCorasick

## Install

```
dotnet add package AhoCorasickCore
```

## Use

This implementation of [Aho-Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) is intended for scenarios where a string needs to be searched for several substrings at once, and each substring is assigned a certain meaning. For example, one could scan an e-mail for a few words known to be used by spammers and trigger a follow-up action on each match. Instead of scanning the text once per pattern (e.g., by calling `Contains` for each needle), Aho-Corasick builds a single automaton from all patterns and reports every match in one pass over the text, reusing the prefix it has already matched whenever a comparison fails.

A minimal example:

```cs
enum WordCategory
{
    Noun,
    Verb,
    Adjective,
    Adverb
}

// each pattern (needle) maps to the value reported when it matches
Dictionary<string, WordCategory> patterns = new Dictionary<string, WordCategory>
{
    { "he", WordCategory.Noun },
    { "she", WordCategory.Noun },
    { "his", WordCategory.Adjective },
    { "hers", WordCategory.Adjective },
    { "run", WordCategory.Verb },
    { "quickly", WordCategory.Adverb }
};

// cache the instance and reuse it, all public methods are thread-safe
AhoCorasick<WordCategory> inst = new AhoCorasick<WordCategory>(patterns);

// SearchAll() collects every hit into a list; use Search() for consuming hits via yield
var results = inst.SearchAll("he runs");

/* returns: [
    {pattern: "he", value: (Noun), pos: 0},
    {pattern: "run", value: (Verb), pos: 3}
] */
```
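
For readers curious about what such a scan does under the hood, here is a minimal sketch of a textbook Aho-Corasick automaton: a trie of all patterns, failure links computed breadth-first, and a single pass over the text. `SketchAutomaton` and all of its members are hypothetical names made up for this illustration. They are not the API or the internals of AhoCorasickCore, which may be implemented quite differently.

```cs
using System;
using System.Collections.Generic;

// Illustration only: a hypothetical type, not part of the AhoCorasickCore package.
public sealed class SketchAutomaton<T>
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public readonly List<string> Hits = new List<string>(); // patterns ending at this state
        public Node Fail; // state for the longest proper suffix that is also a trie prefix
    }

    private readonly Node _root = new Node();
    private readonly Dictionary<string, T> _values;

    public SketchAutomaton(IDictionary<string, T> patterns)
    {
        _values = new Dictionary<string, T>(patterns);

        // 1. Build a trie containing every pattern.
        foreach (string pattern in _values.Keys)
        {
            Node node = _root;
            foreach (char c in pattern)
            {
                if (!node.Next.TryGetValue(c, out Node child))
                {
                    child = new Node();
                    node.Next[c] = child;
                }
                node = child;
            }
            node.Hits.Add(pattern);
        }

        // 2. Breadth-first pass: set failure links and inherit hits through them.
        var queue = new Queue<Node>();
        foreach (Node first in _root.Next.Values)
        {
            first.Fail = _root;
            queue.Enqueue(first);
        }
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            foreach (KeyValuePair<char, Node> edge in node.Next)
            {
                Node fail = node.Fail;
                while (fail != _root && !fail.Next.ContainsKey(edge.Key))
                    fail = fail.Fail;
                edge.Value.Fail = fail.Next.TryGetValue(edge.Key, out Node target) ? target : _root;
                edge.Value.Hits.AddRange(edge.Value.Fail.Hits);
                queue.Enqueue(edge.Value);
            }
        }
    }

    // Single pass over the text; lazily yields (pattern, value, start position) per match.
    public IEnumerable<(string Pattern, T Value, int Pos)> Search(string text)
    {
        Node node = _root;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // On a mismatch, fall back along failure links instead of restarting the scan.
            while (node != _root && !node.Next.ContainsKey(c))
                node = node.Fail;
            if (node.Next.TryGetValue(c, out Node next))
                node = next;
            foreach (string pattern in node.Hits)
                yield return (pattern, _values[pattern], i - pattern.Length + 1);
        }
    }
}
```

Built from the same six patterns as the example above, this sketch reports `he` at position 0 and `run` at position 3 for the input `"he runs"`. Overlapping matches are found as well: `"ushers"` yields `she` at position 1, then `he` and `hers` at position 2, because each state inherits the hits of the state its failure link points to.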