https://github.com/dav009/flash
Golang Keyword extraction/replacement Datastructure using Tries instead of regexes
data-extraction go golang search text text-search trie
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/dav009/flash
- Owner: dav009
- Created: 2017-09-18T13:59:43.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-12-15T08:13:20.000Z (over 7 years ago)
- Last Synced: 2025-04-30T07:08:30.207Z (about 1 month ago)
- Topics: data-extraction, go, golang, search, text, text-search, trie
- Language: Go
- Homepage:
- Size: 7.81 KB
- Stars: 89
- Watchers: 5
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Flash

Fast Keyword extraction using [Aho–Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) and Tries.
Flash is a Golang reimplementation of [Flashtext](https://github.com/vi3k6i5/flashtext).

It is meant to be used when you have a large number of words that you want to:
- extract from text
- search and replace

Flash is meant as a replacement for Regex, which in such cases can be extremely slow.
## Usage
```go
package main

import (
	"fmt"
	"github.com/dav009/flash"
)

func main() {
	words := flash.NewKeywords() // build the keyword set once
	words.Add("New York")
	words.Add("Hello")
	words.Add("Tokyo")
	foundKeywords := words.Extract("New York and Tokyo are Cities")
	fmt.Println(foundKeywords)
	// [New York, Tokyo]
}
```

## Benchmarks
As a reference, running go-flash with 10K keywords over a 1000-sentence text took 7.3ms, while using regexes took 1 minute 37s.

| Sentences | Keywords | String.Contains | Regex    | Go-Flash |
|-----------|----------|-----------------|----------|----------|
| 1000      | 10K      | 1.0035s         | 1min 37s | 2.72ms   |

## Warning
This is a toy project that I wrote to get more familiar with Golang.
Please be aware of potential issues.