https://github.com/michurin/ngramindex
Golang ngram index implementation
- Host: GitHub
- URL: https://github.com/michurin/ngramindex
- Owner: michurin
- License: mit
- Created: 2024-12-21T09:23:23.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-01-25T00:45:46.000Z (4 months ago)
- Last Synced: 2025-02-14T19:51:53.624Z (4 months ago)
- Topics: go, golang, index, ngram, ngrams, search, trigram, trigrams
- Language: Go
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# N-gram Indexing and Searching
N-gram indexing is a simple yet powerful lookup technique for approximate (fuzzy) string matching.
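To make the idea concrete, here is a minimal, self-contained sketch of trigram indexing. It does not use this package's API: the names `ordered`, `trigrams`, `index`, `add`, and `search` are hypothetical, lowercasing is assumed as the only normalization step, and ranking by the number of shared trigrams is a deliberately naive stand-in, not necessarily what the package does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ordered is a hypothetical constraint for document IDs in this sketch.
type ordered interface{ ~int | ~string }

// trigrams lowercases s and splits it into overlapping 3-rune windows.
// Slicing []rune instead of bytes keeps multi-byte characters intact.
func trigrams(s string) []string {
	r := []rune(strings.ToLower(s))
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

// index maps each trigram to the set of documents whose text contains it.
type index[D ordered] map[string]map[D]struct{}

// add registers every trigram of text under doc.
func (ix index[D]) add(doc D, text string) {
	for _, g := range trigrams(text) {
		if ix[g] == nil {
			ix[g] = map[D]struct{}{}
		}
		ix[g][doc] = struct{}{}
	}
}

// search returns documents that share at least one trigram with query,
// ordered by the number of distinct shared trigrams (ties broken by ID).
func (ix index[D]) search(query string) []D {
	score := map[D]int{}
	seen := map[string]bool{}
	for _, g := range trigrams(query) {
		if seen[g] {
			continue // count each distinct query trigram once
		}
		seen[g] = true
		for d := range ix[g] {
			score[d]++
		}
	}
	docs := make([]D, 0, len(score))
	for d := range score {
		docs = append(docs, d)
	}
	sort.Slice(docs, func(i, j int) bool {
		if score[docs[i]] != score[docs[j]] {
			return score[docs[i]] > score[docs[j]]
		}
		return docs[i] < docs[j]
	})
	return docs
}

func main() {
	ix := index[string]{}
	ix.add("golang", "Golang ngram index")
	ix.add("python", "Python trigram search")
	fmt.Println(ix.search("trigram index")) // [golang python]
	fmt.Println(ix.search("pyton serch"))   // [python]
}
```

The second query contains two typos, yet it still shares enough trigrams with the indexed text to find the document. That tolerance is what makes the technique useful for fuzzy lookup.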
## Motivation
The package offers the following advantages:
- Document type agnostic, thanks to generics.
- Rune-based and Unicode-friendly.
- Adjustable text normalization to control case sensitivity, handling of spaces and punctuation, extra typo tolerance, etc.
- A simple ranking algorithm out of the box.
- A fully customizable ranking algorithm: supply your own less function for sorting.
- Ability to associate one document with several texts and to look up by several texts.
## Examples
- [Live example](https://go.dev/play/p/QClnrDlruau).
- [Examples in documentation](https://pkg.go.dev/github.com/michurin/ngramindex), [the same examples in this repository](https://github.com/michurin/ngramindex/blob/master/example_test.go).
## Known issues
- Beware: index modification is not thread-safe.
- The implementation is in-memory only.
- There is no way to import, export, save, or restore the index.
- It is impossible to remove a document from the index.
## Related links
- [Russ Cox - Regular Expression Matching with a Trigram Index or How Google Code Search Worked](https://swtch.com/~rsc/regexp/regexp4.html).