Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hbollon/go-edlib
π String comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...
https://github.com/hbollon/go-edlib
algorithms cosine damerau-levenshtein edit-distance edit-distance-algorithms go golang golang-string-comparison hamming jaro-winkler lcs lcs-distance levenshtein levenshtein-distance similarity-measures string-comparison string-distance string-matching unicode
Last synced: 1 day ago
JSON representation
π String comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...
- Host: GitHub
- URL: https://github.com/hbollon/go-edlib
- Owner: hbollon
- License: mit
- Created: 2020-08-18T09:30:59.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-07-03T15:35:18.000Z (over 2 years ago)
- Last Synced: 2025-01-04T17:05:52.387Z (8 days ago)
- Topics: algorithms, cosine, damerau-levenshtein, edit-distance, edit-distance-algorithms, go, golang, golang-string-comparison, hamming, jaro-winkler, lcs, lcs-distance, levenshtein, levenshtein-distance, similarity-measures, string-comparison, string-distance, string-matching, unicode
- Language: Go
- Homepage:
- Size: 76.2 KB
- Stars: 494
- Watchers: 13
- Forks: 24
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
Awesome Lists containing this project
- awesome-go - go-edlib - Go string comparison and edit distance algorithms library (Levenshtein, LCS, Hamming, Damerau levenshtein, Jaro-Winkler, etc.) compatible with Unicode. (Data Structures and Algorithms / Text Analysis)
- awesome-repositories - hbollon/go-edlib - π String comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc... (Go)
- awesome-go - go-edlib - Go string comparison and edit distance algorithms library (Levenshtein, LCS, Hamming, Damerau levenshtein, Jaro-Winkler, etc.) compatible with Unicode. (Data Structures and Algorithms / Text Analysis)
- awesome-go-extra - go-edlib - Winkler, Cosine, etc...|338|19|1|2020-08-18T09:30:59Z|2022-07-03T15:35:18Z| (Generators / Text Analysis)
README
Go-edlib : Edit distance and string comparison library
> Golang string comparison and edit distance algorithms library featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...
---
## Table of Contents
- [Requirements](#requirements)
- [Introduction](#introduction)
- [Features](#features)
- [Installation](#installation)
- [Benchmarks](#benchmarks)
- [Documentation](#documentation)
- [Examples](#examples)
- [Author](#author)
- [Contributing](#-contributing)
- [License](#-license)---
## Requirements
- [Go](https://golang.org/doc/install) (v1.13+)## Introduction
Golang open-source library which includes most (and soon all) edit-distance and string comparision algorithms with some extra!
Designed to be fully compatible with Unicode characters!
This library is 100% test covered π## Features
- [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance)
- [LCS](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem) (Longest common subsequence) with edit distance, backtrack and diff functions
- [Hamming](https://en.wikipedia.org/wiki/Hamming_distance)
- [Damerau-Levenshtein](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), with following variants:
- OSA (Optimal string alignment)
- Adjacent transpositions
- [Jaro & Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) similarity algorithms
- [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
- [Jaccard Index](https://en.wikipedia.org/wiki/Jaccard_index)
- [QGram](https://en.wikipedia.org/wiki/N-gram)
- [Sorensen-Dice](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)
- Computed similarity percentage functions based on all available edit distance algorithms in this lib
- Fuzzy search functions based on edit distance with unique or multiples strings output
- Unicode compatibility π₯³## Benchmarks
You can check an interactive Google chart with few benchmark cases for all similarity algorithms in this library through **StringsSimilarity** function [here](http://benchgraph.codingberg.com/q5)However, if you want or need more details, you can also viewing benchmark raw output [here](https://github.com/hbollon/go-edlib/blob/master/tests/outputs/benchmarks.txt), which also includes memory allocations and test cases output (similarity result and errors).
If you are on Linux and want to run them on your setup, you can run ``` ./tests/benchmark.sh ``` script.
## Installation
Open bash into your project folder and run:```bash
go get github.com/hbollon/go-edlib
```And import it into your project:
```go
import (
"github.com/hbollon/go-edlib"
)
```### Run tests
If you are on Linux and want to run all unit tests just run ``` ./tests/tests.sh ``` script.For Windows users you can run:
```bash
go test ./... # Add desired parameters to this command if you want
```## Documentation
**You can find all the documentation here :** [Documentation](https://godoc.org/github.com/hbollon/go-edlib)
## Examples
### Calculate string similarity index between two string
You can use ``` StringSimilarity(str1, str2, algorithm) ``` function.
**algorithm** parameter must one of the following constants:
```go
// Algorithm identifiers
const (
Levenshtein Algorithm = iota
DamerauLevenshtein
OSADamerauLevenshtein
Lcs
Hamming
Jaro
JaroWinkler
Cosine
)
```Example with levenshtein:
```go
res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Similarity: %f", res)
}
```### Execute fuzzy search based on string similarity algorithm
#### 1. Most matching unique result without threshold
You can use ``` FuzzySearch(str, strList, algorithm) ``` function.
```go
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result: %s", res)
}```
```
Result: testing
```#### 2. Most matching unique result with threshold
You can use ``` FuzzySearchThreshold(str, strList, minSimilarity, algorithm) ``` function.
```go
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig': %s", res)
}res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'hello': %s", res)
}```
```
Result for 'testnig': testing
Result for 'hello':
```#### 3. Most matching result set without threshold
You can use ``` FuzzySearchSet(str, strList, resultQuantity, algorithm) ``` function.
```go
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Results: %s", strings.Join(res, ", "))
}```
```
Results: testing, test, tester
```#### 4. Most matching result set with threshold
You can use ``` FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) ``` function.
```go
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.5' threshold: %s", strings.Join(res, " "))
}res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.7' threshold: %s", strings.Join(res, " "))
}```
```
Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing
```### Get raw edit distance (Levenshtein, LCS, DamerauβLevenshtein, Hamming)
You can use one of the following function to get an edit distance between two strings :
- [LevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#LevenshteinDistance)(str1, str2)
- [DamerauLevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#DamerauLevenshteinDistance)(str1, str2)
- [OSADamerauLevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#OSADamerauLevenshteinDistance)(str1, str2)
- [LCSEditDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#LCSEditDistance)(str1, str2)
- [HammingDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#HammingDistance)(str1, str2)Example with Levenshtein distance:
```go
res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
``````
Result: 3
```### LCS, LCS Backtrack and LCS Diff
#### 1. Compute LCS(Longuest Common Subsequence) between two stringsYou can use ``` LCS(str1, str2) ``` function.
```go
lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
``````
Length of their LCS: 3
```#### 2. Backtrack their LCS
You can use ``` LCSBacktrack(str1, str2) ``` function.
```go
res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", res)
}
``````
LCS: ABD
```#### 3. Backtrack all their LCS
You can use ``` LCSBacktrackAll(str1, str2) ``` function.
```go
res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
``````
LCS: ABD, ACD
```#### 4. Get LCS Diff between two strings
You can use ``` LCSDiff(str1, str2) ``` function.
```go
res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: \n%s\n%s", res[0], res[1])
}
``````
LCS Diff:
h c o m p u s e b o a t e r
+ - - - + + + + + - -
```## Author
π€ **Hugo Bollon**
* Github: [@hbollon](https://github.com/hbollon)
* LinkedIn: [@Hugo Bollon](https://www.linkedin.com/in/hugo-bollon-68a2381a4/)
* Portfolio: [hugobollon.me](https://www.hugobollon.me)## π€ Contributing
Contributions, issues and feature requests are welcome!
Feel free to check [issues page](https://github.com/hbollon/go-edlib/issues).## Show your support
Give a βοΈ if this project helped you!
## π License
Copyright Β© 2020 [Hugo Bollon](https://github.com/hbollon).
This project is [MIT License](https://github.com/hbollon/go-edlib/blob/master/LICENSE.md) licensed.