https://github.com/agext/levenshtein
Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.
awesome-go common-prefix-bonus edit-costs levenshtein levenshtein-distance similarity-metric string-distance string-pairs string-similarity winkler
- Host: GitHub
- URL: https://github.com/agext/levenshtein
- Owner: agext
- License: apache-2.0
- Created: 2016-04-08T00:14:31.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2020-10-15T13:29:05.000Z (over 5 years ago)
- Last Synced: 2024-07-31T20:48:38.104Z (over 1 year ago)
- Topics: awesome-go, common-prefix-bonus, edit-costs, levenshtein, levenshtein-distance, similarity-metric, string-distance, string-pairs, string-similarity, winkler
- Language: Go
- Homepage:
- Size: 23.4 KB
- Stars: 85
- Watchers: 2
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go-info - levenshtein - like bonus for common prefix. | (Uncategorized)
- awesome-go-cn - levenshtein - like bonus for common prefix.) (Data Structures / Advanced Console UIs)
- awesome-go-cn - levenshtein - like bonus for common prefix. [![Not updated in the last three years][Y]](https://github.com/agext/levenshtein) [![godoc][D]](https://godoc.org/github.com/agext/levenshtein) (Data Structures and Algorithms / Text Analysis)
- awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. - ★ 21 (Data Structures)
- awesome-go-with-stars - levenshtein - like bonus for common prefix. | 2020-10-15 | (Data Integration Frameworks / Text Analysis)
- fucking-awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (Data Structures and Algorithms / Text Analysis)
- awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (Data Structures / Advanced Console UIs)
- awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. - :arrow_down:13 - :star:0 (Data Structures / Advanced Console UIs)
- awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (Data Structures and Algorithms / Text Analysis)
- go-awesome-with-star-updatetime - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (Data Structures / Advanced Console UIs)
- awesome-go-extra - levenshtein - like bonus for common prefix.|69|6|0|2016-04-08T00:14:31Z|2020-10-15T13:29:05Z| (Generators / Text Analysis)
- awesome-go-cn - levenshtein
- awesome-go-processed - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.| (Data Structures / Advanced Console UIs)
- awesome-go-plus - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.  (Data Structures and Algorithms / Text Analysis)
- awesome-go - levenshtein - like bonus for common prefix. | - | - | - | (Data Structures / Advanced Console UIs)
- awesome-Char - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (Data Structures / Advanced Console UIs)
- awesome-go - levenshtein - Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix. (<span id="数据结构-data-structures">Data Structures</span> / <span id="高级控制台用户界面-advanced-console-uis">Advanced Console UIs</span>)
README
# A Go package for calculating the Levenshtein distance between two strings
This package implements distance and similarity metrics for strings, based on the Levenshtein measure, in [Go](http://golang.org).
## Project Status
v1.2.3 Stable: Guaranteed no breaking changes to the API in future v1.x releases. Probably safe to use in production, though provided on an "AS IS" basis.
This package is being actively maintained. If you encounter any problems or have any suggestions for improvement, please [open an issue](https://github.com/agext/levenshtein/issues). Pull requests are welcome.
## Overview
The Levenshtein `Distance` between two strings is the minimum total cost of edits that would convert the first string into the second. The allowed edit operations are insertions, deletions, and substitutions, all at character (one UTF-8 code point) level. Each operation has a default cost of 1, but each can be assigned its own cost equal to or greater than 0.
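To make the role of the per-operation costs concrete, here is a minimal, standard-library-only sketch of a weighted edit distance computed rune by rune. It is an illustration of the idea, not this package's implementation; the function name `weightedDistance` and its parameter order are assumptions made for the example.

```go
package main

import "fmt"

// weightedDistance is a hypothetical helper (not part of this package).
// It returns the minimum total cost of converting a into b, where each
// insertion, deletion, and substitution has its own non-negative cost.
// Strings are compared rune by rune, so multi-byte characters count once.
func weightedDistance(a, b string, insCost, delCost, subCost int) int {
	r1, r2 := []rune(a), []rune(b)

	// prev[j] is the cost of converting the already processed prefix of r1
	// into r2[:j]. For the empty prefix, that is j insertions.
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j * insCost
	}

	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i * delCost // delete the first i runes of r1
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1] // equal runes: carry the diagonal cost over
			if r1[i-1] != r2[j-1] {
				best += subCost
			}
			if c := prev[j] + delCost; c < best {
				best = c
			}
			if c := cur[j-1] + insCost; c < best {
				best = c
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(r2)]
}

func main() {
	// Unit costs give the classic Levenshtein distance.
	fmt.Println(weightedDistance("kitten", "sitting", 1, 1, 1)) // 3
	// An expensive substitution makes delete+insert the cheaper route.
	fmt.Println(weightedDistance("kitten", "sitting", 1, 1, 3)) // 5
}
```

Raising the substitution cost above the sum of the insertion and deletion costs effectively disables substitutions, because a delete followed by an insert is then always cheaper.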
A `Distance` of 0 means the two strings are identical, and the higher the value the more different the strings. Since in practice we are interested in finding if the two strings are "close enough", it often does not make sense to continue the calculation once the result is mathematically guaranteed to exceed a desired threshold. Providing this value to the `Distance` function allows it to take a shortcut and return a lower bound instead of an exact cost when the threshold is exceeded.
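The threshold shortcut can be sketched as follows. This is one possible way to stop early, assuming unit costs, and is not necessarily how the package does it; `boundedDistance` is a hypothetical name. It relies on the fact that, with non-negative costs, the final distance can never be smaller than the smallest value in any row of the dynamic-programming table.

```go
package main

import "fmt"

// boundedDistance is a hypothetical helper (not part of this package).
// It returns the exact unit-cost edit distance when that distance does not
// exceed maxCost. Otherwise it returns a value greater than maxCost that is
// no larger than the true distance, i.e. a lower bound.
func boundedDistance(a, b string, maxCost int) int {
	r1, r2 := []rune(a), []rune(b)

	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}

	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i
		rowMin := cur[0]
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1]
			if r1[i-1] != r2[j-1] {
				best++
			}
			if prev[j]+1 < best {
				best = prev[j] + 1
			}
			if cur[j-1]+1 < best {
				best = cur[j-1] + 1
			}
			cur[j] = best
			if best < rowMin {
				rowMin = best
			}
		}
		// Every edit path passes through this row, so the final cost is at
		// least rowMin. Once that exceeds the threshold, stop early.
		if rowMin > maxCost {
			return rowMin
		}
		prev = cur
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(boundedDistance("kitten", "sitting", 5))    // within threshold: exact distance
	fmt.Println(boundedDistance("abcdefgh", "zzzzzzzz", 2)) // exceeds threshold: lower bound only
}
```

The caller only needs to compare the result against the threshold: any value above it means "too different", regardless of whether it is exact.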
The `Similarity` function calculates the distance, then converts it into a normalized metric within the range 0..1, with 1 meaning the strings are identical, and 0 that they have nothing in common. A minimum similarity threshold can be provided to speed up the calculation of the metric for strings that are far too dissimilar for the purpose at hand. All values under this threshold are rounded down to 0.
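As a rough illustration of the normalization step, the sketch below divides a unit-cost distance by the rune length of the longer string, which is the largest distance two strings can have under unit costs. That normalization, along with the names `distance` and `similarity`, is an assumption made for this example; the package may normalize differently, particularly when custom costs are in use.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// distance is a plain unit-cost Levenshtein distance over runes.
func distance(a, b string) int {
	r1, r2 := []rune(a), []rune(b)
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1]
			if r1[i-1] != r2[j-1] {
				best++
			}
			if prev[j]+1 < best {
				best = prev[j] + 1
			}
			if cur[j-1]+1 < best {
				best = cur[j-1] + 1
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(r2)]
}

// similarity is a hypothetical helper (not part of this package). It maps
// the distance into the range 0..1 and rounds scores below minScore to 0.
func similarity(a, b string, minScore float64) float64 {
	la, lb := utf8.RuneCountInString(a), utf8.RuneCountInString(b)
	if la == 0 && lb == 0 {
		return 1 // two empty strings are identical; also avoids dividing by zero
	}
	maxDist := la
	if lb > maxDist {
		maxDist = lb
	}
	sim := 1 - float64(distance(a, b))/float64(maxDist)
	if sim < minScore {
		return 0
	}
	return sim
}

func main() {
	fmt.Printf("%.3f\n", similarity("kitten", "sitting", 0))   // 1 - 3/7
	fmt.Printf("%.3f\n", similarity("kitten", "sitting", 0.8)) // below the minimum score, so 0
}
```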
The `Match` function provides a similarity metric, with the same range and meaning as `Similarity`, but with a bonus for string pairs that share a common prefix and have a similarity above a "bonus threshold". It uses the same method as proposed by Winkler for the Jaro distance, and the reasoning behind it is that these string pairs are very likely spelling variations or errors, and they are more closely linked than the edit distance alone would suggest.
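Winkler's adjustment raises a score `sim` to `sim + l * p * (1 - sim)`, where `l` is the length of the common prefix (capped at a maximum) and `p` is a scaling factor. The sketch below applies that formula to an already computed similarity. The function name `withPrefixBonus` and its parameters are hypothetical, and the values 4, 0.1, and 0.7 used in `main` are the conventional Jaro-Winkler choices, not a statement about this package's defaults.

```go
package main

import "fmt"

// withPrefixBonus is a hypothetical helper (not part of this package).
// It applies a Winkler-style bonus to sim when sim reaches the threshold
// and the strings share a common prefix. maxPrefix caps the prefix length
// that is rewarded. Keep scale*maxPrefix <= 1 so the result stays <= 1.
func withPrefixBonus(sim float64, a, b string, maxPrefix int, scale, threshold float64) float64 {
	if sim < threshold || sim >= 1 {
		return sim // too dissimilar for a bonus, or already identical
	}
	r1, r2 := []rune(a), []rune(b)
	n := 0
	for n < len(r1) && n < len(r2) && n < maxPrefix && r1[n] == r2[n] {
		n++
	}
	// Winkler's formula: close a fraction n*scale of the gap to 1.
	return sim + float64(n)*scale*(1-sim)
}

func main() {
	// Shared prefix "mar" (3 runes): 0.8 + 3*0.1*(1-0.8)
	fmt.Printf("%.3f\n", withPrefixBonus(0.8, "martha", "marhta", 4, 0.1, 0.7))
	// Below the bonus threshold: score is returned unchanged.
	fmt.Printf("%.3f\n", withPrefixBonus(0.5, "martha", "marhta", 4, 0.1, 0.7))
}
```

Because the bonus scales with `1 - sim`, it shrinks as the base score approaches 1, so string pairs that are already near-identical gain very little.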
The underlying `Calculate` function is also exported, to allow the building of other derivative metrics, if needed.
## Installation
```
go get github.com/agext/levenshtein
```
## License
Package levenshtein is released under the Apache 2.0 license. See the [LICENSE](LICENSE) file for details.