https://github.com/mrkkrp/text-metrics
Calculate various string metrics efficiently in Haskell
hamming-distance haskell jaccard-similarity jaro-distance jaro-winkler-distance levenshtein-distance string-metrics
- Host: GitHub
- URL: https://github.com/mrkkrp/text-metrics
- Owner: mrkkrp
- License: other
- Created: 2016-07-23T08:24:16.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-03-25T08:13:39.000Z (10 months ago)
- Last Synced: 2024-05-08T20:25:15.859Z (9 months ago)
- Topics: hamming-distance, haskell, jaccard-similarity, jaro-distance, jaro-winkler-distance, levenshtein-distance, string-metrics
- Language: Haskell
- Size: 131 KB
- Stars: 42
- Watchers: 3
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
README
# Text Metrics
[![License BSD3](https://img.shields.io/badge/license-BSD3-brightgreen.svg)](http://opensource.org/licenses/BSD-3-Clause)
[![Hackage](https://img.shields.io/hackage/v/text-metrics.svg?style=flat)](https://hackage.haskell.org/package/text-metrics)
[![Stackage Nightly](http://stackage.org/package/text-metrics/badge/nightly)](http://stackage.org/nightly/package/text-metrics)
[![Stackage LTS](http://stackage.org/package/text-metrics/badge/lts)](http://stackage.org/lts/package/text-metrics)
![CI](https://github.com/mrkkrp/text-metrics/workflows/CI/badge.svg?branch=master)

The library provides efficient implementations of various string metric
algorithms. It works with strict `Text` values.

The current version of the package implements:

* [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
* [Normalized Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
* [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)
* [Normalized Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)
* [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance)
* [Jaro distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
* [Jaro-Winkler distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)
* [Overlap coefficient](https://en.wikipedia.org/wiki/Overlap_coefficient)
* [Jaccard similarity coefficient](https://en.wikipedia.org/wiki/Jaccard_index)

## Comparison with the `edit-distance` package
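
Before the comparison, a minimal sketch of what calling the library looks like may help. It assumes the `Data.Text.Metrics` module of a recent `text-metrics` release (0.3 or later, where results are `Int` and `Ratio Int`) and the `OverloadedStrings` extension. The function names come from the package's Hackage documentation, not from this README, and the sample strings are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Ratio (Ratio)
import Data.Text (Text)
import Data.Text.Metrics (hamming, jaccard, levenshtein, levenshteinNorm)

-- Sample inputs, chosen only for illustration.
kitten, sitting :: Text
kitten = "kitten"
sitting = "sitting"

-- Normalized Levenshtein score: 0 means no similarity, 1 means exact match.
similarity :: Ratio Int
similarity = levenshteinNorm kitten sitting

main :: IO ()
main = do
  print (levenshtein kitten sitting)   -- 3
  print similarity                     -- a ratio between 0 and 1
  print (hamming "karolin" "kathrin")  -- Just 3
  print (hamming kitten sitting)       -- Nothing, because the lengths differ
  print (jaccard kitten sitting)       -- a ratio between 0 and 1
```

All arguments are strict `Text` values, and `hamming` returns a `Maybe` because the Hamming distance is only defined for inputs of equal length.
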
There is the
[`edit-distance`](https://hackage.haskell.org/package/edit-distance) package,
whose scope overlaps with the scope of this package. The differences are:

* `edit-distance` allows you to specify costs for every operation when
  calculating Levenshtein distance (insertion, deletion, substitution, and
  transposition). This is rarely needed in real-world applications though,
  IMO.

* `edit-distance` only provides Levenshtein distance, while `text-metrics`
  aims to provide implementations of most string metric algorithms.

* `edit-distance` works on `String`s, while `text-metrics` works on strict
  `Text` values.

## Implementation
Although we originally used C for speed, currently all functions are pure
Haskell tuned for performance. See [this blog
post](https://markkarpov.com/post/migrating-text-metrics.html) for more
info.

## Contribution

Issues, bugs, and questions may be reported in [the GitHub issue tracker for
this project](https://github.com/mrkkrp/text-metrics/issues).

Pull requests are also welcome.

## License
Copyright © 2016–present Mark Karpov

Distributed under the BSD 3-clause license.