Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/0xd34df00d/edit-distance-linear
Levenshtein edit distance in linear memory (also turns out to be faster than C++)
edit-distance haskell levenshtein-distance
- Host: GitHub
- URL: https://github.com/0xd34df00d/edit-distance-linear
- Owner: 0xd34df00d
- License: bsd-3-clause
- Created: 2019-11-30T22:50:55.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-19T20:39:36.000Z (almost 2 years ago)
- Last Synced: 2024-11-28T14:09:00.597Z (about 2 months ago)
- Topics: edit-distance, haskell, levenshtein-distance
- Language: Haskell
- Homepage:
- Size: 24.4 KB
- Stars: 3
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.md
- License: LICENSE
README
# edit-distance-linear
[![Build Status][travis-badge]][travis]
[![Hackage][hackage-badge]][hackage]

A pure Haskell implementation of the Levenshtein edit distance with linear space complexity.
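
To illustrate what "linear space" means here, the following is a minimal sketch of the textbook single-row dynamic programme over `ByteString`s in the `ST` monad. It is not this package's actual code: the name `levenshteinSketch` is hypothetical, and the sketch makes no attempt at the optimizations a tuned implementation would use. It depends only on the `array` and `bytestring` packages.

```haskell
import Control.Monad (foldM_, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)
import qualified Data.ByteString as BS

-- | Byte-wise Levenshtein distance (hypothetical name, illustration only).
-- Only one row of the DP table is kept, so memory is linear in the
-- length of the second argument.
levenshteinSketch :: BS.ByteString -> BS.ByteString -> Int
levenshteinSketch a b = runST $ do
  let m = BS.length a
      n = BS.length b
  -- Row 0 of the table: turning the empty prefix of `a` into the first
  -- j bytes of `b` takes j insertions.
  row <- newListArray (0, n) [0 .. n] :: ST s (STUArray s Int Int)
  forM_ [1 .. m] $ \i -> do
    let ai = BS.index a (i - 1)
    -- Column 0: deleting the first i bytes of `a`.
    writeArray row 0 i
    -- Thread (D[i-1][j-1], D[i][j-1]) through the inner loop so the
    -- row can be overwritten in place.
    foldM_
      (\(diag, left) j -> do
          up <- readArray row j -- D[i-1][j], not yet overwritten
          let cost = if ai == BS.index b (j - 1) then 0 else 1
              cur  = minimum [up + 1, left + 1, diag + cost]
          writeArray row j cur
          pure (up, cur))
      (i - 1, i)
      [1 .. n]
  readArray row n
```

For example, applying `levenshteinSketch` to the ASCII bytes of `kitten` and `sitting` gives 3. Because the comparison is per byte, a single non-ASCII character encoded as several UTF-8 bytes can count as more than one edit.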
## Comparison
Several other implementations already exist, but their goals and design decisions differ. In particular, this package is intended to:
* compare long strings (think tens of thousands of characters), which drives the implementation to live in the `ST` monad and aim for linear space complexity to lower GC pressure;
* ignore Unicode, accepting `ByteString`s and comparing them byte-by-byte rather than character-by-character (or glyph-by-glyph, or whatever the right notion of an edit is for Unicode).

Among the alternatives:
* [text-metrics](http://hackage.haskell.org/package/text-metrics): uses a similar algorithm but handles Unicode, which makes it 4-5 times slower than this package.
* [edit-distance](http://hackage.haskell.org/package/edit-distance): uses a very different algorithm (which we might implement here one day, with potentially huge benefits) that tends to consume more memory (I'm not up for estimating its space asymptotics, though).

[travis]:
[travis-badge]:
[hackage]:
[hackage-badge]: