# amatch - Approximate Matching Extension for Ruby

## Description

This is a collection of classes that can be used for approximate
matching, searching, and comparing of strings. They implement algorithms
that compute the Levenshtein edit distance, the Sellers edit distance, the
Damerau-Levenshtein edit distance, the Hamming distance, the length of the
longest common subsequence, the length of the longest common substring, the
pair distance metric, the Jaro metric, and the Jaro-Winkler metric.

## Installation

To install this extension as a gem, type

```sh
gem install amatch
```

into the shell.

## Download

The homepage of this library is located at

* https://github.com/flori/amatch

## Examples

```ruby
require 'amatch'
# => true
include Amatch
# => Object
```

```ruby
m = Sellers.new("pattern")
# => #<Amatch::Sellers:0x...>
m.match("pattren")
# => 2.0
m.substitution = m.insertion = 3
# => 3
m.match("pattren")
# => 4.0
m.reset_weights
# => #<Amatch::Sellers:0x...>
m.match(["pattren","parent"])
# => [2.0, 4.0]
m.search("abcpattrendef")
# => 2.0
```
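To make the weights concrete, here is a minimal pure-Ruby sketch of a weighted edit distance in the spirit of `Sellers`. It is not the gem's C implementation: `sellers_distance` and its keyword arguments are hypothetical, and we assume `deletion` removes a pattern character while `insertion` adds a text character (the gem may orient them the other way, which does not affect the `"pattren"` case, since it needs exactly one of each). With substitution and insertion set to 3, the cheapest edit is one deletion (1) plus one insertion (3), hence 4.0. With `search: true` the first row is zeroed and the minimum of the last row is taken, so the pattern may start and end anywhere in the text.

```ruby
# Hypothetical sketch of a weighted (Sellers-style) edit distance; not the gem's code.
# Assumed orientation: deletion removes a pattern char, insertion adds a text char.
def sellers_distance(pattern, text, search: false,
                     deletion: 1.0, insertion: 1.0, substitution: 1.0)
  # Row 0 aligns the empty pattern prefix. In search mode the pattern may begin
  # at any text position for free; otherwise leading text costs insertions.
  prev = (0..text.size).map { |j| search ? 0.0 : j * insertion }
  pattern.each_char.with_index(1) do |pc, i|
    curr = [i * deletion]
    text.each_char.with_index(1) do |tc, j|
      curr << [
        prev[j] + deletion,                            # delete pc
        curr[j - 1] + insertion,                       # insert tc
        prev[j - 1] + (pc == tc ? 0.0 : substitution)  # substitute (free if equal)
      ].min
    end
    prev = curr
  end
  # In search mode the match may also end anywhere in the text.
  search ? prev.min : prev.last
end

p sellers_distance("pattern", "pattren")                                    # 2.0
p sellers_distance("pattern", "pattren", insertion: 3.0, substitution: 3.0) # 4.0
p sellers_distance("pattern", "abcpattrendef", search: true)                # 2.0
```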

```ruby
m = Levenshtein.new("pattern")
# => #<Amatch::Levenshtein:0x...>
m.match("pattren")
# => 2
m.search("abcpattrendef")
# => 2
"pattern language".levenshtein_similar("language of patterns")
# => 0.2
```
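For comparison, a minimal pure-Ruby Levenshtein distance that keeps only two rows of the dynamic-programming table, with every edit costing 1. The similarity helper assumes that `levenshtein_similar` scales the distance by the length of the longer string; both method names are hypothetical and not part of the gem.

```ruby
# Hypothetical sketch of unit-cost Levenshtein distance, keeping two DP rows.
def levenshtein_distance(a, b)
  prev = (0..b.size).to_a                 # distance from "" to each prefix of b
  a.each_char.with_index(1) do |ca, i|
    curr = [i]                            # distance from the first i chars of a to ""
    b.each_char.with_index(1) do |cb, j|
      curr << [prev[j] + 1,                          # deletion
               curr[j - 1] + 1,                      # insertion
               prev[j - 1] + (ca == cb ? 0 : 1)].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Assumed normalization: 1 - distance / length of the longer string.
def levenshtein_similarity(a, b)
  longer = [a.size, b.size].max
  return 1.0 if longer.zero?
  1.0 - levenshtein_distance(a, b).fdiv(longer)
end

p levenshtein_distance("pattern", "pattren") # 2
p levenshtein_distance("kitten", "sitting")  # 3
```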

```ruby
m = Amatch::DamerauLevenshtein.new("pattern")
# => #<Amatch::DamerauLevenshtein:0x...>
m.match("pattren")
# => 1
"pattern language".damerau_levenshtein_similar("language of patterns")
# => 0.19999999999999996
```
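`DamerauLevenshtein` additionally counts swapping two adjacent characters as a single edit, which is why `"pattren"` scores 1 here instead of 2. The sketch below implements the restricted variant (optimal string alignment), in which no substring is edited more than once. Whether the gem uses this variant or the unrestricted one is not stated here, and the two can differ: for `"ca"` versus `"abc"` the restricted variant gives 3 and the unrestricted one gives 2. `osa_distance` is a hypothetical name.

```ruby
# Hypothetical sketch: restricted Damerau-Levenshtein (optimal string alignment).
def osa_distance(a, b)
  # d[i][j] = distance between the first i chars of a and the first j chars of b
  d = Array.new(a.size + 1) do |i|
    Array.new(b.size + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) }
  end
  (1..a.size).each do |i|
    (1..b.size).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1,          # deletion
                 d[i][j - 1] + 1,          # insertion
                 d[i - 1][j - 1] + cost].min
      # Adjacent transposition: "er" <-> "re" costs a single edit.
      if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]
        d[i][j] = [d[i][j], d[i - 2][j - 2] + 1].min
      end
    end
  end
  d[a.size][b.size]
end

p osa_distance("pattern", "pattren") # 1
p osa_distance("ca", "abc")          # 3
```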

```ruby
m = Hamming.new("pattern")
# => #<Amatch::Hamming:0x...>
m.match("pattren")
# => 2
"pattern language".hamming_similar("language of patterns")
# => 0.1
```
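Hamming distance counts the positions at which two strings differ. The sketch below assumes that surplus characters of the longer string count as mismatches and that the similarity divides by the longer length. Under those assumptions it reproduces the `0.1` above: the two phrases differ in 14 of the first 16 positions, plus 4 surplus characters, so the similarity is 1 - 18/20. The method names are hypothetical.

```ruby
# Hypothetical sketch: Hamming distance extended to unequal lengths by
# counting every surplus character of the longer string as a mismatch.
def hamming_distance(a, b)
  shorter, longer = a.size <= b.size ? [a, b] : [b, a]
  mismatches = shorter.each_char.with_index.count { |ch, i| ch != longer[i] }
  mismatches + (longer.size - shorter.size)
end

# Assumed normalization: 1 - distance / length of the longer string.
def hamming_similarity(a, b)
  n = [a.size, b.size].max
  n.zero? ? 1.0 : 1.0 - hamming_distance(a, b).fdiv(n)
end

p hamming_distance("pattern", "pattren")                       # 2
p hamming_distance("pattern language", "language of patterns") # 18
```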

```ruby
m = PairDistance.new("pattern")
# => #<Amatch::PairDistance:0x...>
m.match("pattr en")
# => 0.545454545454545
m.match("pattr en", nil)
# => 0.461538461538462
m.match("pattr en", /t+/)
# => 0.285714285714286
"pattern language".pair_distance_similar("language of patterns")
# => 0.928571428571429
```
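The `PairDistance` numbers above are consistent with a Dice coefficient over character pairs: split each string into words with the regexp, collect the adjacent character pairs of every word, and return 2 * shared pairs / total pairs. For example, `"pattern"` has the pairs pa, at, tt, te, er, rn, and `"pattr en"` split on whitespace has pa, at, tt, tr, en, so the result is 2 * 3 / 11, about 0.545. The sketch below follows that reading; it assumes whitespace is the default separator and that `nil` means "don't split". `pair_similarity` and `char_pairs` are hypothetical names, not the gem's API.

```ruby
# Hypothetical sketch: Dice coefficient over the character pairs of each word.
# Assumes regexp = nil means "treat the whole string as one word".
def char_pairs(str, regexp)
  words = regexp ? str.split(regexp) : [str]
  words.flat_map { |w| w.chars.each_cons(2).map(&:join) }
end

def pair_similarity(a, b, regexp = /\s+/)
  pairs_a = char_pairs(a, regexp)
  pairs_b = char_pairs(b, regexp)
  total = pairs_a.size + pairs_b.size
  return 0.0 if total.zero? # assumption: no pairs at all means no similarity

  # Count shared pairs as a multiset intersection (each pair of b used once).
  remaining = Hash.new(0)
  pairs_b.each { |pair| remaining[pair] += 1 }
  shared = pairs_a.count do |pair|
    next false if remaining[pair].zero?
    remaining[pair] -= 1
    true
  end
  2.0 * shared / total
end

p pair_similarity("pattern", "pattr en")       # about 0.5455 (6/11)
p pair_similarity("pattern", "pattr en", nil)  # about 0.4615 (6/13)
p pair_similarity("pattern", "pattr en", /t+/) # about 0.2857 (2/7)
```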

```ruby
m = LongestSubsequence.new("pattern")
# => #<Amatch::LongestSubsequence:0x...>
m.match("pattren")
# => 6
"pattern language".longest_subsequence_similar("language of patterns")
# => 0.4
```
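`LongestSubsequence` reports the length of the longest common subsequence: characters that appear in both strings in the same order, but not necessarily next to each other. For `"pattern"` and `"pattren"` one such subsequence is `"pattrn"`, hence 6. A minimal dynamic-programming sketch follows; `lcs_length` is a hypothetical name, and dividing by the longer length in `lcs_similarity` is our assumption about how the `_similar` variant normalizes.

```ruby
# Hypothetical sketch: length of the longest common subsequence via DP.
def lcs_length(a, b)
  prev = Array.new(b.size + 1, 0)
  a.each_char do |ca|
    curr = [0]
    b.each_char.with_index(1) do |cb, j|
      # Extend a common subsequence on a match, else keep the better neighbour.
      curr << (ca == cb ? prev[j - 1] + 1 : [prev[j], curr[j - 1]].max)
    end
    prev = curr
  end
  prev.last
end

# Assumed normalization: divide by the length of the longer string.
def lcs_similarity(a, b)
  n = [a.size, b.size].max
  n.zero? ? 1.0 : lcs_length(a, b).fdiv(n)
end

p lcs_length("pattern", "pattren") # 6
```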

```ruby
m = LongestSubstring.new("pattern")
# => #<Amatch::LongestSubstring:0x...>
m.match("pattren")
# => 4
"pattern language".longest_substring_similar("language of patterns")
# => 0.4
```
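`LongestSubstring` differs in that the shared characters must be contiguous: `"patt"` is the longest run common to `"pattern"` and `"pattren"`, hence 4. In the sketch below each cell stores the length of the common run ending at that pair of positions, and the answer is the largest cell. Dividing by the longer length is again an assumption, though it matches the 0.4 above: the longest common substring of the two phrases is `"language"`, 8 of 20 characters. The method name is hypothetical.

```ruby
# Hypothetical sketch: length of the longest common substring via DP.
def longest_common_substring_length(a, b)
  best = 0
  prev = Array.new(b.size + 1, 0)
  a.each_char do |ca|
    curr = Array.new(b.size + 1, 0) # runs are broken (0) wherever chars differ
    b.each_char.with_index(1) do |cb, j|
      next unless ca == cb
      curr[j] = prev[j - 1] + 1     # extend the run ending one position earlier
      best = curr[j] if curr[j] > best
    end
    prev = curr
  end
  best
end

p longest_common_substring_length("pattern", "pattren")                       # 4
p longest_common_substring_length("pattern language", "language of patterns") # 8
```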

```ruby
m = Jaro.new("pattern")
# => #<Amatch::Jaro:0x...>
m.match("paTTren")
# => 0.952380952380952
m.ignore_case = false
m.match("paTTren")
# => 0.742857142857143
"pattern language".jaro_similar("language of patterns")
# => 0.672222222222222
```
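The Jaro metric counts characters that are equal and at most `max(len) / 2 - 1` positions apart, then penalizes matched characters that appear in a different order (half the number of out-of-order matches are "transpositions"). For `"pattern"` against `"paTTren"` with case ignored, all 7 characters match and the swapped `e`/`r` give one transposition, so the score is (7/7 + 7/7 + 6/7) / 3 = 20/21, about 0.952. With case respected only 5 characters match, giving (5/7 + 5/7 + 4/5) / 3 = 26/35, about 0.743. Both agree with the outputs above. This is a sketch of the textbook definition; the `ignore_case:` keyword mirrors the gem's `ignore_case` attribute, but the method itself is hypothetical.

```ruby
# Hypothetical sketch of the textbook Jaro similarity (not the gem's C code).
def jaro(a, b, ignore_case: true)
  if ignore_case
    a = a.downcase
    b = b.downcase
  end
  return 1.0 if a.empty? && b.empty? # assumption for the degenerate case
  return 0.0 if a.empty? || b.empty?

  # Characters match if equal and at most `window` positions apart.
  window = [[a.size, b.size].max / 2 - 1, 0].max
  b_used = Array.new(b.size, false)
  a_matches = []
  a.each_char.with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.size - 1].min
    (lo..hi).each do |j|
      next if b_used[j] || b[j] != ch
      b_used[j] = true
      a_matches << ch
      break
    end
  end
  m = a_matches.size
  return 0.0 if m.zero?

  # Matched chars of b in b's order; half the out-of-order pairs are transpositions.
  b_matches = b.chars.select.with_index { |_, j| b_used[j] }
  transpositions = a_matches.zip(b_matches).count { |x, y| x != y } / 2.0
  (m.fdiv(a.size) + m.fdiv(b.size) + (m - transpositions) / m) / 3.0
end

p jaro("pattern", "paTTren")                     # about 0.95238 (20/21)
p jaro("pattern", "paTTren", ignore_case: false) # about 0.74286 (26/35)
```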

```ruby
m = JaroWinkler.new("pattern")
# => #<Amatch::JaroWinkler:0x...>
m.match("paTTren")
# => 0.971428571712403
m.ignore_case = false
m.match("paTTren")
# => 0.79428571505206
m.scaling_factor = 0.05
m.match("pattren")
# => 0.961904762046678
"pattern language".jarowinkler_similar("language of patterns")
# => 0.672222222222222
```
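Jaro-Winkler boosts the Jaro score `j` of strings that share a prefix: `j + l * p * (1 - j)`, where `l` is the length of the common prefix (capped at 4 in the textbook definition, which we assume here) and `p` is the scaling factor (0.1 unless changed with `scaling_factor=`). To keep the block self-contained it takes an already computed Jaro score. The values 20/21 and 26/35 correspond to the `Jaro` outputs 0.952380952380952 and 0.742857142857143 shown above, and plugging them in agrees with the `JaroWinkler` outputs to about nine decimal places. Since the two phrases in the last example start with different letters, there is no prefix boost, which is why `jarowinkler_similar` equals `jaro_similar` there. `jaro_winkler` is a hypothetical helper, not the gem's API.

```ruby
# Hypothetical sketch of the Winkler prefix boost applied to a given Jaro score.
# Assumes the textbook prefix cap of 4 characters and no minimum-score threshold.
def jaro_winkler(jaro_score, a, b, scaling_factor: 0.1, ignore_case: true)
  if ignore_case
    a = a.downcase
    b = b.downcase
  end
  prefix = 0
  prefix += 1 while prefix < 4 && prefix < a.size && prefix < b.size &&
                    a[prefix] == b[prefix]
  jaro_score + prefix * scaling_factor * (1 - jaro_score)
end

p jaro_winkler(20.0 / 21, "pattern", "paTTren")                      # about 0.97143
p jaro_winkler(26.0 / 35, "pattern", "paTTren", ignore_case: false) # about 0.79429
p jaro_winkler(20.0 / 21, "pattern", "pattren",
               scaling_factor: 0.05, ignore_case: false)            # about 0.96190
```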

## Author

Florian Frank

## License

Apache License, Version 2.0. See the COPYING file in the source archive.