https://github.com/airblade/respecta
Measures how well an abbreviation matches a string.
- Host: GitHub
- URL: https://github.com/airblade/respecta
- Owner: airblade
- Created: 2013-10-26T11:21:28.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2013-11-07T10:22:07.000Z (almost 12 years ago)
- Last Synced: 2025-02-06T08:25:25.750Z (8 months ago)
- Language: Ruby
- Size: 422 KB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.mkd
# Respecta
Respecta measures how well an abbreviation matches a string. It tries to respect your wishes by finding the string you want as intuitively as possible.
The matching is not fuzzy: all the letters in the abbreviation must be present in the text to be searched, in the right order.
Its primary purpose is for finding file paths with [Selecta][], a fuzzy text selector, but it can be used for other things.
Respecta prefers:
- letters at the start of "words" to letters in the middle of words
- contiguous letters to isolated letters

When Respecta scores how well an abbreviation matches a string, it calculates the best possible score; i.e. the best score of all the ways the abbreviation could match. While the definitive nature of the score is appealing, it means Respecta (currently) takes too long when there are many possible matches. See `benchmark.rb` for details.
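To get a feel for how quickly the number of possible matches can grow, here is a minimal sketch that counts them without listing them. `count_matches` is a hypothetical helper written for illustration; it is not part of Respecta, and it assumes case-sensitive comparison.

```ruby
# Hypothetical helper, not part of Respecta.
# Counts the distinct ways `abbreviation` can be matched, in order, against `text`.
def count_matches(abbreviation, text)
  # ways[j] = number of ways to match the first j abbreviation letters so far
  ways = Array.new(abbreviation.length + 1, 0)
  ways[0] = 1
  text.each_char do |char|
    # Walk backwards so each text character is used at most once per match.
    abbreviation.length.downto(1) do |j|
      ways[j] += ways[j - 1] if abbreviation[j - 1] == char
    end
  end
  ways[abbreviation.length]
end

count_matches('aa', 'aaaa')     # => 6
count_matches('aaa', 'a' * 20)  # => 1140
```

Scoring every one of those matches individually is what becomes expensive on long strings with many repeated letters.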
## Usage
```ruby
r = Respecta.new 'app/controllers/search_controller.rb'

r.score 'search'
# => 0.158 (3 d.p.)

r.score 'acsearch'
# => 0.211 (3 d.p.)
```
## How it works
Respecta separates the finding of matches from their scoring, making each part easier to understand.
Unlike most match-scoring algorithms (as far as I can tell) the scoring algorithm is simple. Finding all the possible matches is also simple, at least conceptually (admittedly the implementation is a little tricky).
First Respecta finds all the possible combinations of locations where the abbreviation matches the string. It takes the first letter of the abbreviation and finds all the matches in the string. From each of those starting points, it looks for matches of the second letter of the abbreviation. And so on.
Once it has all possible matches, Respecta scores each match and returns the highest score.
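The match-finding step described above can be sketched roughly as follows. This is an illustration under assumptions, not Respecta's actual code: `all_matches` is a hypothetical name, comparison is assumed to be case-sensitive, and each match is represented as an array of character positions.

```ruby
# Hypothetical helper, not Respecta's actual implementation.
# Returns every way `abbreviation` can be matched against `text`,
# each as an array of character indices in increasing order.
def all_matches(abbreviation, text, from = 0)
  return [[]] if abbreviation.empty?

  first = abbreviation[0]
  rest = abbreviation[1..-1]
  matches = []
  (from...text.length).each do |i|
    next unless text[i] == first
    # From this starting point, look for the remaining letters further right.
    all_matches(rest, text, i + 1).each { |tail| matches << [i, *tail] }
  end
  matches
end

all_matches('ab', 'abcab')
# => [[0, 1], [0, 4], [3, 4]]
```

Each resulting array of positions can then be scored on its own, which keeps finding and scoring separate.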
## Scoring
First Respecta assigns values to each letter in the haystack string.
1. Each letter in the haystack string is given a default value of 1.
2. Each letter in the haystack string which starts a word is given a bonus value.

To calculate how well an abbreviation matches, Respecta adds up the scores of the letters in the haystack string which were matched.
Respecta then gives a bonus for the number of contiguous matched letters.
These two scores are added and the result is normalised to between 0 and 1.
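As a rough sketch of this scoring scheme, the code below scores a single match given as an array of matched positions. Several details are assumptions, because the text above does not specify them: the bonus value (`WORD_START_BONUS`), the definition of a word start (an alphanumeric character at the start of the string or after a non-alphanumeric one), the contiguity bonus (one point per adjacent pair of matched letters), and the normalisation (dividing by the score of matching every letter). It is not expected to reproduce the figures in the Usage section.

```ruby
WORD_START_BONUS = 2 # assumed value; not stated in the README

# Assumption: a word starts at an alphanumeric character that is either
# first in the string or preceded by a non-alphanumeric character.
def letter_values(text)
  text.each_char.with_index.map do |char, i|
    starts_word = char.match?(/[[:alnum:]]/) &&
                  (i.zero? || !text[i - 1].match?(/[[:alnum:]]/))
    starts_word ? 1 + WORD_START_BONUS : 1
  end
end

# Scores one match, where `positions` lists the matched indices in order.
def match_score(text, positions)
  return 0.0 if text.empty?

  values = letter_values(text)
  letters = positions.sum { |i| values[i] }
  # Assumed contiguity bonus: one point per adjacent pair of matched letters.
  contiguous = positions.each_cons(2).count { |a, b| b == a + 1 }
  # Assumed normalisation: divide by the score of matching every letter.
  best = values.sum + (text.length - 1)
  (letters + contiguous).to_f / best
end

match_score('ab_cd', [0, 1]).round(3) # => 0.385
match_score('ab_cd', [0, 3]).round(3) # => 0.462
```

With these assumed weights, matching two word starts outscores matching two contiguous letters; different weights would change that balance.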
## Why?
There are many such algorithms already but I couldn't understand any of the ones I looked at. Respecta is simple enough for me to understand.
## Intellectual Property
Copyright 2013 Andrew Stewart, AirBlade Software.
Released under the MIT licence.
[selecta]: https://github.com/garybernhardt/selecta