https://github.com/phstc/spelling-corrector
It's a Ruby implementation of Norvig Spelling Corrector plus Levenshtein distance fallback.
- Host: GitHub
- URL: https://github.com/phstc/spelling-corrector
- Owner: phstc
- Created: 2013-03-16T20:21:21.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2013-03-26T14:27:00.000Z (over 11 years ago)
- Last Synced: 2024-04-15T02:13:45.893Z (7 months ago)
- Language: Ruby
- Homepage:
- Size: 3.77 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Spelling Corrector
It's a Ruby implementation of the [Norvig Spelling Corrector](http://norvig.com/spell-correct.html) plus a [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance) fallback.
If the Norvig algorithm doesn't find a correction, this implementation looks for the first occurrence of a similar word (distance <= 8) using Levenshtein distance.
```ruby
known([word]) || known(edits1(word)) || known_edits2(word) || levenshtein(word) || ["NO SUGGESTION"]
```
Levenshtein costs: ins=2, del=2 and sub=1.
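The Levenshtein fallback can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the method names `weighted_levenshtein` and `first_similar_word`, the constants, and the plain-array dictionary are assumptions. Only the costs (ins=2, del=2, sub=1) and the threshold of 8 come from the text above.

```ruby
# Costs and threshold taken from the README; the names are hypothetical.
INSERT_COST = 2
DELETE_COST = 2
SUBSTITUTE_COST = 1
MAX_DISTANCE = 8

# Weighted Levenshtein distance computed row by row.
def weighted_levenshtein(source, target)
  # previous[j] is the cost of building the first j chars of target
  # from an empty prefix of source (j insertions).
  previous = (0..target.length).map { |j| j * INSERT_COST }

  source.each_char.with_index(1) do |source_char, i|
    # current[0] is the cost of deleting the first i chars of source.
    current = [i * DELETE_COST]
    target.each_char.with_index(1) do |target_char, j|
      substitution = previous[j - 1] + (source_char == target_char ? 0 : SUBSTITUTE_COST)
      deletion = previous[j] + DELETE_COST
      insertion = current[j - 1] + INSERT_COST
      current << [substitution, deletion, insertion].min
    end
    previous = current
  end

  previous.last
end

# Returns the first dictionary word within MAX_DISTANCE, or nil if none is.
def first_similar_word(word, dictionary)
  dictionary.find { |candidate| weighted_levenshtein(word, candidate) <= MAX_DISTANCE }
end
```

Because a substitution is cheaper than an insertion or deletion, words of the same length tend to score as closer than words that differ in length. Note that "first occurrence" makes the result depend on the order of the dictionary.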
## The Algorithm
First, I recommend reading the [Norvig explanation](http://norvig.com/spell-correct.html) and the [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance) article, then having a look at the tests (specs directory). They show how each method works, which helps in understanding the algorithm.
Most of the `SpellingCorrector` methods should be private; I left them public only to document (explain) them with tests.
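As a rough illustration of the first two steps of the Norvig chain, here is a minimal sketch. It is not the repository's implementation: the method signatures, the `frequencies` hash (word to count), and the omission of the `known_edits2` and Levenshtein steps are simplifying assumptions. The one detail worth noticing is that `known` returns `nil` rather than an empty array, because an empty array is truthy in Ruby and would stop the `||` chain early.

```ruby
ALPHABET = ("a".."z").to_a

# All strings one edit away from word: deletes, transposes, replaces, inserts.
def edits1(word)
  splits = (0..word.length).map { |i| [word[0...i], word[i..-1]] }
  non_empty = splits.reject { |_, tail| tail.empty? }

  deletes = non_empty.map { |head, tail| head + tail[1..-1] }
  transposes = splits.select { |_, tail| tail.length > 1 }
                     .map { |head, tail| head + tail[1] + tail[0] + tail[2..-1] }
  replaces = non_empty.flat_map { |head, tail| ALPHABET.map { |c| head + c + tail[1..-1] } }
  inserts = splits.flat_map { |head, tail| ALPHABET.map { |c| head + c + tail } }

  (deletes + transposes + replaces + inserts).uniq
end

# Keeps only words present in the trained collection; nil when none are,
# so the caller can fall through with ||.
def known(words, frequencies)
  found = words.select { |w| frequencies.key?(w) }
  found.empty? ? nil : found
end

# Shortened chain: exact match, then one edit away, then give up.
def correct(word, frequencies)
  candidates = known([word], frequencies) ||
               known(edits1(word), frequencies) ||
               ["NO SUGGESTION"]
  # Prefer the candidate seen most often in training.
  candidates.max_by { |c| frequencies.fetch(c, 0) }
end
```

With a hypothetical trained collection such as `{ "can" => 10, "cent" => 2 }`, both `"can"` and `"cent"` are one edit away from `"cen"`, and the more frequent `"can"` is chosen.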
## How to use it
```ruby
require "lib/spelling_corrector"

corrector = SpellingCorrector.new
corrector.correct "cen"         # => "can"
corrector.correct "unknownword" # => "NO SUGGESTION"
```
### Persisted Spelling Corrector
`PersistedSpellingCorrector` and `PersistedWordCollection` are implementations that use MongoDB (encapsulating the non-persisted implementations) to persist the corrections and the trained word collection.
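The encapsulation idea can be sketched without MongoDB. The classes below are hypothetical and do not reflect the repository's API: `CachingCorrector` wraps any object that responds to `correct` and stores results in a plain `Hash`, which stands in for a persistent collection. `CountingCorrector` is a stub used only to show that the wrapped corrector is consulted once per word.

```ruby
# Hypothetical wrapper: remembers corrections in any store that
# supports [] and []= (a Hash here, a database-backed object in practice).
class CachingCorrector
  def initialize(corrector, store = {})
    @corrector = corrector
    @store = store
  end

  # Returns the stored correction, computing and storing it on first use.
  def correct(word)
    @store[word] ||= @corrector.correct(word)
  end
end

# Stub corrector for the sketch; counts how often it is actually called.
class CountingCorrector
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def correct(word)
    @calls += 1
    word == "cen" ? "can" : "NO SUGGESTION"
  end
end

inner = CountingCorrector.new
cached = CachingCorrector.new(inner)
cached.correct("cen")
cached.correct("cen")
puts inner.calls
```

Keeping the store behind a small interface means the non-persisted corrector stays unaware of where its results are kept.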
## Examples
The `examples` directory contains two examples: one using refinements and another using [Sinatra](http://www.sinatrarb.com/) to expose the Spelling Corrector as an API.
### Refinements
If you are using Ruby 2.0.0, you can refine the `String` class using the Spelling Corrector.
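For readers new to refinements, a refinement of this kind could be defined roughly as below. This is a sketch under stated assumptions: `SketchCorrectRefinement` and `TinyCorrector` are hypothetical names, and the lookup table stands in for the real corrector so the snippet is self-contained.

```ruby
# Hypothetical stand-in for the real corrector.
class TinyCorrector
  CORRECTIONS = { "cen" => "can" }.freeze

  def correct(word)
    CORRECTIONS.fetch(word, "NO SUGGESTION")
  end
end

# Hypothetical refinement: adds String#correct only where it is activated.
module SketchCorrectRefinement
  refine String do
    def correct
      TinyCorrector.new.correct(self)
    end
  end
end

# Activates the refinement for the rest of this file only.
using SketchCorrectRefinement

CORRECTED = "cen".correct
puts CORRECTED
```

Unlike reopening `String` globally, the added method is visible only in files that call `using`. The repository's own example file activates its refinement like this: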
```ruby
# examples/refinement_spelling_corrector.rb
using StringSpellingCorrectorRefinement
puts "cen".correct
```
### webapp
The webapp example is deployed on Heroku; you can easily test it via `curl` or directly in the browser (shame on you).
```bash
curl spelling-corrector.herokuapp.com/correct/cen
# => can
```
Since it uses PersistedSpellingCorrector, to run it locally, you will need a MongoDB connection.
## License
This code is licensed under: MIT License, GPL.