{"id":13686758,"url":"https://github.com/sajari/fuzzy","last_synced_at":"2025-05-15T13:08:53.835Z","repository":{"id":16270970,"uuid":"19019230","full_name":"sajari/fuzzy","owner":"sajari","description":"Spell checking and fuzzy search suggestion written in Go","archived":false,"fork":false,"pushed_at":"2021-10-21T19:13:54.000Z","size":2421,"stargazers_count":387,"open_issues_count":7,"forks_count":53,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-15T05:33:55.955Z","etag":null,"topics":["autocomplete","fuzzy","go","spell-check"],"latest_commit_sha":null,"homepage":"https://www.sajari.com/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"KirillOsenkov/SourceBrowser","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sajari.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-04-22T06:16:33.000Z","updated_at":"2025-03-30T22:10:47.000Z","dependencies_parsed_at":"2022-09-07T05:13:35.888Z","dependency_job_id":null,"html_url":"https://github.com/sajari/fuzzy","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Ffuzzy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Ffuzzy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Ffuzzy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sajari%2Ffuzzy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sajari","download_url":"https://codeload.github.com/sajari/fuzzy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com",
"kind":"github","repositories_count":254346625,"owners_count":22055808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autocomplete","fuzzy","go","spell-check"],"created_at":"2024-08-02T15:00:39.501Z","updated_at":"2025-05-15T13:08:48.811Z","avatar_url":"https://github.com/sajari.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# Fuzzy\n[![Build Status](https://travis-ci.org/sajari/fuzzy.svg?branch=master)](https://travis-ci.org/sajari/fuzzy)\n\nFuzzy is a very fast spell checker and query suggester written in Golang. \n\nMotivation:\n- Sajari uses very large queries (hundreds of words) but needs to respond sub-second to these queries where possible. Common spell check algorithms are quite slow or very resource intensive.\n- The aim was to achieve spell checks in sub 100usec per word (10,000 / second single core) with at least 60% accuracy and multi-language support.\n- Currently we see sub 40usec per word and ~70% accuracy for a Levenshtein distance of 2 chars on a 2012 macbook pro (english test set comes from Peter Norvig's article, see http://norvig.com/spell-correct.html). 
\n- A 500 word query can be spell checked in ~0.02 sec / cpu cores, which is good enough for us.\n\nNotes:\n- It is currently executed as a single goroutine per lookup, so undoubtedly this could be much faster using multiple cores, but currently the speed is quite good.\n- Accuracy suffers slightly because several correct words don't appear at all in the training text (data/big.txt).\n- Fuzzy is a \"Symmetric Delete Spelling Corrector\", which relates to some blog posts by Wolf Garbe at Faroo.com (see http://blog.faroo.com/2012/06/07/improved-edit-distance-based-spelling-correction/)\n\nConfig:\n- Generally no config is required, but you can tweak the model for your application. \n- `\"threshold\"` is the trigger point when a word becomes popular enough to build lookup keys for it. Setting this to \"1\" means any instance of a given word makes it a legitimate spelling. This typically corrects the most errors, but can also cause false positives if incorrect spellings exist in the training data. It also causes a much larger index to be built. By default this is set to 4.\n- `\"depth\"` is the Levenshtein distance the model builds lookup keys for. For spelling correction, a setting of \"2\" is typically very good. At a distance of \"3\" the potential number of words is much, much larger, but the extra depth adds little benefit to accuracy. For query prediction a larger number can be useful, but again is much more expensive. **A depth of \"1\" and threshold of \"1\" for the 1st Norvig test set gives ~70% correction accuracy at ~5usec per check (i.e. ~200kHz)**; for many applications this will be good enough. At depths \u003e 2, the false positives begin to hurt the accuracy.\n\nFuture improvements:\n- Make some of the expensive processes concurrent. \n- Add spelling checks for different languages. If you have misspellings in different languages please add them or send them to us.\n- Allow the term-score map to be read from an external term set (e.g. 
integrating this currently may double up on keeping a term count).\n- Currently there is no method to delete lookup keys, so potentially this may cause bloating over time if the dictionary changes significantly.\n- Add right to left deletion beyond Levenshtein config depth (e.g. don't process all deletes except for query predictors).\n\nUsage:\n- Below is some example code showing how to use the package.\n- An example showing how to train with a static set of words is contained in the fuzzy_test.go file, which uses the \"big.txt\" file to create an English dictionary. \n- To integrate with your application (e.g. custom dictionary / word popularity), use the single word and multiword training functions shown in the example below. Each time you add a new instance of a given word, pass it to this function. The model will keep a count and \n- We haven't tested with other languages, but this should work fine. Please let us know how you go: `support@sajari.com`\n\n\n```go\npackage main \n\nimport(\n\t\"github.com/sajari/fuzzy\"\n\t\"fmt\"\n)\n\nfunc main() {\n\tmodel := fuzzy.NewModel()\n\n\t// For testing only, this is not advisable in production\n\tmodel.SetThreshold(1)\n\n\t// This expands the distance searched, but costs more resources (memory and time). 
\n\t// For spell checking, \"2\" is typically enough, for query suggestions this can be higher\n\tmodel.SetDepth(5)\n\n\t// Train multiple words simultaneously by passing an array of strings to the \"Train\" function\n\twords := []string{\"bob\", \"your\", \"uncle\", \"dynamite\", \"delicate\", \"biggest\", \"big\", \"bigger\", \"aunty\", \"you're\"}\n\tmodel.Train(words)\n\t\n\t// Train word by word (typically triggered in your application once a given word is popular enough)\n\tmodel.TrainWord(\"single\")\n\n\t// Check Spelling\n\tfmt.Println(\"\\nSPELL CHECKS\")\n\tfmt.Println(\"\tDeletion test (yor) : \", model.SpellCheck(\"yor\"))\n\tfmt.Println(\"\tSwap test (uncel) : \", model.SpellCheck(\"uncel\"))\n\tfmt.Println(\"\tReplace test (dynemite) : \", model.SpellCheck(\"dynemite\"))\n\tfmt.Println(\"\tInsert test (dellicate) : \", model.SpellCheck(\"dellicate\"))\n\tfmt.Println(\"\tTwo char test (dellicade) : \", model.SpellCheck(\"dellicade\"))\n\n\t// Suggest completions\n\tfmt.Println(\"\\nQUERY SUGGESTIONS\")\n\tfmt.Println(\"\t\\\"bigge\\\". Did you mean?: \", model.Suggestions(\"bigge\", false))\n\tfmt.Println(\"\t\\\"bo\\\". Did you mean?: \", model.Suggestions(\"bo\", false))\n\tfmt.Println(\"\t\\\"dyn\\\". Did you mean?: \", model.Suggestions(\"dyn\", false))\n\n\t// Autocomplete suggestions\n\tsuggested, _ := model.Autocomplete(\"bi\")\n\tfmt.Printf(\"\t\\\"bi\\\". Suggestions: %v\", suggested)\n\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsajari%2Ffuzzy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsajari%2Ffuzzy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsajari%2Ffuzzy/lists"}