{"id":13836679,"url":"https://github.com/wolfgarbe/symspell","last_synced_at":"2025-07-10T16:30:28.331Z","repository":{"id":15366204,"uuid":"18097275","full_name":"wolfgarbe/SymSpell","owner":"wolfgarbe","description":"SymSpell: 1 million times faster spelling correction \u0026 fuzzy search through Symmetric Delete spelling correction algorithm","archived":false,"fork":false,"pushed_at":"2024-10-27T14:36:12.000Z","size":12686,"stargazers_count":3148,"open_issues_count":33,"forks_count":298,"subscribers_count":71,"default_branch":"master","last_synced_at":"2024-11-11T17:50:10.955Z","etag":null,"topics":["approximate-string-matching","chinese-text-segmentation","chinese-word-segmentation","damerau-levenshtein","edit-distance","fuzzy-matching","fuzzy-search","levenshtein","levenshtein-distance","spell-check","spellcheck","spelling","spelling-correction","symspell","text-segmentation","word-segmentation"],"latest_commit_sha":null,"homepage":"https://seekstorm.com/blog/1000x-spelling-correction/","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wolfgarbe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-03-25T11:01:35.000Z","updated_at":"2024-11-11T06:22:17.000Z","dependencies_parsed_at":"2022-07-12T14:02:47.973Z","dependency_job_id":"40e24fa0-95d0-4f8a-b5e2-561d852aa14b","html_url":"https://github.com/wolfgarbe/SymSpell","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolfgarbe%2FSy
mSpell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolfgarbe%2FSymSpell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolfgarbe%2FSymSpell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wolfgarbe%2FSymSpell/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wolfgarbe","download_url":"https://codeload.github.com/wolfgarbe/SymSpell/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225647662,"owners_count":17502117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-string-matching","chinese-text-segmentation","chinese-word-segmentation","damerau-levenshtein","edit-distance","fuzzy-matching","fuzzy-search","levenshtein","levenshtein-distance","spell-check","spellcheck","spelling","spelling-correction","symspell","text-segmentation","word-segmentation"],"created_at":"2024-08-04T15:00:52.308Z","updated_at":"2025-07-10T16:30:28.317Z","avatar_url":"https://github.com/wolfgarbe.png","language":"C#","funding_links":[],"categories":["C# #","Spelling correction"],"sub_categories":["Other"],"readme":"SymSpell\u003cbr\u003e\n[![NuGet version](https://badge.fury.io/nu/symspell.svg)](https://badge.fury.io/nu/symspell)\n[![MIT License](https://img.shields.io/github/license/wolfgarbe/symspell.svg)](https://github.com/wolfgarbe/SymSpell/blob/master/LICENSE)\n========\n\nSpelling correction \u0026 Fuzzy search: **1 million times faster** through Symmetric Delete spelling correction algorithm\n 
\nThe Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster ([than the standard approach with deletes + transposes + replaces + inserts](http://norvig.com/spell-correct.html)) and language independent.\n\nUnlike other algorithms, only deletes are required, no transposes + replaces + inserts.\nTransposes + replaces + inserts of the input term are transformed into deletes of the dictionary term.\nReplaces and inserts are expensive and language dependent: e.g. Chinese has 70,000 Unicode Han characters!\n\nThe speed comes from the inexpensive **delete-only edit candidate generation** and the **pre-calculation**.\u003cbr\u003e\nAn average 5-letter word has about **3 million possible spelling errors** within a maximum edit distance of 3,\u003cbr\u003e\nbut SymSpell needs to generate **only 25 deletes** to cover them all, both at pre-calculation and at lookup time. 
Magic!\n\nIf you like SymSpell, try [**SeekStorm**](https://github.com/SeekStorm/SeekStorm) - a sub-millisecond full-text search library \u0026 multi-tenancy server in Rust (Open Source).\n\n\u003cbr\u003e\n\n```\nCopyright (c) 2025 Wolf Garbe\nVersion: 6.7.3\nAuthor: Wolf Garbe \u003cwolf.garbe@seekstorm.com\u003e\nMaintainer: Wolf Garbe \u003cwolf.garbe@seekstorm.com\u003e\nURL: https://github.com/wolfgarbe/symspell\nDescription: https://seekstorm.com/blog/1000x-spelling-correction/\n\nMIT License\n\nCopyright (c) 2025 Wolf Garbe\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated \ndocumentation files (the \"Software\"), to deal in the Software without restriction, including without limitation \nthe rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, \nand to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nhttps://opensource.org/licenses/MIT\n```\n\n---\n\n## Single word spelling correction\n\n**Lookup** provides a very fast spelling correction of single words.\n* A **Verbosity parameter** allows to control the number of returned results:\u003cbr\u003e\nTop: Top suggestion with the highest term frequency of the suggestions of smallest edit distance found.\u003cbr\u003e\nClosest: All suggestions of smallest edit distance found, suggestions ordered by term frequency.\u003cbr\u003e\nAll: All suggestions within maxEditDistance, suggestions ordered by edit distance, then by term frequency.\n* The **Maximum edit distance parameter** controls up to which edit distance words from the dictionary should be treated as suggestions.\n* The required **Word frequency dictionary** can either be directly loaded from text files (**LoadDictionary**) or generated from a large text corpus 
(**CreateDictionary**).\n\n#### Applications\n\n* Spelling correction,\n* Query correction (10–15% of queries contain misspelled terms),\n* Chatbots,\n* OCR post-processing,\n* Automated proofreading.\n* Fuzzy search \u0026 approximate string matching\n\n#### Performance (single term)\n\n0.033 milliseconds/word (edit distance 2) and 0.180 milliseconds/word (edit distance 3) (single core on 2012 Macbook Pro)\u003cbr\u003e\n\n![Benchmark](https://cdn-images-1.medium.com/max/800/1*1l_5pOYU3AhoijKfVD-Qag.png \"Benchmark\")\n\u003cbr\u003e\u003cbr\u003e\n**1,870 times faster than [BK-tree](https://en.wikipedia.org/wiki/BK-tree)** (see [Benchmark 1](https://seekstorm.com/blog/symspell-vs-bk-tree/): dictionary size=500,000, maximum edit distance=3, query terms with random edit distance = 0...maximum edit distance, verbose=0)\u003cbr\u003e\u003cbr\u003e\n**1 million times faster than [Norvig's algorithm](http://norvig.com/spell-correct.html)** (see [Benchmark 2](http://blog.faroo.com/2015/03/24/fast-approximate-string-matching-with-large-edit-distances/): dictionary size=29,157, maximum edit distance=3, query terms with fixed edit distance = maximum edit distance, verbose=0)\u003cbr\u003e\n\n#### Blog Posts: Algorithm, Benchmarks, Applications\n[1000x Faster Spelling Correction algorithm](https://seekstorm.com/blog/1000x-spelling-correction/)\u003cbr\u003e\n[Fast approximate string matching with large edit distances in Big Data](https://seekstorm.com/blog/fast-approximate-string-matching/)\u003cbr\u003e \n[Very fast Data cleaning of product names, company names \u0026 street names](https://seekstorm.com/blog/very-data-cleaning-of-product-names-company-names-street-names/)\u003cbr\u003e\n[Sub-millisecond compound aware automatic spelling correction](https://seekstorm.com/blog/sub-millisecond-compound-aware-automatic.spelling-correction/)\u003cbr\u003e\n[SymSpell vs. 
BK-tree: 100x faster fuzzy string search \u0026 spell checking](https://seekstorm.com/blog/symspell-vs-bk-tree/)\u003cbr\u003e\n[Fast Word Segmentation for noisy text](https://seekstorm.com/blog/fast-word-segmentation-noisy-text/)\u003cbr\u003e\n[The Pruning Radix Trie — a Radix trie on steroids](https://seekstorm.com/blog/pruning-radix-trie/)\u003cbr\u003e\n\n---\n\n## Compound aware multi-word spelling correction\n\n**LookupCompound** supports __compound__ aware __automatic__ spelling correction of __multi-word input__ strings. \n\n__1. Compound splitting \u0026 decompounding__\n\nLookup() treats every input string as a _single term_. LookupCompound also supports _compound splitting / decompounding_ with three cases:\n1. a mistakenly __inserted space within a correct word__ leads to two incorrect terms \n2. a mistakenly __omitted space between two correct words__ leads to one incorrect combined term\n3. __multiple input terms__ with/without spelling errors\n\nSplitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can be mixed within the same word.\n\n__2. Automatic spelling correction__\n\n* Large document collections make manual correction infeasible and require unsupervised, fully-automatic spelling correction. \n* In conventional spelling correction of a single token, the user is presented with multiple spelling correction suggestions. 
\u003cbr\u003eFor automatic spelling correction of long multi-word text the algorithm itself has to make an educated choice.\n\n__Examples:__\n\n```diff\n- whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him\n+ where is the love he had dated for much of the past who couldn't read in sixth grade and inspired him  (9 edits)\n\n- in te dhird qarter oflast jear he hadlearned ofca sekretplan\n+ in the third quarter of last year he had learned of a secret plan  (9 edits)\n\n- the bigjest playrs in te strogsommer film slatew ith plety of funn\n+ the biggest players in the strong summer film slate with plenty of fun  (9 edits)\n\n- Can yu readthis messa ge despite thehorible sppelingmsitakes\n+ can you read this message despite the horrible spelling mistakes  (9 edits)\n```\n#### Performance (compounds)\n\n0.2 milliseconds / word (edit distance 2)\n5000 words / second (single core on 2012 Macbook Pro)\n\n---\n\n## Word Segmentation of noisy text\n\n**WordSegmentation** divides a string into words by inserting missing spaces at appropriate positions.\u003cbr\u003e\n* Misspelled words are corrected and do not prevent segmentation.\u003cbr\u003e\n* Existing spaces are allowed and considered for optimum segmentation.\u003cbr\u003e\n* SymSpell.WordSegmentation uses a [**Triangular Matrix approach**](https://seekstorm.com/blog/fast-word-segmentation-noisy-text/) instead of the conventional Dynamic Programming: It uses an array instead of a dictionary for memoization, loops instead of recursion and incrementally optimizes prefix strings instead of remainder strings.\u003cbr\u003e\n* The Triangular Matrix approach is faster than the Dynamic Programming approach. It has a lower memory consumption, better scaling (constant O(1) memory consumption vs. 
linear O(n)) and is GC friendly.\n* While each string of length n can be segmented into **2^n−1** possible [compositions](https://en.wikipedia.org/wiki/Composition_(combinatorics)),\u003cbr\u003e \n   SymSpell.WordSegmentation has a **linear runtime O(n)** to find the optimum composition.\n\n__Examples:__\n\n```diff\n- thequickbrownfoxjumpsoverthelazydog\n+ the quick brown fox jumps over the lazy dog\n\n- itwasabrightcolddayinaprilandtheclockswerestrikingthirteen\n+ it was a bright cold day in april and the clocks were striking thirteen\n\n- itwasthebestoftimesitwastheworstoftimesitwastheageofwisdomitwastheageoffoolishness\n+ it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness \n```\n\n__Applications:__\n\n* Word Segmentation for CJK languages for Indexing Spelling correction, Machine translation, Language understanding, Sentiment analysis\n* Normalizing English compound nouns for search \u0026 indexing (e.g. ice box = ice-box = icebox; pig sty = pig-sty = pigsty) \n* Word segmentation for compounds if both original word and split word parts should be indexed.\n* Correction of missing spaces caused by Typing errors.\n* Correction of Conversion errors: spaces between word may get lost e.g. when removing line breaks.\n* Correction of OCR errors: inferior quality of original documents or handwritten text may prevent that all spaces are recognized.\n* Correction of Transmission errors: during the transmission over noisy channels spaces can get lost or spelling errors introduced.\n* Keyword extraction from URL addresses, domain names, #hashtags, table column descriptions or programming variables written without spaces.\n* For password analysis, the extraction of terms from passwords can be required.\n* For Speech recognition, if spaces between words are not properly recognized in spoken language.\n* Automatic CamelCasing of programming variables.\n* Applications beyond Natural Language processing, e.g. 
segmenting DNA sequence into words\n\n__Performance:__\n\n4 milliseconds for segmenting an 185 char string into 53 words (single core on 2012 Macbook Pro)\n\u003cbr\u003e\n\n---\n\n#### Usage SymSpell Demo\nsingle word + Enter:  Display spelling suggestions\u003cbr\u003e\nEnter without input:  Terminate the program\n\n#### Usage SymSpellCompound Demo\nmultiple words + Enter: Display spelling suggestions\u003cbr\u003e\nEnter without input: Terminate the program\n\n#### Usage Segmentation Demo\nstring without spaces + Enter: Display word segmented text\u003cbr\u003e\nEnter without input: Terminate the program\n\n*Demo, DemoCompound and SegmentationDemo projects can be built with the free [Visual Studio Code](https://code.visualstudio.com/), which runs on Windows, MacOS and Linux.*\n\n#### Usage SymSpell Library\n```csharp\n//create object\nint initialCapacity = 82765;\nint maxEditDistanceDictionary = 2; //maximum edit distance per dictionary precalculation\nvar symSpell = new SymSpell(initialCapacity, maxEditDistanceDictionary);\n      \n//load dictionary\nstring baseDirectory = AppDomain.CurrentDomain.BaseDirectory;\nstring dictionaryPath= baseDirectory + \"../../../../SymSpell/frequency_dictionary_en_82_765.txt\";\nint termIndex = 0; //column of the term in the dictionary text file\nint countIndex = 1; //column of the term frequency in the dictionary text file\nif (!symSpell.LoadDictionary(dictionaryPath, termIndex, countIndex))\n{\n  Console.WriteLine(\"File not found!\");\n  //press any key to exit program\n  Console.ReadKey();\n  return;\n}\n\n//lookup suggestions for single-word input strings\nstring inputTerm=\"house\";\nint maxEditDistanceLookup = 1; //max edit distance per lookup (maxEditDistanceLookup\u003c=maxEditDistanceDictionary)\nvar suggestionVerbosity = SymSpell.Verbosity.Closest; //Top, Closest, All\nvar suggestions = symSpell.Lookup(inputTerm, suggestionVerbosity, maxEditDistanceLookup);\n\n//display suggestions, edit distance and term 
frequency\nforeach (var suggestion in suggestions)\n{ \n  Console.WriteLine(suggestion.term +\" \"+ suggestion.distance.ToString() +\" \"+ suggestion.count.ToString(\"N0\"));\n}\n\n\n//load bigram dictionary (reuses termIndex and countIndex declared above)\nstring bigramDictionaryPath = baseDirectory + \"../../../../SymSpell/frequency_bigramdictionary_en_243_342.txt\";\ntermIndex = 0; //column of the term in the dictionary text file\ncountIndex = 2; //column of the term frequency in the dictionary text file\nif (!symSpell.LoadBigramDictionary(bigramDictionaryPath, termIndex, countIndex))\n{\n  Console.WriteLine(\"File not found!\");\n  //press any key to exit program\n  Console.ReadKey();\n  return;\n}\n\n//lookup suggestions for multi-word input strings (supports compound splitting \u0026 merging)\ninputTerm=\"whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him\";\nmaxEditDistanceLookup = 2; //max edit distance per lookup (per single word, not per whole input string)\nsuggestions = symSpell.LookupCompound(inputTerm, maxEditDistanceLookup);\n\n//display suggestions, edit distance and term frequency\nforeach (var suggestion in suggestions)\n{ \n  Console.WriteLine(suggestion.term +\" \"+ suggestion.distance.ToString() +\" \"+ suggestion.count.ToString(\"N0\"));\n}\n\n\n//word segmentation and correction for multi-word input strings with/without spaces\ninputTerm=\"thequickbrownfoxjumpsoverthelazydog\";\nint maxEditDistance = 0; //max edit distance used for correction during segmentation\nvar segmentation = symSpell.WordSegmentation(inputTerm, maxEditDistance);\n\n//display corrected string and edit distance sum\nConsole.WriteLine(segmentation.correctedString + \" \" + segmentation.distanceSum.ToString(\"N0\"));\n\n\n//press any key to exit program\nConsole.ReadKey();\n```\n#### Three ways to add SymSpell to your project:\n1. 
Add **[SymSpell.cs](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/SymSpell.cs), [EditDistance.cs](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/EditDistance.cs) and [frequency_dictionary_en_82_765.txt](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/frequency_dictionary_en_82_765.txt)** to your project. All three files are located in the [SymSpell folder](https://github.com/wolfgarbe/SymSpell/tree/master/SymSpell). Enabling the compiler option **\"Prefer 32-bit\"** will significantly **reduce the memory consumption** of the precalculated dictionary.\n2. Add **[SymSpell NuGet](https://www.nuget.org/packages/symspell)** to your **Net Framework** project: Visual Studio / Tools / NuGet Packager / Manage Nuget packages for solution / Select \"Browse tab\"/ Search for SymSpell / Select SymSpell / Check your project on the right hand windows / Click install button. The [frequency_dictionary_en_82_765.txt](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/frequency_dictionary_en_82_765.txt) is **automatically installed**. \n3. Add **[SymSpell NuGet](https://www.nuget.org/packages/symspell)** to your **Net Core** project: Visual Studio / Tools / NuGet Packager / Manage Nuget packages for solution / Select \"Browse tab\"/ Search for SymSpell / Select SymSpell / Check your project on the right hand windows / Click install button. The [frequency_dictionary_en_82_765.txt](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/frequency_dictionary_en_82_765.txt) must be **copied manually** to your project.\n\nSymSpell targets [.NET Standard v2.0](https://blogs.msdn.microsoft.com/dotnet/2016/09/26/introducing-net-standard/) and can be used  in:\n1. NET Framework (**Windows** Forms, WPF, ASP.NET), \n2. NET Core (UWP, ASP.NET Core, **Windows**, **OS X**, **Linux**),\n3. 
XAMARIN (**iOS**, **OS X**, **Android**) projects.\n\n*The SymSpell, Demo,  DemoCompound and Benchmark projects can be built with the free [Visual Studio Code](https://code.visualstudio.com/), which runs on Windows, MacOS and Linux.*\n\n---\n\n#### Frequency dictionary\nDictionary quality is paramount for correction quality. In order to achieve this two data sources were combined by intersection: Google Books Ngram data which provides representative word frequencies (but contains many entries with spelling errors) and SCOWL — Spell Checker Oriented Word Lists which ensures genuine English vocabulary (but contained no word frequencies required for ranking of suggestions within the same edit distance).\n\nThe [frequency_dictionary_en_82_765.txt](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/frequency_dictionary_en_82_765.txt) was created by intersecting the two lists mentioned below. By reciprocally filtering only those words which appear in both lists are used. Additional filters were applied and the resulting list truncated to \u0026#8776; 80,000 most frequent words.\n* [Google Books Ngram data](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)   [(License)](https://creativecommons.org/licenses/by/3.0/) : Provides representative word frequencies\n* [SCOWL - Spell Checker Oriented Word Lists](http://wordlist.aspell.net/)   [(License)](http://wordlist.aspell.net/scowl-readme/) : Ensures genuine English vocabulary    \n\n#### Dictionary file format\n* Plain text file in UTF-8 encoding.\n* Word and Word Frequency are separated by space or tab. Per default, the word is expected in the first column and the frequency in the second column. But with the termIndex and countIndex parameters in LoadDictionary() the position and order of the values can be changed and selected from a row with more than two values. 
This makes it possible to augment the dictionary with additional information or to adapt to existing dictionaries without reformatting.\n* Each word-frequency pair is on a separate line. A line is defined as a sequence of characters followed by a line feed (\"\\n\"), a carriage return (\"\\r\"), or a carriage return immediately followed by a line feed (\"\\r\\n\").\n* Both dictionary terms and input term are expected to be in **lower case**.\n\nYou can build your own frequency dictionary for your language or your specialized technical domain.\nThe SymSpell spelling correction algorithm supports languages with non-Latin characters, e.g. Cyrillic, Chinese or [Georgian](https://github.com/irakli97/Frequency_Dictionary_GE_363_202).\n\n#### Frequency dictionaries in other languages\n\nSymSpell includes an [English frequency dictionary](https://github.com/wolfgarbe/SymSpell/blob/master/SymSpell/frequency_dictionary_en_82_765.txt). \n\nDictionaries for Chinese, English, French, German, Hebrew, Italian, Russian and Spanish are located here:\u003cbr\u003e\n[SymSpell.FrequencyDictionary](SymSpell.FrequencyDictionary)  \n\nFrequency dictionaries in many other languages can be found here:\u003cbr\u003e\n[FrequencyWords repository](https://github.com/hermitdave/FrequencyWords)\u003cbr\u003e\n[Frequency dictionaries](https://github.com/dataiku/dss-plugin-nlp-preparation/tree/master/resource/dictionaries)\u003cbr\u003e\n[Frequency dictionaries](https://github.com/LuminosoInsight/wordfreq/tree/master/wordfreq/data)\n\nN-Gram Generator by repetitio:\u003cbr\u003e\nThis repository contains a script to generate unigrams and bigrams from the Wikipedia dataset on HuggingFace for use with SymSpell.\u003cbr\u003e\nhttps://gitlab.com/repetitio/utils/ngram-frequencies/-/tree/main?ref_type=heads\n\n---\n\n**C#** (original source code)\u003cbr\u003e\nhttps://github.com/wolfgarbe/symspell\n\n**.NET** (NuGet package)\u003cbr\u003e\nhttps://www.nuget.org/packages/symspell\n\n### Ports\n\nThe following 
third-party ports or reimplementations in other programming languages have not been tested by me to verify that they are exact ports, error-free, provide identical results or are as fast as the original algorithm. \n\nMost ports target SymSpell **version 3.0**, but **version 6.1** provides **much higher speed \u0026 lower memory consumption!**\n\n**WebAssembly**\u003cbr\u003e\nhttps://github.com/justinwilaby/spellchecker-wasm\u003cbr\u003e\n\n**WEB API (Docker)**\u003cbr\u003e\nhttps://github.com/LeonErath/SymSpellAPI (Version 6.3)\u003cbr\u003e\n\n**C++**\u003cbr\u003e\nhttps://github.com/AtheS21/SymspellCPP (Version 6.5)\u003cbr\u003e\nhttps://github.com/erhanbaris/SymSpellPlusPlus (Version 6.1)\n\n**Crystal**\u003cbr\u003e\nhttps://github.com/chenkovsky/aha/blob/master/src/aha/sym_spell.cr\n\n**Go**\u003cbr\u003e\nhttps://github.com/snapp-incubator/go-symspell\u003cbr\u003e\nhttps://github.com/sajari/fuzzy\u003cbr\u003e\nhttps://github.com/eskriett/spell\n\n**Haskell**\u003cbr\u003e\nhttps://github.com/cbeav/symspell\n\n**Java**\u003cbr\u003e\nhttps://github.com/MighTguY/customized-symspell (Version 6.6)\u003cbr\u003e\nhttps://github.com/rxp90/jsymspell (Version 6.6)\u003cbr\u003e\nhttps://github.com/Lundez/JavaSymSpell (Version 6.4)\u003cbr\u003e\nhttps://github.com/gpranav88/symspell\u003cbr\u003e\nhttps://github.com/searchhub/preDict\u003cbr\u003e\nhttps://github.com/jpsingarayar/SpellBlaze\n\n**Javascript**\u003cbr\u003e\nhttps://github.com/MathieuLoutre/node-symspell (Version 6.6, needs 
Node.js)\u003cbr\u003e\nhttps://github.com/itslenny/SymSpell.js\u003cbr\u003e\nhttps://github.com/dongyuwei/SymSpell\u003cbr\u003e\nhttps://github.com/IceCreamYou/SymSpell\u003cbr\u003e\nhttps://github.com/Yomguithereal/mnemonist/blob/master/symspell.js\n\n**Julia**\u003cbr\u003e\nhttps://github.com/Arkoniak/SymSpell.jl\n\n**Kotlin**\u003cbr\u003e\nhttps://github.com/Wavesonics/SymSpellKt\n\n**Objective-C**\u003cbr\u003e\nhttps://github.com/AmitBhavsarIphone/SymSpell (Version 6.3)\n\n**Python**\u003cbr\u003e\nhttps://github.com/mammothb/symspellpy  (Version 6.7)\u003cbr\u003e\nhttps://github.com/viig99/SymSpellCppPy  (Version 6.5)\u003cbr\u003e\nhttps://github.com/zoho-labs/symspell (Python bindings of Rust version)\u003cbr\u003e\nhttps://github.com/ne3x7/pysymspell/ (Version 6.1)\u003cbr\u003e\nhttps://github.com/Ayyuriss/SymSpell\u003cbr\u003e\nhttps://github.com/ppgmg/github_public/blob/master/spell/symspell_python.py\u003cbr\u003e\nhttps://github.com/rcourivaud/symspellcompound\u003cbr\u003e\nhttps://github.com/Esukhia/sympound-python\u003cbr\u003e\nhttps://www.kaggle.com/yk1598/symspell-spell-corrector\n\n**Ruby**\u003cbr\u003e\nhttps://github.com/PhilT/symspell\n\n**Rust**\u003cbr\u003e\nhttps://github.com/reneklacan/symspell (Version 6.6, compiles to WebAssembly)\u003cbr\u003e\nhttps://github.com/luketpeterson/fuzzy_rocks (persistent datastore backed by RocksDB)\n\n**Scala**\u003cbr\u003e\nhttps://github.com/semkath/symspell\n\n**Swift**\u003cbr\u003e\nhttps://github.com/gdetari/SymSpellSwift\n\n---\n\n### Citations\n\nContextual Multilingual Spellchecker for User Queries\u003cbr\u003e\nSanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora (Adobe)\u003cbr\u003e\nhttps://arxiv.org/abs/2305.01082\n\nA context sensitive real-time Spell Checker with language adaptability\u003cbr\u003e\nPrabhakar Gupta (Amazon)\u003cbr\u003e\nhttps://arxiv.org/abs/1910.11242\n\nSpeakGer: A meta-data enriched speech corpus of German state and federal 
parliaments\u003cbr\u003e\nKai-Robin Lange and Carsten Jentsch\u003cbr\u003e\nhttps://arxiv.org/pdf/2410.17886\n\nAn Extended Sequence Tagging Vocabulary for Grammatical Error Correction\u003cbr\u003e\nStuart Mesham, Christopher Bryant, Marek Rei, Zheng Yuan\u003cbr\u003e\nhttps://arxiv.org/abs/2302.05913\n\nGerman Parliamentary Corpus (GERPARCOR)\u003cbr\u003e\nGiuseppe Abrami, Mevlüt Bagci, Leon Hammerla, Alexander Mehler\u003cbr\u003e\nhttps://arxiv.org/abs/2204.10422\n\niOCR: Informed Optical Character Recognition for Election Ballot Tallies\u003cbr\u003e\nKenneth U. Oyibo, Jean D. Louis, Juan E. Gilbert\u003cbr\u003e\nhttps://arxiv.org/abs/2208.00865\n\nAmazigh spell checker using Damerau-Levenshtein algorithm and N-gram\u003cbr\u003e\nYouness Chaabi, Fadoua Ataa Allah\u003cbr\u003e\nhttps://www.sciencedirect.com/science/article/pii/S1319157821001828\n\nSurvey of Query correction for Thai business-oriented information retrieval\u003cbr\u003e\nPhongsathorn Kittiworapanya, Nuttapong Saelek, Anuruth Lertpiya, Tawunrat Chalothorn\u003cbr\u003e\nhttps://ieeexplore.ieee.org/document/9376809\n\nSymSpell and LSTM based Spell- Checkers for Tamil\u003cbr\u003e\nSelvakumar Murugan, Tamil Arasan Bakthavatchalam, Malaikannan Sankarasubbu\u003cbr\u003e\nhttps://www.researchgate.net/publication/349924975_SymSpell_and_LSTM_based_Spell-_Checkers_for_Tamil\n\nSymSpell4Burmese: Symmetric Delete Spelling Correction Algorithm (SymSpell) for Burmese Spelling Checking\u003cbr\u003e\nEi Phyu Phyu Mon; Ye Kyaw Thu; Than Than Yu; Aye Wai Oo\u003cbr\u003e\nhttps://ieeexplore.ieee.org/document/9678171\n\nSpell Check Indonesia menggunakan Norvig dan SymSpell\u003cbr\u003e\nYasir Abdur Rohman\u003cbr\u003e\nhttps://medium.com/@yasirabd/spell-check-indonesia-menggunakan-norvig-dan-symspell-4fa583d62c24\n\nAnalisis Perbandingan Metode Burkhard Keller Tree dan SymSpell dalam Spell Correction Bahasa Indonesia\u003cbr\u003e\nMuhammad Hafizh Ferdiansyah, I Kadek Dwi 
Nuryana\u003cbr\u003e\nhttps://ejournal.unesa.ac.id/index.php/jinacs/article/download/50989/41739\n\nImproving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith\u003cbr\u003e\nMuhammad Zaky Ramadhan, Kemas M Lhaksmana\u003cbr\u003e\nhttps://www.researchgate.net/publication/342390145_Improving_Document_Retrieval_with_Spelling_Correction_for_Weak_and_Fabricated_Indonesian-Translated_Hadith\n\nSymspell을 이용한 한글 맞춤법 교정\u003cbr\u003e\n김희규\u003cbr\u003e\nhttps://heegyukim.medium.com/symspell%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%9C-%ED%95%9C%EA%B8%80-%EB%A7%9E%EC%B6%A4%EB%B2%95-%EA%B5%90%EC%A0%95-3def9ca00805\n\nMending Fractured Texts. A heuristic procedure for correcting OCR data\u003cbr\u003e\nJens Bjerring-Hansen, Ross Deans Kristensen-McLachlan, Philip Diderichsen and Dorte Haltrup Hansen\u003cbr\u003e\nhttps://ceur-ws.org/Vol-3232/paper14.pdf\n\nTowards the Natural Language Processing as Spelling Correction for Offline Handwritten Text Recognition Systems\u003cbr\u003e\nArthur Flor de Sousa Neto; Byron Leite Dantas Bezerra; and Alejandro Héctor Toselli\u003cbr\u003e\nhttps://www.mdpi.com/2076-3417/10/21/7711\n\nWhen to Use OCR Post-correction for Named Entity Recognition?\u003cbr\u003e\nVinh-Nam Huynh, Ahmed Hamdi, Antoine Doucet\u003cbr\u003e\nhttps://hal.science/hal-03034484v1/\n\nAutomatic error Correction: Evaluating Performance of Spell Checker Tools\u003cbr\u003e\nA. 
Tolegenova\u003cbr\u003e\nhttps://journals.sdu.edu.kz/index.php/nts/article/view/690\n\nZHAW-CAI: Ensemble Method for Swiss German Speech to Standard German Text\u003cbr\u003e\nMalgorzata Anna Ulasik, Manuela Hurlimann, Bogumila Dubel, Yves Kaufmann,\u003cbr\u003e\nSilas Rudolf, Jan Deriu, Katsiaryna Mlynchyk, Hans-Peter Hutter, and Mark Cieliebak\u003cbr\u003e\nhttps://ceur-ws.org/Vol-2957/sg_paper3.pdf\n\nCyrillic Word Error Program Based on Machine Learning\u003cbr\u003e\nBattumur, K., Dulamragchaa, U., Enkhbat, S., Altanhuyag, L., \u0026 Tumurbaatar, P.\u003cbr\u003e\nhttps://mongoliajol.info/index.php/JIMDT/article/view/2661\n\nFast Approximate String Search for Wikification\u003cbr\u003e\nSzymon Olewniczak, Julian Szymanski\u003cbr\u003e\nhttps://www.iccs-meeting.org/archive/iccs2021/papers/127440334.pdf\n\nRuMedSpellchecker: Correcting Spelling Errors for Natural Russian Language in Electronic Health Records Using Machine Learning Techniques\u003cbr\u003e\nDmitrii Pogrebnoi, Anastasia Funkner, Sergey Kovalchuk\u003cbr\u003e\nhttps://link.springer.com/chapter/10.1007/978-3-031-36024-4_16\n\nAn Extended Sequence Tagging Vocabulary for Grammatical Error Correction\u003cbr\u003e\nStuart Mesham, Christopher Bryant, Marek Rei, Zheng Yuan\u003cbr\u003e\nhttps://aclanthology.org/2023.findings-eacl.119.pdf\n\nLightning-fast adaptive immune receptor similarity search by symmetric deletion lookup\u003cbr\u003e\nTouchchai Chotisorayuth, Andreas Tiffeau-Mayer\u003cbr\u003e\nhttps://arxiv.org/html/2403.09010v1\n\nUnveiling Disguised Toxicity: A Novel Pre-processing Module for Enhanced Content Moderation\u003cbr\u003e\nJohnny Chan, Yuming Li\u003cbr\u003e\nhttps://www.sciencedirect.com/science/article/pii/S2215016124001225\n\nBeyond the dictionary attack: Enhancing password cracking efficiency through machine learning-induced mangling rules\u003cbr\u003e\nRadek Hranický, Lucia Šírová, Viktor 
Rucký\u003cbr\u003e\nhttps://www.sciencedirect.com/science/article/pii/S2666281725000046\n\n---\n\n### Upcoming changes\n\n1. Utilizing the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle) by partitioning both query and dictionary terms will result in 5x less memory consumption and 3x faster precalculation time. \n2. Option to preserve case (upper/lower case) of input term.\n3. Open source the code for creating custom frequency dictionaries in any language and size as intersection between Google Books Ngram data (Provides representative word frequencies) and SCOWL Spell Checker Oriented Word Lists (Ensures genuine English vocabulary).\n\n#### Changes in v6.7.3\n\n- TargetFrameworks changed from `netstandard2.0;net461;net47;netcoreapp3.0` to `netstandard2.0;net9.0`.\n- PackageReferences updated.\n- In SymSpell.Test all Assert.AreEqual changed to Assert.That\n- Incorporates PR #126 that fixes null reference exception in CommitStaged (#139).\n\n#### Changes in v6.7.2\n\n1. Exception fixed in WordSegmentation\n2. Platform changed from netcore 2.1 to netcore 3.0\n\n#### Changes in v6.7.1\n\n1. Framework target changed from net472 to net47\u003cbr\u003e\n2. Framework target added netcoreapp3.0\u003cbr\u003e\n3. More common contractions added to frequency_dictionary_en_82_765.txt\u003cbr\u003e\n\n#### Changes in v6.7\n\n1. WordSegmentation did not work correctly if input string contained words in uppercase.\u003cbr\u003e\n2. WordSegmentation now retains/preserves case.\u003cbr\u003e\n3. WordSegmentation now keeps punctuation or apostrophe adjacent to previous word.\u003cbr\u003e\n4. WordSegmentation now normalizes ligatures: \"scientiﬁc\" -\u003e \"scientific\".\u003cbr\u003e\n5. WordSegmentation now removes hyphens prior to word segmentation (as they might be caused by syllabification).\u003cbr\u003e\n6. American English word forms added to dictionary in addition to British English e.g. 
favourable -\u003e favorable.\u003cbr\u003e\n\n#### Changes in v6.6\n\n1. IMPROVEMENT: LoadDictionary and LoadBigramDictionary now have an optional separator parameter, which defines the separator characters (e.g. '\\t') between term(s) and count. The default is defaultSeparatorChars=null, which means white space.\u003cbr\u003e\nThis allows the dictionaries to contain space-separated phrases.\u003cbr\u003e\nIf LoadBigramDictionary is called without a separator parameter, or with defaultSeparatorChars (whitespace) as the separator, two term parts are taken; otherwise only one is taken (which is then itself a space-separated bigram).\n\n#### Changes in v6.5\n\n1. IMPROVEMENT: Better SymSpell.LookupCompound correction quality with the existing single-term dictionary by using Naive Bayes probability for selecting the best word splitting.\u003cbr\u003e\n`bycycle` -\u003e `bicycle` (instead of `by cycle`)\u003cbr\u003e\n`inconvient` -\u003e `inconvenient` (instead of `i convent`)\u003cbr\u003e\n2. IMPROVEMENT: Even better SymSpell.LookupCompound correction quality when using the optional bigram dictionary, which provides sentence-level context information for selecting the best spelling correction.\u003cbr\u003e\n3. IMPROVEMENT: English bigram frequency dictionary included\n\n#### Changes in v6.4\n\n1.\tLoadDictionary(Stream, ...) and CreateDictionary(Stream) methods added (contribution by [ccady](https://github.com/ccady))\u003cbr\u003e\n\tAllows loading dictionaries from network streams, memory streams, and resource streams in addition to the previously supported files.\n\n#### Changes in v6.3\n\n1. 
IMPROVEMENT: WordSegmentation added:\u003cbr\u003e\n   WordSegmentation divides a string into words by inserting missing spaces at appropriate positions.\u003cbr\u003e\n   Misspelled words are corrected and do not prevent segmentation.\u003cbr\u003e\n   Existing spaces are allowed and considered for optimum segmentation.\u003cbr\u003e\n   SymSpell.WordSegmentation uses a [novel approach to word segmentation **without** recursion](https://seekstorm.com/blog/fast-word-segmentation-noisy-text/).\u003cbr\u003e\n   While each string of length n can be segmented into **2^(n−1)** possible [compositions](https://en.wikipedia.org/wiki/Composition_(combinatorics)),\u003cbr\u003e \n   SymSpell.WordSegmentation has a **linear runtime O(n)** to find the optimum composition.\n2. IMPROVEMENT: New CommandLine parameters:\u003cbr\u003e\n   LookupType: lookup, lookupcompound, wordsegment.\u003cbr\u003e\n   OutputStats: switch between showing only the corrected string and showing the corrected string together with edit distance and word frequency/probability.\n3. IMPROVEMENT: Lookup with maxEditDistance=0 is faster.\n\n#### Changes in v6.2\n\n1. IMPROVEMENT: SymSpell.CommandLine project added. Allows pipes and redirects for Input \u0026 Output.\n   Dictionary/Corpus file, MaxEditDistance, Verbosity, PrefixLength can be specified via Command Line. \n   No programming required.\n2. IMPROVEMENT: DamerauOSA edit distance updated, Levenshtein edit distance added (in SoftWx.Match by [Steve Hatchett](https://github.com/softwx))\n3. CHANGE: Other projects in the SymSpell solution now use references to SymSpell instead of links to the source files.\n\n#### Changes in v6.1\n\n1. IMPROVEMENT: [SymSpellCompound](https://github.com/wolfgarbe/SymSpellCompound) has been refactored from a static to an instantiated class and integrated into [SymSpell](https://github.com/wolfgarbe/SymSpell)\n   Therefore SymSpellCompound is now also based on the latest SymSpell version with all fixes and performance improvements\n2. 
IMPROVEMENT: symspell.demo.csproj, symspell.demoCompound.csproj, symspell.Benchmark.csproj have been recreated from scratch \n   and now target .NET Core instead of .NET Framework for improved compatibility with other platforms like macOS and Linux\n3. CHANGE: The testdata directory has been moved from the demo folder into the benchmark folder\n4. CHANGE: License changed from LGPL 3.0 to the more permissive MIT license to allow frictionless commercial usage.\n\n#### Changes in v6.0\n\n1. IMPROVEMENT: SymSpell internal dictionary has been refactored by [Steve Hatchett](https://github.com/softwx).\u003cbr\u003e\n   2x faster dictionary precalculation and 2x lower memory consumption.\n\n#### Changes in v5.1\n\n1. IMPROVEMENT: SymSpell has been refactored from a static to an instantiated class by [Steve Hatchett](https://github.com/softwx).\n2. IMPROVEMENT: Added benchmarking project. \n3. IMPROVEMENT: Added unit test project.\n4. IMPROVEMENT: Different maxEditDistance values can be used for dictionary precalculation and for Lookup. \n5. CHANGE: Removed language feature (use separate SymSpell instances instead).\n6. CHANGE: Verbosity parameter changed from Int to Enum\n7. FIX: Incomplete lookup results if maxEditDistance=1 AND input.Length\u003eprefixLength.\n8. FIX: count overflow protection fixed.\n\n#### Changes in v5.0\n1. FIX: Suggestions were not always complete for input.Length \u003c= editDistanceMax.\n2. FIX: Suggestions were not always complete/best for verbose \u003c 2.\n3. IMPROVEMENT: Prefix indexing implemented: more than 90% memory reduction, depending on prefix length and edit distance.\n   The discriminatory power of additional chars decreases with word length. \n   By restricting the delete candidate generation to the prefix, we can save space without sacrificing filter efficiency too much. \n   Longer prefix length means higher search speed at the cost of higher index size.\n4. IMPROVEMENT: Algorithm for DamerauLevenshteinDistance() replaced with a faster one.\n5. 
ParseWords() without LINQ\n6. CreateDictionaryEntry simplified, AddLowestDistance() removed.\n7. Lookup() improved.\n8. Benchmark() added: Lookup of 1000 terms with random spelling errors.\n\n#### Changes in v4.1\n1. symspell.csproj generates a [SymSpell NuGet package](https://www.nuget.org/packages/symspell) (which can be added to your project)\n2. symspelldemo.csproj shows how SymSpell can be used in your project (by using symspell.cs directly or by adding the [SymSpell NuGet package](https://www.nuget.org/packages/symspell))\n\n#### Changes in v4.0\n1. FIX: Previously, all suggestions within the edit distance (verbose=1) or the best suggestion (verbose=0) were not always returned, e.g. \"elove\" did not return \"love\"\n2. Regex no longer splits words at apostrophes\n3. Dictionary\u003cstring, object\u003e dictionary changed to Dictionary\u003cstring, Int32\u003e dictionary\n4. LoadDictionary() added to load a frequency dictionary. CreateDictionary remains and can be used alternatively to create a dictionary from a large text corpus.\n5. English word frequency dictionary added (wordfrequency_en.txt). Dictionary quality is paramount for correction quality. To achieve this, two data sources were combined by intersection:\n   Google Books Ngram data, which provides representative word frequencies (but contains many entries with spelling errors), and SCOWL (Spell Checker Oriented Word Lists), which ensures genuine English vocabulary (but contains no word frequencies, which are required for ranking suggestions within the same edit distance).\n6. 
dictionaryItem.count was changed from Int32 to Int64 for compatibility with dictionaries derived from Google Ngram data.\n\n---\n\n**SymSpell** is contributed by [**SeekStorm** - the high performance Search as a Service \u0026 search API](https://seekstorm.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwolfgarbe%2Fsymspell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwolfgarbe%2Fsymspell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwolfgarbe%2Fsymspell/lists"}