{"id":17156912,"url":"https://github.com/dynom/tysug","last_synced_at":"2025-04-13T13:22:43.888Z","repository":{"id":44499159,"uuid":"136226996","full_name":"Dynom/TySug","owner":"Dynom","description":"A project around helping to prevent typing typos. TySug (Typo Suggestions) suggests alternative words with respect to keyboard layouts","archived":false,"fork":false,"pushed_at":"2023-03-07T02:17:25.000Z","size":451,"stargazers_count":19,"open_issues_count":2,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-10T08:11:15.712Z","etag":null,"topics":["algorithm","cors","docker","go","golang","jaro","jaro-winkler","keyboard","keyboard-layout","library","spelling-errors","string-distance","suggestions","toml","typing","typo","webservice","words"],"latest_commit_sha":null,"homepage":"https://tysug.net","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dynom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-06-05T19:46:29.000Z","updated_at":"2025-02-01T08:22:16.000Z","dependencies_parsed_at":"2024-01-13T04:12:02.629Z","dependency_job_id":"89994337-9397-428e-8dfd-ec23e60e9d0f","html_url":"https://github.com/Dynom/TySug","commit_stats":{"total_commits":184,"total_committers":2,"mean_commits":92.0,"dds":0.08152173913043481,"last_synced_commit":"708d0917d75b114dded2e9fc9b485f988c6edf88"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dynom%2FTySug","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dynom%2FTySug/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dynom%2FTySug/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dynom%2FTySug/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dynom","download_url":"https://codeload.github.com/Dynom/TySug/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248718298,"owners_count":21150552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","cors","docker","go","golang","jaro","jaro-winkler","keyboard","keyboard-layout","library","spelling-errors","string-distance","suggestions","toml","typing","typo","webservice","words"],"created_at":"2024-10-14T22:07:44.440Z","updated_at":"2025-04-13T13:22:43.845Z","avatar_url":"https://github.com/Dynom.png","language":"Go","readme":"# TySug\n\n[![CircleCI](https://circleci.com/gh/Dynom/TySug.svg?style=svg)](https://circleci.com/gh/Dynom/TySug)\n[![Go Report Card](https://goreportcard.com/badge/github.com/Dynom/TySug)](https://goreportcard.com/report/github.com/Dynom/TySug)\n[![GoDoc](https://godoc.org/github.com/Dynom/TySug?status.svg)](https://godoc.org/github.com/Dynom/TySug)\n[![codecov](https://codecov.io/gh/Dynom/TySug/branch/master/graph/badge.svg)](https://codecov.io/gh/Dynom/TySug)\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge-flat.svg)](https://github.com/avelino/awesome-go)\n\nTySug is a collection of packages that together form a keyboard layout aware alternative word suggester. 
It can be used both as a library and as a webservice.\n\n![shcool](https://raw.githubusercontent.com/Dynom/TySug/master/docs/shcool.jpg)\n\nThe primary supported use case is catching spelling mistakes against short, popular word lists (e.g. domain names). \nThis helps prevent typos in e-mail addresses and helps detect spam and phishing ([Typosquatting](https://en.m.wikipedia.org/wiki/Typosquatting)), among other things. \n\nThe goal is to provide an extensible library that helps find possible spelling errors. You can use it \nout of the box as a library, as a webservice, or as a set of packages to build your own application.\n\nCurrently, it's a fairly naive approach and not (yet) backed by ML.\n\n\n# Using TySug\n\nYou can use TySug as a stand-alone webservice to match input against a known list. If you have Docker, you'll have it up and running in a few minutes. \n\n## TL;DR\n\nIf you have Docker installed and want to tinker quickly, run:\n\n```bash\ndocker run --rm -it -p 1337:1337 dynom/tysug:latest\n```\n\n_If you don't have Docker, you can download the binary from the [releases](https://github.com/Dynom/TySug/releases) page._\n\nIn a different terminal, run:\n\n```bash\ncurl -s \"http://127.0.0.1:1337/list/domains\" --data-binary '{\"input\": \"gmail.co\"}'\n```\n\n## As a webservice\n\n_`curl -s \"http://host:port/list/domains\" --data-binary '{\"input\": \"gmail.co\"}' | jq .`_\n```json\n{\n  \"result\": \"gmail.com\",\n  \"score\": 0.9777777777777777,\n  \"exact_match\": false\n}\n```\n\n- The webservice uses [Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) to calculate similarity.\n- The example pipes the output through [jq](https://stedolan.github.io/jq/); omit it if you don't have jq installed.\n\n\n### The path /list/\u003cname\u003e\n\nThe name corresponds to a list definition in [config.toml](https://github.com/Dynom/TySug/blob/master/config.toml). This lets a single service handle various \ntypes of data. 
Separate lists are both more efficient (shorter lists to iterate over) and more opinionated. When no list by \nthat name is found, a 404 is returned.\n\n\n## As a library\nTySug is a collection of stand-alone packages. Each package has its own README covering the details.\n```go\nimport (\n    \"log\"\n\n    \"github.com/Dynom/TySug/finder\"\n)\n```\n```go\nreferenceList := []string{\"example\", \"amplifier\", \"ample\"}\n\n// myAlgorithm is your finder.AlgWrapper; the score below is what Jaro-Winkler produces\nts, err := finder.New(referenceList, finder.WithAlgorithm(myAlgorithm))\nif err != nil {\n    log.Fatal(err)\n}\n\nalt, score, exact := ts.Find(\"exampel\")\n// alt   = example\n// score = 0.9714285714285714\n// exact = false (not an exact match in our reference list)\n```\n\n### Using a different algorithm\n\nIf you want to use a different algorithm, wrap it in a `finder.AlgWrapper`-compatible type and pass \nit as an argument to the Finder. The unit tests and examples are a good source of inspiration.\n\nAlgorithms worth considering:\n - [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance)\n - [Damerau-Levenshtein](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)\n - [LCS](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem)\n - [q-gram](https://en.wikipedia.org/wiki/N-gram)\n - [Cosine](https://en.wikipedia.org/wiki/Cosine_similarity)\n - [Jaccard](https://en.wikipedia.org/wiki/Jaccard_index)\n - [Jaro / Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)\n - [Smith-Waterman](https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm)\n - [Sift4](https://siderite.dev/blog/super-fast-and-accurate-string-distance.html) (used in [mailcheck.js](https://github.com/mailcheck/mailcheck))\n \nSources:\n - [joyofdata.de/blog/comparison-of-string-distance-algorithms/](https://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/)\n\n### Dealing with confidence\n\nWhen adding your own algorithm, you'll need to handle the \"confidence\" element yourself. 
By default, TySug's finder handles this just fine, but depending on the scale your algorithm uses, you'll need to either normalize the scale or deal \nwith the raw score. \n\n_Note: Be careful not to introduce bias when converting scale._\n```go\nvar someAlgorithm finder.AlgWrapper = func(a, b string) float64 {\n\n    // In this instance, the result is the number of steps needed to achieve equality.\n    // Algorithms like Jaro instead produce a value between 0.0 and 1.0.\n    // distanceLib is a placeholder for whichever edit-distance implementation you wrap.\n    steps := distanceLib.CalculateDistance(a, b)\n    \n    // Finding the length of the longest string\n    ml := len(b)\n    if len(a) \u003e= len(b) {\n        ml = len(a)\n    }\n    \n    // This introduces a bias. Longer inputs get a slight favour over shorter ones, causing deletions to weigh less.\n    return 1 - (float64(steps) / float64(ml))\n}\n\nsug, err := finder.New(list, finder.WithAlgorithm(someAlgorithm))\nif err != nil {\n    log.Fatal(err)\n}\n\nbestMatch, score, exact := sug.Find(input)\n// Here score might be 0.8 for a string of length 10, with 2 mutations\n```\n\nWithout converting the scale you introduce no bias, but you need to deal with a range in which values closer to 0 mean fewer changes:\n```go\n// This will produce a range from (-1 * maximumInputLength) to 0\nreturn -1 * float64(steps)\n```\n# Details\n\n## Reference lists\n\nThe reference list is a list of known/approved words. TySug's webservice is not optimised for large lists; \ninstead it aims for \"opinionated\" lists, such as a list of domain names or a list of country names. This keeps the \nservice snappy and less prone to false positives.\n\n\"Large\" is relative: processing time grows linearly with the length of the list \n([O(N)](http://bigocheatsheet.com/)). Test, and keep the list within your response-time limits :-). \n\n### Case-sensitivity\n\nTySug does not normalise words, which means that words are treated in a case-sensitive manner. This is done mostly to\navoid doing unnecessary work in the hot path. 
Typically, you'll want to make sure that both your lists and your input use the\nsame casing.\n\n### Ordering\n\nThe order of the reference list is significant: when multiple words share the same score, the first one wins. So you'll want to put more \ncommon or popular words first in the list. \n\n## Keyboard layout awareness\n\nTySug's webservice is keyboard layout aware. This means that when the input is 'bee5' and the reference list contains the \nwords 'beer' and 'beek', the word 'beer' is favoured on a QWERTY-US keyboard, because the '5' key sits directly above 'r'.\n\nThis works through a two-pass approach. The first pass collects the word, or words, with the best score. If more than one word shares that score, the keyboard algorithm is applied as a tie-breaker. Most string-distance\nalgorithms factor in the \"cost\" of reaching equality, and the cost of a one-letter difference at the \nsame position within a word (e.g. bee5 versus beer or beek) is typically the same. The keyboard pass adds the assumption that a \nword is typed by a human on a keyboard, and that fingers need to travel a distance to reach certain keys. 
Factoring in\nthis assumption could produce better results in the right context.\n\n# Examples\n\n## Finding common e-mail domain typos\n\nTo help people avoid submitting an incorrect e-mail address, one could try the following:\n\n```go\nimport (\n    \"strings\"\n\n    \"github.com/Dynom/TySug/finder\"\n)\n\nfunc SuggestAlternative(email string, domains []string) (string, float64) {\n    // Require a non-empty local part and a non-empty domain part\n    i := strings.LastIndex(email, \"@\")\n    if i \u003c= 0 || i \u003e= len(email)-1 {\n        return email, 0\n    }\n\n    // Extracting the local and domain parts\n    localPart := email[:i]\n    hostname := email[i+1:]\n\n    sug, err := finder.New(domains)\n    if err != nil {\n        return email, 0\n    }\n\n    alternative, score, exact := sug.Find(hostname)\n\n    // Only suggest an alternative when the match is exact or highly similar\n    if exact || score \u003e 0.9 {\n        combined := localPart + \"@\" + alternative\n        return combined, score\n    }\n\n    return email, score\n}\n```\n\n# Typos\nDealing with typos is complicated and heavily context-dependent.\n\n- Atomic typos -- Typing a correctly spelled but contextually incorrect word (e.g. _beer_ where you meant _beet_).\n- Intentional typos -- Typing \"[teh](https://en.m.wikipedia.org/wiki/Teh)\" instead of \"the\".\n- Marking typos -- Intentional \"typos\" that mark a correction (e.g. Bee5^Hr -\u003e _Beer_ or \"World Wide Mess^WWeb\" -\u003e _World Wide Web_). \n\n# Resources\n\n- [https://www.digitalcoding.com/tools/typo-generator.html](https://www.digitalcoding.com/tools/typo-generator.html)\n- [http://aspell.net](http://aspell.net) \n\n# Further reading\n\n- How Difficult is it to Develop a Perfect Spell-checker? 
A Cross-linguistic Analysis through Complex Network Approach - [https://aclanthology.org/W07-0212/](https://aclanthology.org/W07-0212/) (_[archive.ph link](https://archive.ph/nppY3)_)\n- Typographical and Orthographical Spelling Error Correction - [https://aclanthology.org/L00-1169/](https://aclanthology.org/L00-1169/) (_[archive.ph link](https://archive.ph/O74pN)_)\n- How to Write a Spelling Corrector - [https://norvig.com/spell-correct.html](https://norvig.com/spell-correct.html)\n- Using the Web for Language Independent Spellchecking and Autocorrection - [http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36180.pdf](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36180.pdf)\n- Spellchecking by computer - [https://www.dcs.bbk.ac.uk/~roger/spellchecking.html](https://www.dcs.bbk.ac.uk/~roger/spellchecking.html)\n\n# Contributing\n\nFirst of all: Awesome!\n\nBefore contributing, _please create an issue with the thing you'd like to contribute_.\n\nContributions must be submitted as a PR, and the CI build must pass. When relevant, contributions must include tests proving correctness. The coding style must follow the Go standard, complemented by the community \"[Code Review Comments](https://github.com/golang/go/wiki/CodeReviewComments)\" laundry list.\n\n# Security\n\nSecurity-related issues can be submitted as regular issues in the issue tracker. E-mail me directly if you'd rather not disclose them publicly.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdynom%2Ftysug","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdynom%2Ftysug","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdynom%2Ftysug/lists"}