{"id":28939415,"url":"https://github.com/circargs/mrs_spellings","last_synced_at":"2025-10-10T15:08:31.093Z","repository":{"id":57443638,"uuid":"271392578","full_name":"CircArgs/mrs_spellings","owner":"CircArgs","description":"a micro utility to generate plausible misspellings","archived":false,"fork":false,"pushed_at":"2020-06-11T22:42:53.000Z","size":145,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-01T19:35:06.598Z","etag":null,"topics":["augmentation","misspellings","nlp","procedural-generation","qwerty-based-char-distance","tokenization"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CircArgs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-10T21:47:45.000Z","updated_at":"2021-11-10T21:25:55.000Z","dependencies_parsed_at":"2022-09-05T09:31:52.382Z","dependency_job_id":null,"html_url":"https://github.com/CircArgs/mrs_spellings","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CircArgs/mrs_spellings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CircArgs%2Fmrs_spellings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CircArgs%2Fmrs_spellings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CircArgs%2Fmrs_spellings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CircArgs%2Fmrs_spellings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CircArgs","down
load_url":"https://codeload.github.com/CircArgs/mrs_spellings/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CircArgs%2Fmrs_spellings/sbom","scorecard":{"id":30129,"data":{"date":"2025-08-11","repo":{"name":"github.com/CircArgs/mrs_spellings","commit":"5bf5ffa8f6fe1774809bfbdda62a20d376a68220"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.5,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the 
project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/push.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses 
fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/push.yml:14: update your workflow using https://app.stepsecurity.io/secureworkflow/CircArgs/mrs_spellings/push.yml/master?enable=pin","Warn: containerImage not pinned by hash: .github/actions/testing/Dockerfile:1: pin your Docker image by updating python:3.7-buster to python:3.7-buster@sha256:2539f956bcccbac5e4a48ebdafbbbfbd26b4ab56e65b96076ae9cd1188b119b3","Warn: downloadThenRun not pinned by hash: .github/actions/testing/entrypoint.sh:5","Warn: downloadThenRun not pinned by hash: .github/actions/testing/entrypoint.sh:11","Info:   0 out of   1 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 containerImage dependencies pinned","Info:   0 out of   2 downloadThenRun dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"32 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2023-120 / GHSA-45c4-8wx5-qw6w","Warn: Project is vulnerable to: PYSEC-2024-24 / GHSA-5h86-8mv2-jq9f","Warn: Project is vulnerable to: GHSA-5m98-qgg9-wh84","Warn: Project is vulnerable to: GHSA-7gpw-8wmc-pm8g","Warn: Project is vulnerable to: GHSA-8495-4g3g-x7pr","Warn: Project is vulnerable to: PYSEC-2024-26 / GHSA-8qpw-xqxj-h4r2","Warn: Project is vulnerable to: GHSA-9548-qrrj-x5pj","Warn: Project is vulnerable to: PYSEC-2023-246 / GHSA-gfw2-4jvh-wgfg","Warn: Project is vulnerable to: GHSA-pjjw-qhg8-p2p9","Warn: Project is vulnerable to: PYSEC-2023-250 / GHSA-q3qx-c6g2-7pw2","Warn: Project is vulnerable to: PYSEC-2023-251 / GHSA-qvrw-v9rv-5rjx","Warn: Project is vulnerable to: PYSEC-2021-76 / GHSA-v6wp-4m6f-gcjg","Warn: Project is vulnerable to: PYSEC-2023-247 / GHSA-xx9p-xxvh-7g8j","Warn: Project is vulnerable to: PYSEC-2022-42986 / GHSA-43fp-rhv2-5gv8","Warn: Project is vulnerable to: PYSEC-2023-135 / GHSA-xqr8-7jwr-rhp7","Warn: Project is vulnerable to: PYSEC-2024-60 / GHSA-jjg7-2v4v-x38h","Warn: Project is vulnerable to: GHSA-cpwx-vrp4-4pq7","Warn: Project is vulnerable to: PYSEC-2021-66 / GHSA-g3rq-g295-4j3m","Warn: Project is vulnerable to: GHSA-h5c8-rqwp-cp95","Warn: Project is vulnerable to: GHSA-h75v-3vvj-5mfj","Warn: Project is vulnerable to: 
GHSA-q2x7-8rv6-6q7h","Warn: Project is vulnerable to: PYSEC-2020-92 / GHSA-hj5v-574p-mj7c","Warn: Project is vulnerable to: PYSEC-2022-42969","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2023-74 / GHSA-j8r2-6x86-q33q","Warn: Project is vulnerable to: GHSA-34jh-p97f-mpxf","Warn: Project is vulnerable to: PYSEC-2023-212 / GHSA-g4mx-q9vg-27p4","Warn: Project is vulnerable to: GHSA-pq67-6m6q-mj2v","Warn: Project is vulnerable to: PYSEC-2021-108 / GHSA-q2q7-5pp4-w6pg","Warn: Project is vulnerable to: PYSEC-2023-192 / GHSA-v845-jxx5-vc9f","Warn: Project is vulnerable to: GHSA-jfmj-5v4g-7637"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-14T18:51:57.038Z","repository_id":57443638,"created_at":"2025-08-14T18:51:57.039Z","updated_at":"2025-08-14T18:51:57.039Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279004564,"owners_count":26083734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["augmentation","misspellings","nlp","procedural-generation","qwerty-based-char-distance","tokenizatio
n"],"created_at":"2025-06-23T00:09:19.799Z","updated_at":"2025-10-10T15:08:31.087Z","avatar_url":"https://github.com/CircArgs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MrS SpELliNgS\na micro utility to procedurally generate plausible misspellings\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://badge.fury.io/py/mrs-spellings\"\u003e\u003cimg src=\"https://badge.fury.io/py/mrs-spellings.svg\" alt=\"PyPI version\" height=\"18\"\u003e\u003c/a\u003e\n\u003ca href=\"https://codecov.io/gh/CircArgs/mrs_spellings\"\u003e\n  \u003cimg src=\"https://codecov.io/gh/CircArgs/mrs_spellings/branch/master/graph/badge.svg\" /\u003e\n\u003c/a\u003e\n \n\u003cimg alt=\"Build Status\" src=\"https://github.com/CircArgs/mrs_spellings/workflows/test/badge.svg\"\u003e\n\u003cimg alt=\"Code style: black\" src=\"https://img.shields.io/badge/code%20style-black-000000.svg\"\u003e\n\u003cimg alt=\"Language Python\" src=\"https://img.shields.io/badge/language-Python-blue\"\u003e\n\u003c/div\u003e\n\n---\n# [Table of Contents](#table-of-contents)\n- [MrS SpELliNgS](#mrs-spellings)\n- [Install](#install)\n    + [from pypi](#from-pypi)\n    + [from source](#from-source)\n- [Use Cases](#use-cases)\n- [Usage](#usage)\n- [Methods](#methods)\n  * [deletion](#deletion)\n  * [swapping](#swapping)\n  * [qwerty distance (taxi-cab) based swapping](#qwerty-distance-taxi-cab-based-swapping)\n  * [What is QWERTY distance?](#what-is-qwerty-distance)\n---\n# Install\n\n### from pypi\n\n`pip install mrs-spellings`\n\n### from source\n\n`python -m pip install git+https://github.com/CircArgs/mrs_spellings.git`\n\n# Use Cases\n\n- Generate misspellings to find and replace during text cleaning, with low overhead\n- Replace words with their potential misspellings as an augmentation during\n  - training, to make your model less susceptible to misspellings\n  - test time, as part of test-time augmentation (TTA)\n- Supplement an existing solution for out-of-vocabulary words, i.e. 
words that do not appear in an existing replacement dictionary\n\n# Usage\n\nThere are 3 primary methods currently supported:\n  * [deletion](#deletion)\n  * [swapping](#swapping)\n  * [qwerty distance (taxi-cab) based swapping](#qwerty-distance-taxi-cab-based-swapping)\n\n```python\nIn [1]: from mrs_spellings import MrsWord, MrsSpellings\n# methods return MrsSpellings\nIn [2]: MrsWord(\"hello\").swap()\nOut[2]: {'ehllo', 'hello', 'helol', 'hlelo'}\n\nIn [3]: MrsWord(\"hello\").delete(number_deletes=1)\nOut[3]: {'ello', 'hell', 'helo', 'hllo'}\n\nIn [4]: MrsWord(\"hello\").qwerty_swap(max_distance=1)\nOut[4]: \n{'gello',\n 'h3llo',\n 'hdllo',\n 'he,lo',\n 'he:lo',\n  ...\n 'jello',\n 'nello',\n 'yello'}\n# simply chain methods\nIn [5]: MrsWord(\"hello\").swap().delete()\nOut[5]: \n{'ehll',\n 'ehlo',\n 'ello',\n  ...\n 'hllo',\n 'hlol',\n 'lelo'}\n \n# MrsWord is a string\nIn [6]: MrsWord(\"Hello\") + \" \" + MrsWord(\"World\")\nOut[6]: 'Hello World'\n\nIn [7]: MrsWord(\"Hello {}\").format(\"world\")\nOut[7]: 'Hello world'\n\n# MrsSpellings work as sets\nIn [8]: MrsWord(\"hello\").swap().union(MrsWord(\"world\").delete())\nOut[8]: {'ehllo', 'hello', 'helol', 'hlelo', 'orld', 'wold', 'word', 'worl', 'wrld'}\n\nIn [9]: MrsWord(\"hello\").delete(1)-MrsWord(\"hello\").delete(1)\nOut[9]: set()\n\nIn [10]: \" \".join(MrsWord(\"Hello\").qwerty_swap())\nOut[10]: 'Helko Hdllo Yello He,lo Helll Hellp Hel,o Nello Heklo Hrllo H3llo Gello Heolo He:lo Helli Hell9 Heloo Hel:o Jello Hwllo'\n```\n\n# Methods\n\n## deletion\n```python\nSignature: MrsWord.delete(number_deletes=1)\nDocstring:\ndelete `number_deletes` characters from this word\n\nArgs:\n    number_deletes (int): number of deletions to perform\n\nReturns:\n    MrsSpellings (set): all possible misspellings resulting from `number_deletes` deletions\n```\n\n## swapping\n```python\nSignature: MrsWord.swap()\nDocstring:\nswap one pair of consecutive characters\n\nArgs:\n\nReturns:\n    MrsSpellings (set): all possible misspellings resulting from swapping consecutive characters\n```\n\n## qwerty distance (taxi-cab) based swapping\n```python\nSignature: MrsWord.qwerty_swap(max_distance=1)\nDocstring:\n\nswap characters with their qwerty neighbors\n\nArgs:\n    max_distance (int): the max distance (taxi-cab) of keys on the keyboard to swap\n                        e.g. 
`max_distance=1` then \"g\" could become one of [\"f\", \"h\"]\n                            `max_distance=2` then \"g\" could become one of ['f', 'h', 't', 'y', 'v', 'b']\n                            Note: The number of possible swaps increases with distance; however, the increase is not always uniform.\n                            For example, the 3rd group of keys from g is ['6', 'd', 'j'], while the 2nd is ['t', 'y', 'v', 'b']\nReturns:\n    MrsSpellings (set): all possible misspellings resulting from swapping characters with qwerty neighbors\n\n```\n\n### What is QWERTY distance?\n\nQWERTY distance is the distance between keys on a standard QWERTY keyboard. For the purposes of this package, the following assumptions are made:\n\n- each row is offset by half a key from the row above it\n- the L1 (taxi-cab) distance is a good estimate of the natural travel distance between keys\n- typing a shifted character adds distance, since the shift key must be held down\n\nFor example, under these assumptions the keys closest to `g`, grouped by equal distance and ordered from nearest to furthest, are:\n```python\n[['f', 'h'],\n ['t', 'y', 'v', 'b'],\n ['6', 'd', 'j'],\n ['r', 'u', 'c', 'n'],\n ['^', '5', '7', 's', 'k'],\n ['e', 'i', 'x', 'm'],\n ['%', '\u0026', '4', '8', 'a', 'l'],\n ['w', 'o', 'z', '\u003c'],\n ['$', '*', '3', '9', ':'],\n ['q', 'p', ','],\n ['#', '(', '2', '0', ';'],\n ['[', '\u003e'],\n ['@', ')', '1', '-', '\"'],\n [']', '.'],\n ['!', '_', '`', '=', \"'\"],\n ['\\\\', '?'],\n ['~', '+', '{'],\n ['/'],\n ['}'],\n ['|']]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcircargs%2Fmrs_spellings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcircargs%2Fmrs_spellings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcircargs%2Fmrs_spellings/lists"}