{"id":21116788,"url":"https://github.com/molybdenum-99/mormor","last_synced_at":"2025-07-08T19:33:30.721Z","repository":{"id":56884520,"uuid":"193127621","full_name":"molybdenum-99/mormor","owner":"molybdenum-99","description":"Morfologik dictionaries client in pure Ruby: POS tagging \u0026 spellcheck","archived":false,"fork":false,"pushed_at":"2023-01-21T13:22:34.000Z","size":19400,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-04-25T04:02:44.500Z","etag":null,"topics":["morphology","part-of-speech-tagger","ruby"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/molybdenum-99.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-21T16:13:41.000Z","updated_at":"2022-09-18T13:57:23.000Z","dependencies_parsed_at":"2023-02-12T10:16:08.582Z","dependency_job_id":null,"html_url":"https://github.com/molybdenum-99/mormor","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/molybdenum-99%2Fmormor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/molybdenum-99%2Fmormor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/molybdenum-99%2Fmormor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/molybdenum-99%2Fmormor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/molybdenum-99","download_url":"https://codeload.github.com/molybdenum-99/mo
rmor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225458215,"owners_count":17477410,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["morphology","part-of-speech-tagger","ruby"],"created_at":"2024-11-20T02:34:30.570Z","updated_at":"2024-11-20T02:34:31.275Z","avatar_url":"https://github.com/molybdenum-99.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MorMor\n\n[![Gem Version](https://badge.fury.io/rb/mormor.svg)](http://badge.fury.io/rb/mormor)\n\n**MorMor** is a pure Ruby [morfologik](https://github.com/morfologik/morfologik-stemming) dictionary client that can be used for POS (part of speech) tagging and simplistic spellchecking. The _Morfologik_ format's distinguishing feature is that it is the primary dictionary format for [LanguageTool](https://github.com/languagetool-org/languagetool), so a lot of ready-made, high-quality dictionaries exist.\n\n## Features/Problems\n\n* **No dependencies¹, pure Ruby**\n* **Fast**: I don't have any detailed numbers, but a naive test on my laptop shows 3 million lookups/second on a very large dictionary (Polish, several million word forms).\n* Relatively **memory-efficient**: Typical dictionary file size is 1-3 MB; mormor just loads it into memory as bytes (i.e. 
each byte =\u003e Ruby Integer) and that's all the memory it needs.\n* **Dictionaries** for a lot of languages already exist: unlike your typical POS tagger, the usage instructions do not start with \"First, take your corpora and train the tagger as you please\" (see the \"Dictionaries\" section).\n* At the moment, it is just a **naive** port of the original Morfologik Java code, but it works with all the dictionaries I could find:\n  * Of the possible dictionary formats, only FSA5 and CFSA2 are implemented (not CFSA);\n  * Of the possible dictionary \"encoders\", only \"SUFFIX\" and \"PREFIX\" are implemented;\n* No tests/specs, but it works (and has been checked thoroughly against existing dictionaries); TBH, the original Morfologik doesn't have many, either;\n* Morfologik's spellchecker suggestions/candidates are **not** ported, so mormor can be used only for \"sanity\" spellchecking (\"this word is/is not in the dictionary\").\n\n\u003csmall\u003e¹The only runtime dependency is [backports](https://github.com/marcandre/backports), and that's only because I am too fond of modern Ruby features to sacrifice them to the \"no-dependencies\" god.\u003c/small\u003e\n\n## Usage\n\n0. Install the `mormor` gem (via bundler or just `[sudo] gem install mormor`)\n1. Take a dictionary for your language (see the \"Dictionaries\" section below)\n2. 
Now...\n\n```ruby\nrequire 'mormor'\n\ndictionary = MorMor::Dictionary.new('path/to/english')\ndictionary.lookup('meowing')\n# =\u003e [#\u003cstruct MorMor::Dictionary::Word stem=\"meow\", tags=\"VBG\"\u003e]\ndictionary.lookup('barks')\n# =\u003e [#\u003cstruct MorMor::Dictionary::Word stem=\"bark\", tags=\"NNS\"\u003e,\n#     #\u003cstruct MorMor::Dictionary::Word stem=\"bark\", tags=\"VBZ\"\u003e]\ndictionary.lookup('borogoves')\n# =\u003e nil\n\ndictionary = MorMor::Dictionary.new('path/to/ukrainian')\ndictionary.lookup(\"солов'їна\")\n# =\u003e [#\u003cstruct MorMor::Dictionary::Word stem=\"солов'їний\", tags=\"adj:f:v_kly\"\u003e,\n#     #\u003cstruct MorMor::Dictionary::Word stem=\"солов'їний\", tags=\"adj:f:v_naz\"\u003e]\n```\n\n`Dictionary#lookup` returns an array of structs describing all possible base forms together with their part-of-speech/word-form tags, or `nil` if the word is not in the dictionary. (For example, \"barks\" could be the third-person singular form of the verb \"to bark\", or the plural form of the noun \"bark\".)\n\nTags depend on the particular dictionary used and are typically documented in free form alongside the dictionaries.\n\n## Dictionaries\n\nA lot of dictionaries in Morfologik format can be found in [LanguageTool's repo](https://github.com/languagetool-org/languagetool). 
For example, the Polish [dictionary is at](https://github.com/languagetool-org/languagetool/tree/master/languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl) `languagetool-language-modules/pl/src/main/resources/org/languagetool/resource/pl/`.\n\nWhat you need from there:\n* `polish.dict`, the dictionary itself (a binary finite-state automaton)\n* `polish.info`, the dictionary metadata\n\nTo use the Polish dictionary with mormor, place both files in the same folder, and then\n```ruby\npl = MorMor::Dictionary.new('path/to/that/folder/polish') # without extension\npl.lookup('świetnie')\n```\n\nYou may also be interested in the `tagset.txt` file in the same folder, which explains all POS/form tags in natural language (Polish, in this case).\n\nSometimes (for example, for German and Ukrainian), the LanguageTool repo contains not the dictionary itself but a link to another repo/site where it can be downloaded.\n\nPlease **carefully consider** dictionary licenses when using them!\n\n\u003e **Note:** the mormor repo contains copies of dictionary files from LanguageTool and referred projects, but they are **not** part of the gem distribution and are used only for testing the parser/lookup correctness and for demonstration purposes.\n\n## License and credits\n\nMost of the credit for the algorithms and original code belongs to the original [Morfologik](https://github.com/morfologik/morfologik-stemming) authors, and to the authors of the papers they based their work on.\n\nThe Ruby version is by [Victor Shepelev](https://zverok.github.io).\n\nThe license is BSD, the same as the original Morfologik.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmolybdenum-99%2Fmormor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmolybdenum-99%2Fmormor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmolybdenum-99%2Fmormor/lists"}