{"id":22290096,"url":"https://github.com/dvsekhvalnov/mystem-morphtagger","last_synced_at":"2025-10-13T12:38:16.407Z","repository":{"id":148882844,"uuid":"303754302","full_name":"dvsekhvalnov/mystem-morphtagger","owner":"dvsekhvalnov","description":"Russian morphology tagger plugin for GATE based on Yandex's mystem. ","archived":false,"fork":false,"pushed_at":"2020-10-14T19:45:10.000Z","size":2300,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-25T21:29:26.528Z","etag":null,"topics":["gate","morphological-analysis","nlp"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dvsekhvalnov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-13T15:53:22.000Z","updated_at":"2021-07-01T17:31:32.000Z","dependencies_parsed_at":"2023-07-15T22:17:08.840Z","dependency_job_id":null,"html_url":"https://github.com/dvsekhvalnov/mystem-morphtagger","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/dvsekhvalnov/mystem-morphtagger","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvsekhvalnov%2Fmystem-morphtagger","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvsekhvalnov%2Fmystem-morphtagger/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvsekhvalnov%2Fmystem-morphtagger/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvsek
hvalnov%2Fmystem-morphtagger/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dvsekhvalnov","download_url":"https://codeload.github.com/dvsekhvalnov/mystem-morphtagger/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dvsekhvalnov%2Fmystem-morphtagger/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279015058,"owners_count":26085643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gate","morphological-analysis","nlp"],"created_at":"2024-12-03T17:11:29.176Z","updated_at":"2025-10-13T12:38:16.391Z","avatar_url":"https://github.com/dvsekhvalnov.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mystem-morphtagger\nA GATE (https://gate.ac.uk/) plugin that annotates words with Russian morphology, based on the output of the Mystem parser.\nHomonyms are supported.\n\n## Status\nThe project was developed in early 2006, during my PhD days, to support other NLP research.\nIt uses a rather old Java setup. 
It hasn't been maintained since 2012.\nOriginally published at https://code.google.com/archive/p/mystem-morphtagger/ and ported to GitHub in 2020.\n\nIt is quite stable and works fine on corpora of 10-50 GB (it has never been tested on anything larger).\n\nThe latest release supports Mystem v2 and GATE v7 on Windows, Linux and OSX.\n\n## Prerequisites\nYou will need to download Yandex's excellent MyStem parser yourself. Look for version 2 here: https://yandex.ru/dev/mystem/\nNote that MyStem's license terms may differ from this project's license.\n\n## How it works\nThe plugin runs Mystem on the input text and then runs a finite-state machine (FSM) over the GATE tokenizer output to match it against the Mystem output, producing morphological annotations for each word in the text.\nSome screenshots:\n\n\u003cimg src=\"https://github.com/dvsekhvalnov/mystem-morphtagger/blob/main/images/settings.png?raw=true\" width=\"600\" alt=\"Plugin settings\" /\u003e\n\u003cimg src=\"https://github.com/dvsekhvalnov/mystem-morphtagger/blob/main/images/annotations.png?raw=true\" width=\"600\" alt=\"Morph annotations\" /\u003e\n\u003cimg src=\"https://github.com/dvsekhvalnov/mystem-morphtagger/blob/main/images/homonyms.png?raw=true\" width=\"600\" alt=\"Homonym annotations\" /\u003e\n\n## How to install into a standalone GATE system\n1. Download the `mystem` executable for your platform.\n2. Place the executable under:\n    * **Windows**: `native/win32/mystem.exe`\n    * **Linux**: `native/linux/mystem`\n    * **OSX**: `native/osx/mystem`\n    * **FreeBSD**: `native/freebsd/mystem`\n3. Unzip the distribution. It contains 2 files: `ru-morph-tagger.jar` and `creole.xml`.\n4. 
For GATE Developer IDE:\n    * Manage CREOLE Plugins -\u003e add a custom repository\n    * select the directory the distribution was unpacked to (where `creole.xml` is located)\n    * select the checkbox to load the morph-tagger plugin\n    * Right-click on Processing Resources -\u003e New -\u003e Russian MorphTagger\n    * The plugin has 2 configuration options:\n        * `encoding` (default: `utf-8`): encoding of the input documents, passed on to mystem\n        * `nativeFolder` (default: current working directory): full path to the directory that contains the `native` folder, ending with a **trailing slash**. Example: `c:\\soft\\GATE\\` (if you have `c:\\soft\\GATE\\native\\win32\\mystem.exe`)\n5. Add `ANNIE Tokeniser` to the pipeline before `MorphTagger`. `MorphTagger` annotates text based on `Token {kind=word}` annotations.\n6. Add any processing resources that rely on morphological information after `MorphTagger`.\n\n## How to build\nYou will need the Ant build tool, which you can download from https://ant.apache.org/\n\n1. `cd ./plugin`\n2. `ant clean make.build`\n\n### Building against a different GATE version\nReplace `./plugin/lib/gate-commons.jar` and `./plugin/lib/gate.jar` with the desired version (the project ships with version 7).\n\n### Running unit tests \u0026 embedding\n1. Create a project from the plugin source using your favorite IDE.\n2. Add the jar dependencies from `./plugin/lib/`.\n3. Add all jar dependencies from `$GATE_HOME/lib`.\n4. Adjust the file paths inside `gatehome/gate-user.xml`.\n\n## Morphological annotation details\n**MorphTagger** produces a `Morph` annotation for every word, with the features described below.\n\n1. `baseForm` - base form (lemma) of the word.\n2. `pos` - part of speech\n    * adjective\n    * adverb\n    * interjection\n    * numeral\n    * substantive\n    * verb\n    * preposition\n    * particle\n    * conjunction\n    * s-pronoun\n    * adv-pronoun\n    * a-pronoun\n    * a-numeral\n    * composite - the word is part of a compound word (see the MyStem documentation)\n3. 
`case` - noun case\n    * nominative\n    * genitive\n    * dative\n    * accusative\n    * instrumental\n    * ablative\n    * partitive\n    * locative\n    * vocative\n4. `multiplicity`\n    * singular\n    * plural\n5. `gender`\n    * feminine\n    * masculine\n    * neuter\n6. `animation`\n    * inanimate\n    * animated\n7. `degree`\n    * superlative\n    * comparative\n8. `form` - adjective form\n    * brief\n    * full\n    * possessive\n9. `predicate-noun`\n10. `parenthetical`\n11. `aspect` - verb aspect\n    * imperfect\n    * perfect\n12. `person`\n    * person1\n    * person2\n    * person3\n13. `voice` - verb voice\n    * voice\n    * active\n14. `tense` - verb tense\n    * present\n    * nopast\n    * past\n15. `mood` - verb mood\n    * indicative\n    * imperative\n16. `representation` - verb representation\n    * participle\n    * gerund\n    * infinitive\n17. `transitivity` - verb transitivity\n    * transitive\n    * nontransitive\n18. `surname` - word is a surname\n19. `first-name` - word is a first name\n20. `last-name` - word is a last name\n21. `geo` - word is the name of a geographic location\n22. `impede`\n23. `distorted` - word is in a distorted form\n24. `common-gender` - word can be either masculine or feminine (common gender)\n25. `colloquial`\n26. `rare`\n27. `abbreviation` - word is an abbreviation\n28. `archaic`\n29. `obscene`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvsekhvalnov%2Fmystem-morphtagger","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdvsekhvalnov%2Fmystem-morphtagger","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdvsekhvalnov%2Fmystem-morphtagger/lists"}