{"id":37625580,"url":"https://github.com/timarkh/uniparser-grammar-udm","last_synced_at":"2026-01-16T10:47:43.524Z","repository":{"id":17512228,"uuid":"20300166","full_name":"timarkh/uniparser-grammar-udm","owner":"timarkh","description":"Morphological analysis for Udmurt.","archived":false,"fork":false,"pushed_at":"2025-11-05T17:44:16.000Z","size":108499,"stargazers_count":12,"open_issues_count":5,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-11-05T18:10:26.929Z","etag":null,"topics":["analyzer","dictionary","morphology","nlp","udmurt","uralic-languages","wordlist"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"firebase/Firebase-Unity","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timarkh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2014-05-29T16:22:54.000Z","updated_at":"2025-11-05T17:44:22.000Z","dependencies_parsed_at":"2025-05-13T20:50:39.957Z","dependency_job_id":null,"html_url":"https://github.com/timarkh/uniparser-grammar-udm","commit_stats":{"total_commits":154,"total_committers":4,"mean_commits":38.5,"dds":"0.038961038961038974","last_synced_commit":"3d82c71cb58d3ad5cae8e35a09d9a1d650048337"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/timarkh/uniparser-grammar-udm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timarkh%2Funiparser-grammar-udm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timarkh%2Funiparser-grammar-udm/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timarkh%2Funiparser-grammar-udm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timarkh%2Funiparser-grammar-udm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timarkh","download_url":"https://codeload.github.com/timarkh/uniparser-grammar-udm/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timarkh%2Funiparser-grammar-udm/sbom","scorecard":{"id":885548,"data":{"date":"2025-08-11","repo":{"name":"github.com/timarkh/uniparser-grammar-udm","commit":"af670a15a7a38278608c4d414ba454beba86ca3a"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.3,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Maintained","score":2,"reason":"3 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 2","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the 
project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}}]},"last_synced_at":"2025-08-24T09:53:29.373Z","repository_id":17512228,"created_at":"2025-08-24T09:53:29.373Z","updated_at":"2025-08-24T09:53:29.373Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478059,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analyzer","dictionary","morphology","nlp","udmurt","uralic-languages","wordlist"],"created_at":"2026-01-16T10:47:42.864Z","updated_at":"2026-01-16T10:47:43.512Z","avatar_url":"https://github.com/timarkh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Udmurt morphological analyzer\nThis is a rule-based morphological analyzer for Udmurt (``udm``; Uralic \u003e Permic). It is based on a formalized description of literary Udmurt morphology, which also includes a number of dialectal elements, and uses [uniparser-morph](https://github.com/timarkh/uniparser-morph) for parsing. It performs full morphological analysis of Udmurt words (lemmatization, POS tagging, grammatical tagging, glossing).\n\n## How to use\n### Python package\nThe analyzer is available as a Python package. 
If you want to analyze Udmurt texts in Python, install the module:\n\n```\npip3 install uniparser-udmurt\n```\n\nImport the module and create an instance of the ``UdmurtAnalyzer`` class. Set ``mode='strict'`` (the default) if you are going to process text in the standard orthography. Set ``mode='nodiacritics'`` if you expect some words to lack diacritics (which often happens in social media), e.g. ``сыче`` instead of the correct ``сыӵе``. Set ``mode='oldorth'`` if you are processing texts written in one of the older, pre-standardized orthographies (before the late 1930s). Currently, apostrophes used in place of ``ъ`` and some features of the pre-revolution orthography are accounted for, but coverage is incomplete.\n\nAfter that, you can parse individual tokens or lists of tokens with ``analyze_words()``, or parse a frequency list with ``analyze_wordlist()``. Here is a simple example:\n\n```python\nfrom uniparser_udmurt import UdmurtAnalyzer\na = UdmurtAnalyzer(mode='strict')\n\nanalyses = a.analyze_words('Морфологиез')\n# The parser is initialized on first use, so expect\n# some delay here (usually several seconds)\n\n# You will get a list of Wordform objects.\n# The analysis attributes are stored in their properties\n# as string values, e.g.:\nfor ana in analyses:\n    print(ana.wf, ana.lemma, ana.gramm, ana.gloss)\n\n# You can also pass lists (even nested lists) and specify\n# the output format ('xml' or 'json').\n# If you pass a list, you will get a list of analyses\n# with the same structure.\nanalyses = a.analyze_words([['А'], ['Мон', 'тонэ', 'яратӥсько', '.']],\n                           format='xml')\nanalyses = a.analyze_words(['Морфологиез', [['А'], ['Мон', 'тонэ', 'яратӥсько', '.']]],\n                           format='json')\n```\n\nRefer to the [uniparser-morph documentation](https://uniparser-morph.readthedocs.io/en/latest/) for the full list of options.\n\n### Disambiguation\nApart from the analyzer, this repository contains a set of [Constraint 
Grammar](https://visl.sdu.dk/constraint_grammar.html) rules that can be used for partial disambiguation of analyzed Udmurt texts. They reduce the average number of analyses per analyzed token from about 1.6 to about 1.3. If you want to use them, set ``disambiguate=True`` when calling ``analyze_words()``:\n\n```python\nanalyses = a.analyze_words(['Мон', 'тонэ', 'яратӥсько'], disambiguate=True)\n```\n\nFor this to work, you have to install the ``cg3`` executable separately. On Ubuntu/Debian, you can use ``apt-get``:\n\n```\nsudo apt-get install cg3\n```\n\nOn Windows, download the binary and add its directory to the ``PATH`` environment variable. See [the documentation](https://visl.sdu.dk/cg3/single/#installation) for other options.\n\nNote that each time you call ``analyze_words()`` with ``disambiguate=True``, the CG grammar is loaded and compiled from scratch, which makes the analysis even slower. If you are analyzing a large text, pass its entire contents in a single function call rather than sentence by sentence, so that the grammar is compiled only once.\n\n### Word lists\nAlternatively, you can use a preprocessed word list. The ``wordlists`` directory contains a list of words from a 10-million-word [Udmurt corpus](http://udmurt.web-corpora.net/) (``wordlist.csv``), a list of analyzed tokens (``wordlist_analyzed.txt``; each line contains all possible analyses for one word in XML format), and a list of tokens the parser could not analyze (``wordlist_unanalyzed.txt``). 
The recall of the analyzer on the corpus texts is about 96%, and the corpus is sufficiently large, so if you just use the analyzed word list, the recall on your texts will very likely exceed 90%.\n\n## Description format\nThe description uses the ``uniparser-morph`` format and consists of a description of the inflection (``paradigms.txt``), a grammatical dictionary (``udm_lexemes_XXX.txt`` files), a list of rules that annotate combinations of lexemes and grammatical values with additional Russian translations (``lex_rules.txt``), and a short list of analyses that should be avoided (``bad_analyses.txt``). The dictionary contains descriptions of individual lexemes. Each entry specifies the lexeme's stem, its part-of-speech tag and other grammatical or borrowing information, its inflectional type (paradigm), and its Russian translation. See the [uniparser-morph documentation](https://uniparser-morph.readthedocs.io/en/latest/format.html) for more about the format.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimarkh%2Funiparser-grammar-udm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimarkh%2Funiparser-grammar-udm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimarkh%2Funiparser-grammar-udm/lists"}