{"id":22967290,"url":"https://github.com/fostroll/mordl","last_synced_at":"2025-09-30T08:31:42.228Z","repository":{"id":57443166,"uuid":"276356372","full_name":"fostroll/mordl","owner":"fostroll","description":"Morphological parser (POS, lemmata, NER etc.) ","archived":false,"fork":false,"pushed_at":"2021-12-09T11:04:59.000Z","size":3934,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-01T10:54:08.186Z","etag":null,"topics":["artificial-intelligence","deep-learning","machine-learning","named-entity-recognition","natural-language-processing","nlp","python","pytorch","universal-dependencies"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fostroll.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-01T11:12:48.000Z","updated_at":"2024-04-18T14:09:34.000Z","dependencies_parsed_at":"2022-09-26T17:21:29.013Z","dependency_job_id":null,"html_url":"https://github.com/fostroll/mordl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fostroll/mordl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fmordl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fmordl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fmordl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fmordl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fostroll","
download_url":"https://codeload.github.com/fostroll/mordl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fostroll%2Fmordl/sbom","scorecard":{"id":407767,"data":{"date":"2025-08-11","repo":{"name":"github.com/fostroll/mordl","commit":"992d724f43709483901dd55d1f9aa80791dbccb2"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the 
project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: BSD 3-Clause \"New\" or \"Revised\" License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-18T21:49:41.559Z","repository_id":57443166,"created_at":"2025-08-18T21:49:41.559Z","updated_at":"2025-08-18T21:49:41.559Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277652904,"owners_count":25854381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-30T02:00:09.208Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","machine-learning","named-entity-recognition","natural-language-processing","nlp","python","pytorch","universal-dependencies"],"created_at":"2024-12-14T21:12:32.086Z","updated_at":"2025-09-30T08:31:41.917Z","avatar_url":"https://github.com/fostroll.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch2 align=\"center\"\u003eMorDL: Morphological Tagger (POS, lemmata, NER etc.)\u003c/h2\u003e\n\u003ca name=\"start\"\u003e\u003c/a\u003e\n\n[![PyPI Version](https://img.shields.io/pypi/v/mordl?color=blue)](https://pypi.org/project/mordl/)\n[![Python Version](https://img.shields.io/pypi/pyversions/mordl?color=blue)](https://www.python.org/)\n[![License: BSD-3](https://img.shields.io/badge/License-BSD-brightgreen.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\n***MorDL*** is a 
tool to organize the pipeline for complete morphological\nsentence parsing (POS-tagging, lemmatization, morphological feature tagging)\nand Named-entity recognition.\n\nScores (accuracy) on the *SynTagRus* test dataset: UPOS: `99.35%`; FEATS: `98.87%`\n(tokens), `99.31%` (tags); LEMMA: `99.50%`. In all experiments, we used\n`seed=42`. Other `seed` values may help to achieve better results. Models'\nhyperparameters can also be tuned.\n\nValidation with the\n[official evaluation script](http://universaldependencies.org/conll18/conll18_ud_eval.py)\nof the\n[CoNLL 2018 Shared Task](https://universaldependencies.org/conll18/results.html):\n* For the inference on the *SynTagRus* test corpus, when predicted fields were\nemptied and all other fields were left intact, the scores are the same as\noutlined above.\n* The inference of UPOS - FEATS - LEMMA taggers applied serially resulted in\nscores: UPOS: `99.35%`; UFeats: `98.36%`; AllTags: `98.21%`; Lemmas: `98.88%`.\n\nFor completeness, we included that script in our distribution, so you can use\nit for your model evaluation, too. To simplify its use, we also made a wrapper\n[`mordl.conll18_ud_eval`](https://github.com/fostroll/mordl/blob/master/doc/README_SUPPLEMENTS.md#conll18)\nfor it.\n\n## Installation\n\n### pip\n\n***MorDL*** supports *Python 3.6* and *Transformers 4.3.3* or later. To\ninstall via *pip*, run:\n```sh\n$ pip install mordl\n```\n\nIf you currently have a previous version of ***MorDL*** installed, run:\n```sh\n$ pip install mordl -U\n```\n\n### From Source\n\nAlternatively, you can install ***MorDL*** from the source of this *git\nrepository*:\n```sh\n$ git clone https://github.com/fostroll/mordl.git\n$ cd mordl\n$ pip install -e .\n```\nThis gives you access to examples that are not included in the *PyPI* package.\n\n## Usage\n\nOur taggers use separate models, so they can be used independently. But to\nachieve the best results, the FEATS tagger uses UPOS tags during training. 
And LEMMA\nand NER taggers use both UPOS and FEATS tags. Thus, for a fully untagged\ncorpus, the tagging pipeline applies the taggers serially, as shown\nbelow (assuming that our goal is NER and we already have trained taggers of\nall types):\n\n```python\nfrom mordl import UposTagger, FeatsTagger, NeTagger\n\ntagger_u, tagger_f, tagger_n = UposTagger(), FeatsTagger(), NeTagger()\ntagger_u.load('upos_model')\ntagger_f.load('feats_model')\ntagger_n.load('misc-ne_model')\n\ntagger_n.predict(\n    tagger_f.predict(\n        tagger_u.predict('untagged.conllu')\n    ), save_to='result.conllu'\n)\n```\n\nAny tagger in our pipeline may be replaced with a better one if you have it.\nThe weakness of separate taggers is that they take more space. If all models\nwere created with BERT embeddings, and you load them in memory simultaneously,\nthey may take up to 9 GB of GPU memory. If they do not fit on your GPU, you can use\nthe **device** and **dataset_device** parameters during loading to distribute your models\nacross several GPUs. Alternatively, if you just need to tag a corpus once, you\nmay load models serially:\n\n```python\nfrom mordl import UposTagger, FeatsTagger, NeTagger\n\ntagger = UposTagger()\ntagger.load('upos_model')\ntagger.predict('untagged.conllu', save_to='result_upos.conllu')\ndel tagger  # release the model before loading the next one\ntagger = FeatsTagger()\ntagger.load('feats_model')\ntagger.predict('result_upos.conllu', save_to='result_feats.conllu')\ndel tagger\ntagger = NeTagger()\ntagger.load('misc-ne_model')\ntagger.predict('result_feats.conllu', save_to='result.conllu')\ndel tagger\n```\n\nDon't use the same name for the input and output files when you call the\n`.predict()` methods. Normally this causes no problem, because by default the\nmethods load the whole input file into memory before tagging. But if the input\nfile is large, you may want to use the **split** parameter so that the methods\nhandle the file in parts. In that case, the first part of the\ntagged data is saved before the next part is loaded. 
So, identical names will entail data\nloss.\n\nThe training process is also simple. If you have training corpora and you\ndon't want to experiment, just run:\n\n```python\nfrom mordl import UposTagger\n\ntagger = UposTagger()\ntagger.load_train_corpus(train_corpus)\ntagger.load_test_corpus(dev_corpus)\n\nstat = tagger.train('upos_model', device='cuda:0',\n                    stage3_params={'save_as': 'upos_bert_model'})\n```\n\nThis is the training pipeline for the UPOS tagger; pipelines for other taggers are\nidentical.\n\nFor a more complete understanding of ***MorDL*** toolkit usage, refer to the\nPython notebook with the pipeline example in the `examples` directory of the\n***MorDL*** GitHub repository. Also, the detailed descriptions are available\nin the docs:\n\n[***MorDL*** Basics](https://github.com/fostroll/mordl/blob/master/doc/README_BASICS.md#start)\n\n[Part of Speech Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_POS.md#start)\n\n[Single Feature Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_FEAT.md#start)\n\n[Multiple Feature Tagging](https://github.com/fostroll/mordl/blob/master/doc/README_FEATS.md#start)\n\n[Lemmata Prediction](https://github.com/fostroll/mordl/blob/master/doc/README_LEMMA.md#start)\n\n[Named-entity Recognition](https://github.com/fostroll/mordl/blob/master/doc/README_NER.md#start)\n\n[Supplements](https://github.com/fostroll/mordl/blob/master/doc/README_SUPPLEMENTS.md#start)\n\nAlso, you can find training pipelines for different taggers in our\n[example notebook](https://github.com/fostroll/mordl/blob/master/examples/mordl.ipynb).\n\nThis project was developed with a focus on the Russian language, but the few\nlanguage-specific nuances we use are unlikely to worsen the quality of processing other\nlanguages.\n\n***MorDL*** supports\n[*CoNLL-U*](https://universaldependencies.org/format.html) (if input/output is\na file), or\n[*Parsed 
CoNLL-U*](https://github.com/fostroll/corpuscula/blob/master/doc/README_PARSED_CONLLU.md)\n(if input/output is an object). Also, ***MorDL*** accepts\n[***Corpuscula***'s corpora wrappers](https://github.com/fostroll/corpuscula/blob/master/doc/README_CORPORA.md)\nas input.\n\n## License\n\n***MorDL*** is released under the BSD 3-Clause License. See the\n[LICENSE](https://github.com/fostroll/mordl/blob/master/LICENSE) file for more\ndetails.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffostroll%2Fmordl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffostroll%2Fmordl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffostroll%2Fmordl/lists"}