{"id":45828382,"url":"https://github.com/WZBSocialScienceCenter/germalemma","last_synced_at":"2026-03-12T07:00:53.911Z","repository":{"id":21125768,"uuid":"91801391","full_name":"WZBSocialScienceCenter/germalemma","owner":"WZBSocialScienceCenter","description":"A lemmatizer for German language text","archived":false,"fork":false,"pushed_at":"2023-02-07T22:58:06.000Z","size":76,"stargazers_count":92,"open_issues_count":8,"forks_count":11,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-11-13T22:11:49.438Z","etag":null,"topics":["german","language-processing","lemmatization","lemmatizer","nlp","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WZBSocialScienceCenter.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-05-19T11:58:49.000Z","updated_at":"2025-09-08T15:06:03.000Z","dependencies_parsed_at":"2023-09-30T21:15:09.930Z","dependency_job_id":null,"html_url":"https://github.com/WZBSocialScienceCenter/germalemma","commit_stats":{"total_commits":32,"total_committers":1,"mean_commits":32.0,"dds":0.0,"last_synced_commit":"667bd2937e1d221d8d9dd2966f67e724f17a6e04"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/WZBSocialScienceCenter/germalemma","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WZBSocialScienceCenter%2Fgermalemma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WZBSocialScienceCenter%2Fgermalemma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WZBSocialScienceCenter%2Fgermalemma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WZBSocialScienceCenter%2Fgermalemma/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WZBSocialScienceCenter","download_url":"https://codeload.github.com/WZBSocialScienceCenter/germalemma/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WZBSocialScienceCenter%2Fgermalemma/sbom","scorecard":{"id":149498,"data":{"date":"2025-08-11","repo":{"name":"github.com/WZBSocialScienceCenter/germalemma","commit":"667bd2937e1d221d8d9dd2966f67e724f17a6e04"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":0,"reason":"44 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2020-28 / GHSA-m6xf-fq7q-8743","Warn: Project is vulnerable to: PYSEC-2020-27 / GHSA-q65m-pv3f-wr5r","Warn: Project is vulnerable to: PYSEC-2020-340 / GHSA-vqhp-cxgc-6wmm","Warn: Project is vulnerable to: PYSEC-2021-865 / GHSA-vv2x-vrpj-qqpq","Warn: Project is vulnerable to: PYSEC-2022-42986 / GHSA-43fp-rhv2-5gv8","Warn: Project is vulnerable to: PYSEC-2023-135 / GHSA-xqr8-7jwr-rhp7","Warn: Project is vulnerable to: GHSA-3ww4-gg4f-jr7f","Warn: Project is vulnerable to: GHSA-5cpq-8wj7-hf2v","Warn: Project is vulnerable to: GHSA-9v9h-cgj8-h64p","Warn: Project is vulnerable to: PYSEC-2021-62 / GHSA-hggm-jpg3-v476","Warn: Project is vulnerable to: GHSA-jm77-qphf-c4w8","Warn: Project is vulnerable to: GHSA-v8gr-m533-ghj9","Warn: Project is vulnerable to: GHSA-w7pp-m8wf-vj6r","Warn: Project is vulnerable to: GHSA-x4qr-2fvf-3mr5","Warn: Project is vulnerable to: PYSEC-2024-60 / GHSA-jjg7-2v4v-x38h","Warn: Project is vulnerable to: PYSEC-2021-356 / GHSA-2ww3-fxvq-293j","Warn: Project is vulnerable to: PYSEC-2024-167 / GHSA-cgvx-9447-vcch","Warn: Project is vulnerable to: PYSEC-2021-859 / GHSA-f8m6-h2c7-8h9x","Warn: Project is vulnerable to: PYSEC-2022-5 / GHSA-rqjh-jp2r-59cj","Warn: Project is vulnerable to: PYSEC-2021-856 / GHSA-5545-2q6w-2gh6","Warn: Project is vulnerable to: GHSA-6p56-wp2h-9hxr","Warn: Project is vulnerable to: PYSEC-2021-857 / GHSA-f7c7-j99h-c22f","Warn: Project is vulnerable to: GHSA-fpfv-jqm9-f5jm","Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2020-92 / GHSA-hj5v-574p-mj7c","Warn: Project is vulnerable to: PYSEC-2022-42969","Warn: Project is vulnerable to: PYSEC-2021-140 / GHSA-9w8r-397f-prfh","Warn: Project is vulnerable to: PYSEC-2023-117 / GHSA-mrwq-x4v8-fh7p","Warn: Project is vulnerable to: PYSEC-2021-141 / GHSA-pq64-v7f5-gqh8","Warn: Project is vulnerable to: GHSA-9hjg-9r4m-mvj7","Warn: Project is vulnerable to: GHSA-9wx4-h78v-vm56","Warn: Project is vulnerable to: PYSEC-2023-74 / GHSA-j8r2-6x86-q33q","Warn: Project is vulnerable to: PYSEC-2023-102","Warn: Project is vulnerable to: PYSEC-2023-114","Warn: Project is vulnerable to: GHSA-g7vv-2v7x-gj9p","Warn: Project is vulnerable to: GHSA-34jh-p97f-mpxf","Warn: Project is vulnerable to: PYSEC-2023-212 / GHSA-g4mx-q9vg-27p4","Warn: Project is vulnerable to: PYSEC-2020-149 / GHSA-hmv2-79q8-fv6g","Warn: Project is vulnerable to: GHSA-pq67-6m6q-mj2v","Warn: Project is vulnerable to: PYSEC-2021-108 / GHSA-q2q7-5pp4-w6pg","Warn: Project is vulnerable to: PYSEC-2023-192 / GHSA-v845-jxx5-vc9f","Warn: Project is vulnerable to: PYSEC-2020-148 / GHSA-wqvq-5m8c-6g24","Warn: Project is vulnerable to: PYSEC-2024-187 / GHSA-rqc4-2hc7-8c8v","Warn: Project is vulnerable to: GHSA-jfmj-5v4g-7637"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-16T10:12:43.496Z","repository_id":21125768,"created_at":"2025-08-16T10:12:43.496Z","updated_at":"2025-08-16T10:12:43.496Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30417685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T06:40:58.731Z","status":"ssl_error","status_checked_at":"2026-03-12T06:40:40.296Z","response_time":114,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["german","language-processing","lemmatization","lemmatizer","nlp","python"],"created_at":"2026-02-26T21:56:04.254Z","updated_at":"2026-03-12T07:00:53.905Z","avatar_url":"https://github.com/WZBSocialScienceCenter.png","language":"Python","readme":"# GermaLemma\n\nDecember 2019, Markus Konrad \u003cmarkus.konrad@wzb.eu\u003e / \u003cpost@mkonrad.net\u003e / [Berlin Social Science Center](https://www.wzb.eu/en)\n\n**This project is currently not maintained.**\n\n## A lemmatizer for German language text\n\nGermalemma lemmatizes Part-of-Speech-tagged German language words. To do so, it combines a large lemma dictionary (an excerpt of the [TIGER corpus from the University of Stuttgart](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html)), functions from the CLiPS \"Pattern\" package, and an algorithm to split composita.\n\n## Installation\n\n### Easy option: Installing from PyPI via `pip`\n\nYou can install the package from [PyPI](https://pypi.org/project/germalemma/) via `pip`:\n\n```\npip install -U germalemma\n```\n\n### Alternative option: Downloading and installing from source\n\n**Only do this if you don't install germalemma via pip:**\n\nIn order to use GermaLemma, you will need to install some additional packages (see *Requirements* section below) and then download the [TIGER corpus from the University of Stuttgart](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.html). You will need to use the CONLL09 format, *not* the XML format.\nThe corpus is free to use for non-commercial purposes (see [License Agreement](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/license/htmlicense.html)).\n\nThen, you should convert the corpus into pickle format for faster loading by executing `germalemma/__init__.py` and passing the path to the corpus file in CONLL09 format:\n\n```\npython germalemma/__init__.py tiger_release_[...].conll09\n```\n\nThis will place a `lemmata.pickle` file in the `data` directory which is then automatically loaded.\n\n## Part-of-Speech (POS) Tagging\n\nYou will need to apply [Part-of-Speech (POS) tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging) to your text before you can lemmatize its words. See [this blog post](https://datascience.blog.wzb.eu/2016/07/13/accurate-part-of-speech-tagging-of-german-texts-with-nltk/) on how to do that.\n\n## Usage\n\nYou have set up GermaLemma to use the TIGER corpus (as explained above). You have tokenized your text (e.g. with NLTK). You have POS-tagged your tokens. Now you can use GermaLemma:\n\n```python\nfrom germalemma import GermaLemma\n\nlemmatizer = GermaLemma()\n\n# passing the word and the POS tag (\"N\" for noun)\nlemma = lemmatizer.find_lemma('Feinstaubbelastungen', 'N')\nprint(lemma)\n# -\u003e lemma is \"Feinstaubbelastung\"\n```\n\n## Valid POS tags\n\nYou can pass POS tags from the [STTS tagset](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html), however, only four POS tags can be processed:\n\n* 'N...' (nouns)\n* 'V...' (verbs)\n* 'ADJ...' (adjectives)\n* 'ADV...' (adverbs)\n\n**All other POS tags will result in a `ValueError` so you should wrap the call to `find_lemma` in a *try-except block*.**\n\n## Accuracy\n\nGermaLemma's accuracy was evaluated using a sample of 696 POS tagged and manually lemmatized words from a sample of paragraphs from proceedings of the European Parliament, Goethe's \"Werther\", Kafka's \"Verwandlung\" and a news article from the website of the WZB (see samples in folder \"eval_texts\").\n\n**Under the assumption that the POS tag is correct** (only those words were selected), GermaLemma finds the correct lemma in 99.43% of the cases. For comparison, *Pattern* achieved 95.11% for the same sample.\n\n## Requirements\n\n* Python 3.6 or newer\n* required package [*Pyphen*](http://pyphen.org/)\n* optional package [*PatternLite*](https://github.com/WZBSocialScienceCenter/patternlite) (This package is optional but highly recommended as it boosts the lemmatizer's accuracy.)\n\n## License\n\nApache License 2.0. See *LICENSE* file.\n\nThe TIGER corpus is **not** part of this repository and has to be downloaded separately under separate license conditions.\n","funding_links":[],"categories":["Werkzeuge"],"sub_categories":["Textverarbeitung"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWZBSocialScienceCenter%2Fgermalemma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FWZBSocialScienceCenter%2Fgermalemma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FWZBSocialScienceCenter%2Fgermalemma/lists"}