{"id":13410909,"url":"https://github.com/adbar/German-NLP","last_synced_at":"2025-03-14T16:33:17.537Z","repository":{"id":40461797,"uuid":"137056286","full_name":"adbar/German-NLP","owner":"adbar","description":"Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German","archived":false,"fork":false,"pushed_at":"2024-08-28T12:31:54.000Z","size":147,"stargazers_count":440,"open_issues_count":0,"forks_count":63,"subscribers_count":45,"default_branch":"master","last_synced_at":"2024-08-28T13:57:10.626Z","etag":null,"topics":["computational-linguistics","corpus-linguistics","german-language","natural-language-processing","nlp","text-mining"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adbar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-12T10:31:03.000Z","updated_at":"2024-08-28T12:31:57.000Z","dependencies_parsed_at":"2024-06-28T17:12:05.950Z","dependency_job_id":null,"html_url":"https://github.com/adbar/German-NLP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adbar%2FGerman-NLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adbar%2FGerman-NLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adbar%2FGerman-NLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adbar%2FGerman-NLP/manif
ests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adbar","download_url":"https://codeload.github.com/adbar/German-NLP/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243610557,"owners_count":20318989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computational-linguistics","corpus-linguistics","german-language","natural-language-processing","nlp","text-mining"],"created_at":"2024-07-30T20:01:10.158Z","updated_at":"2025-03-14T16:33:17.498Z","avatar_url":"https://github.com/adbar.png","language":null,"funding_links":[],"categories":["Others","自然語言處理-德文","Related Resources","Werkzeuge"],"sub_categories":["函式庫","Learning Strategies","Textverarbeitung"],"readme":"# German-NLP\n\nCurated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)\n\nThis list primarily includes resources and tools that are currently maintained and can be used either off-the-shelf or with minor adjustments. It is deliberately biased toward usability and user-friendliness.\n\nCommunity support is needed to keep this list up to date; pull requests and suggestions are welcome! 
See [contributing guidelines](contributing.md).\n\n\n\n## Table of Contents\n\n- [Text corpora](#Text-corpora)\n   - [General-purpose](#General-purpose)\n   - [Historical](#Historical)\n   - [Specialized](#Specialized)\n   - [Word lists](#Word-lists)\n   - [Data acquisition](#Data-acquisition)\n   - [Lists of corpora](#Lists-of-corpora)\n- [Generic resources](#Generic-resources)\n   - [Frameworks](#Frameworks)\n   - [Treebanks](#Treebanks)\n   - [Deep learning models and transformers](#Deep-learning-models-and-transformers)\n   - [Annotation](#Annotation)\n   - [Standards](#Standards)\n- [Linguistic processing](#Linguistic-processing)\n   - [Preprocessing](#Preprocessing)\n   - [Tokenization / Sentence boundary detection](#Tokenization--sentence-boundary-detection)\n   - [Stemming](#Stemming)\n   - [Lemmatization](#Lemmatization)\n   - [Morphological analysis](#Morphological-analysis)\n   - [Normalization](#Normalization)\n   - [Phonology](#Phonology)\n   - [POS-tagging](#POS-tagging)\n   - [Syntactical parsing](#Syntactical-parsing)\n   - [Named Entity Recognition](#Named-Entity-Recognition)\n   - [Industry/Applications](#industryapplications)\n   - [Evaluation](#Evaluation)\n- [Semantic analysis](#Semantic-analysis)\n   - [Datasets](#Datasets)\n   - [Word embeddings and senses](#Word-embeddings-and-senses)\n   - [Sentiment analysis datasets / polarity clues](#sentiment-analysis-datasets--polarity-clues)\n   - [Sentiment detection](#Sentiment-detection)\n   - [GermEval](#GermEval)\n   - [Coreference resolution](#Coreference-resolution)\n   - [Summarization and Simplification](#Summarization-and-simplification)\n   - [Psycholinguistics](#Psycholinguistics)\n- [Speech NLP](#Speech-NLP)\n- [Machine Translation](#Machine-Translation)\n- [Large Language Models](#Large-language-models)\n- [Teaching resources and tutorials](#Teaching-resources-and-tutorials)\n- [More lists](#More-lists)\n   - [German](#German)\n   - [General](#General)\n   - [Comparable 
lists](#Comparable-lists)\n   - [Larger institutional GitHub groups](#Larger-institutional-GitHub-groups)\n\n\n## Text corpora\n\n### General-purpose\n\n* [Araneum Germanicum](http://aranea.juls.savba.sk/aranea_about/_germanicum.html)\n* [CEHugeWebCorpus](https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638)\n* [COW](http://corporafromtheweb.org/category/corpora/german/)\n* [Digitales Wörterbuch der deutschen Sprache (DWDS)](https://dwds.de)\n* [GC4 Corpus](https://german-nlp-group.github.io/projects/gc4-corpus.html) (CommonCrawl)\n* [IDS Corpora](http://www1.ids-mannheim.de/kl/projekte/korpora)\n* [Leipzig Corpora Collection](http://wortschatz.uni-leipzig.de/en/download/)\n* [SdeWaC](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html)\n\n\n### Historical\n\n* [Anselm (14th-16th centuries)](https://www.linguistics.ruhr-uni-bochum.de/anselm/access/index.en.html)\n* [Austrian Newspapers (19th C. NewsEye / READ OCR training dataset)](https://github.com/UB-Mannheim/AustrianNewspapers)\n* [Deutsches Textarchiv](https://deutschestextarchiv.de/)\n* [Elektronische Texte (Thomas Gloning)](http://www.staff.uni-giessen.de/gloning/etexte.htm)\n* [GerManC (1650-1800)](https://ota.bodleian.ox.ac.uk/repository/xmlui/handle/20.500.12024/2544)\n* [German Drama Corpus (GerDraCor)](https://github.com/dracor-org/gerdracor)\n* [German Novels](https://github.com/computationalstylistics/68_german_novels)\n* [German Poetry Corpus (DLK)](https://github.com/thomasnikolaushaider/DLK)\n* [Lesekorpus Altdeutsch (750-1050)](http://titus.uni-frankfurt.de/lea)\n* [LiederCorpus](https://github.com/corpusmusic/liederCorpusAnalysis)\n* [Referenzkorpus Altdeutsch (750-1050)](http://www.deutschdiachrondigital.de/)\n* [Referenzkorpus Mittelhochdeutsch (1050-1350)](https://www.linguistics.rub.de/rem/)\n* [Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)](https://corpora.uni-hamburg.de/hzsk/de/islandora/object/text-corpus:ren-0.6)\n* [Referenzkorpus 
Frühneuhochdeutsch (1350-1650)](https://www.linguistics.rub.de/ref/)\n* [Thesaurus Indogermanischer Text- und Sprachmaterialien (TITUS)](http://titus.uni-frankfurt.de/indexd.htm?/texte/texte.htm)\n* [Transkriptionen von Fibeln (19. Jahrhundert)](https://github.com/UB-Mannheim/Fibeln)\n\n\n### Specialized\n\n* [AGB-DE](https://github.com/DaBr01/AGB-DE)\n* [arg-microtexts](http://angcl.ling.uni-potsdam.de/resources/argmicro.html)\n* [auto-hMDS (multi-document summarization)](https://github.com/AIPHES/auto-hMDS)\n* [DFKI MobIE](https://github.com/DFKI-NLP/MobIE)\n* [DIRNDL -- (D)iscourse (I)nformation (R)adio (N)ews (D)atabase for (L)inguistic Analysis](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html)\n* [Dortmunder Chat Korpus](http://www.chatkorpus.tu-dortmund.de/)\n* [Feidegger (Fashion Images and Descriptions)](https://github.com/zalandoresearch/feidegger)\n* [Foodblog-Korpus](https://doi.org/10.5281/zenodo.1410445)\n* [Fußballlinguistik](https://fussballlinguistik.de/korpora/)\n* [German EUROPARL data w/ NE annotation](https://nlpado.de/~sebastian/software/ner_german.shtml)\n* [German Job Reference Corpus](https://github.com/iug-htw/GJRC)\n* [German Political Speeches Corpus](http://purl.org/corpus/german-speeches)\n* [German Recipes Dataset](https://www.kaggle.com/sterby/german-recipes-dataset)\n* [GermaParl (Bundestag)](https://github.com/PolMine/GermaParlTEI)\n* [German Parliamentary Corpus (GerParCor)](https://github.com/texttechnologylab/GerParCor)\n* [German Wikipedia Text Corpus](https://github.com/t-systems-on-site-services-gmbh/german-wikipedia-text-corpus)\n* [GRAIN corpus -- (G)erman-(RA)dio-(IN)terviews](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/grain.html)\n* [Legal Entity Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition)\n* [One Million Posts Corpus](https://ofai.github.io/million-post-corpus/)\n* [Open Legal Data Corpus (German laws and court 
decisions)](http://openlegaldata.io/research/2019/02/19/court-decision-dataset.html)\n* [Pegida Facebook Comments](http://0x0a.li/wp-content/uploads/2015/01/pegida_korpus.zip)\n* [Potsdam Commentary Corpus (PCC)](http://angcl.ling.uni-potsdam.de/resources/pcc.html)\n* [Songkorpus](http://songkorpus.de/)\n* [Survey of Corpora for Germanic Low-Resource Languages and Dialects](https://github.com/mainlp/germanic-lrl-corpora)\n* [TeCoPhy: A Text Corpus of German Physics Texts](https://zenodo.org/records/8316079)\n* [Ten Thousand German News Articles Dataset](https://github.com/tblock/10kGNAD)\n* [TTLab StadtWiki Corpus](https://vlo.clarin.eu/?7\u0026fq=collection:CEDIFOR.Corpus.StadtWikis\u0026fqType=collection:or)\n* [GSM-1k-de (translated German subset of the first 1000 items of GSM8K)](https://huggingface.co/datasets/D4ve-R/gsm-1k-de)\n\n#### Swiss German\n\n* [ArchiMob Corpus](https://www.spur.uzh.ch/en/departments/research/textgroup/ArchiMob.html)\n* [NOAH's Corpus: Part-of-Speech Tagging for Swiss German](https://noe-eva.github.io/NOAH-Corpus/)\n* [SpinningBytes Swiss German Sentiment Corpus](https://github.com/spinningbytes/SB-CH)\n* [Swiss SMS Corpus](http://www.sms4science.ch/Main/WebHome)\n\n\n#### Learner and Error Corpora\n\n* [C-WEP](http://lingured.info/linguistic-resources/cwep/)\n* [DysList (list of dyslexic errors)](https://github.com/Rauschii/DysListGerman)\n* [Falko](https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko)\n* [Litkey](https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/)\n* [OpinionSpam](https://github.com/hdaSprachtechnologie/OpinionSpam)\n\n\n### Word lists\n\n* [Analogies in German Particle Verb Meaning Shifts](http://www.ims.uni-stuttgart.de/data/pv-meaning-shift)\n* [Degree of Grammaticalization for German Prepositions](https://www.ims.uni-stuttgart.de/forschung/ressourcen/experiment-daten/gramm-prepositions.html)\n* [DWDS lemma list](https://www.dwds.de/lemma/list)\n* 
[DeReWo](http://www1.ids-mannheim.de/kl/projekte/methoden/derewo.html)\n* [Diachronic Usage Relatedness (DURel)](http://www.ims.uni-stuttgart.de/data/durel)\n* [DiMLex (lexicon of German discourse markers)](https://github.com/discourse-lab/dimlex)\n* [German Compound Database](https://www.webcorpora.org/opendata/gecodb/)\n* [German derivational lexicons](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/DErivBase.html)\n* [German nouns from Wiktionary](https://github.com/gambolputty/german-nouns)\n* [german_stopwords](https://github.com/solariz/german_stopwords)\n* [German Wiktionary Lexicon Graph](https://vlo.clarin.eu/record?10\u0026count=2\u0026docId=21.11105_47_0000-000B-D244-B\u0026fq=collection:CEDIFOR.Lexicon\u0026fqType=collection:or\u0026index=0)\n* [German word list for GNU Aspell](https://sourceforge.net/projects/germandict/files/)\n* [Metaphoric Change (annotated lexemes)](http://www.ims.uni-stuttgart.de/forschung/ressourcen/experiment-daten/metaphoric_change.html)\n* [Morphological Dictionaries (DEMorphy)](https://github.com/DuyguA/german-morph-dictionaries)\n* [OpenThesaurus](https://www.openthesaurus.de/about/download)\n* [Stopwords German (DE)](https://github.com/stopwords-iso/stopwords-de)\n* [VulGer](https://github.com/ee-2/VulGer/)\n* [wiktextract](https://github.com/tatuylonen/wiktextract)\n* [wiktionary-de-parser](https://github.com/gambolputty/wiktionary-de-parser)\n\n\n### Data acquisition\n\n* [bundestag](https://github.com/bundestag)\n* [bundestweets](https://github.com/michi-d/bundestweets)\n* [DKPro C4Corpus](https://github.com/dkpro/dkpro-c4corpus)\n* [german-reddit](https://github.com/adbar/german-reddit)\n* [news-crawler](https://github.com/theSoenke/news-crawler)\n* [news-please](https://github.com/fhamborg/news-please)\n* [pattern](https://github.com/clips/pattern/wiki/pattern-de)\n  * [patternlite](https://github.com/WZBSocialScienceCenter/patternlite)\n* 
[scrape-gutenberg-de](https://github.com/jfilter/scrape-gutenberg-de)\n* [SwigSpot Schwyzertuutsch-Spotting](https://github.com/derlin/SwigSpot_Schwyzertuutsch-Spotting)\n* Twitter\n  * [German April 2013 Twitter Corpus](https://github.com/TScheffler/GermanTwitterApril2013)\n* [trafilatura](https://github.com/adbar/trafilatura)\n\n\n### Lists of corpora\n\n* [CLARIN-D list](https://www.clarin-d.net/en/corpora)\n* [Corpora at the IMS](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/index.en.html)\n* [CorpusExplorer's list of corpora](https://notes.jan-oliver-ruediger.de/korpora/)\n* [Korpusarchiv (IDS Mannheim)](http://www1.ids-mannheim.de/kl/projekte/korpora/archiv.html)\n* [Laudatio (Long-term Access and Usage of Deeply Annotated Information)](https://www.laudatio-repository.org/)\n* [Parallel corpora (see below)](#parallel-corpora)\n* [Treebanks (see below)](#treebanks)\n* [ZAS list](http://www.zas.gwz-berlin.de/katalog00100.html?\u0026L=1)\n\n\n## Generic resources\n\n### Frameworks\n\n* [AmbiverseNLU](https://github.com/ambiverse-nlu/ambiverse-nlu)\n* [CLARIN-D web tools](https://www.clarin-d.net/en/analysing)\n* [CorpusExplorer](http://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/)\n* [DKPro Core](https://dkpro.github.io/dkpro-core)\n* [DKPro Similarity](https://dkpro.github.io/dkpro-similarity)\n* [DKPro Text Classification (TC)](https://dkpro.github.io/dkpro-tc)\n* [DKPro Word Sense Disambiguation (WSD)](https://dkpro.github.io/dkpro-wsd)\n* [flair](https://github.com/zalandoresearch/flair)\n* [FreeLing](http://nlp.lsi.upc.edu/freeling/)\n* [ixa pipes](http://ixa2.si.ehu.es/ixa-pipes/)\n* [Mate Tools](http://hdl.handle.net/11022/1007-0000-0000-8E4E-A), webservice via [WebLicht](https://weblicht.sfs.uni-tuebingen.de/)\n* [NLP-Cube](https://github.com/adobe/NLP-Cube)\n* [nlptasks](https://github.com/ulf1/nlptasks)\n* [spaCy](https://github.com/explosion/spaCy)\n* [Sparv](https://spraakbanken.gu.se/sparv/docs/)\n* [Stanford 
CoreNLP](https://github.com/stanfordnlp/CoreNLP)\n* [textblob-de](https://github.com/markuskiller/textblob-de)\n* [TextImager](https://vlo.clarin.eu/record?4\u0026count=1\u0026docId=21.11105_47_0000-000B-CAE6-E\u0026index=0\u0026q=TextImager)\n\n\n### Treebanks\n\n* [German Universal Dependency Treebank](https://github.com/UniversalDependencies/UD_German-GSD/tree/master)/[UD German GSD](https://universaldependencies.org/treebanks/de_gsd/index.html)\n* [Hamburg Dependency Treebank](https://corpora.uni-hamburg.de/hzsk/de/islandora/object/treebank:hdt)\n* [NEGRA](http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/)\n* [TIGER Corpus](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html)\n   * [SALSA (role semantic annotation)](http://www.coli.uni-saarland.de/projects/salsa/corpus/)\n   * [Tiger2Dep (dependency parses)](http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/Tiger2Dep.en.html)\n* [TGermaCorp (literary texts)](https://vlo.clarin.eu/record?1\u0026count=2\u0026docId=21.11105_47_0000-000B-D4D9-1\u0026index=0\u0026q=TGermaCorp)\n* [TüBa-D/Z](http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html)\n\n\n### Deep learning models and transformers\n\n* [LAION LeoLM Llama v2 German Foundation Language Model 7B Parameters](https://huggingface.co/LeoLM/leo-hessianai-7b)\n* [LAION LeoLM Llama v2 German Foundation Language Model 13B Parameters](https://huggingface.co/LeoLM/leo-hessianai-13b)\n* [dbmdz BERT models](https://github.com/dbmdz/berts)\n* [Deepset German BERT model](https://deepset.ai/german-bert)\n* [Evaluating German Transformer Language Models with Syntactic Agreement Tests](https://github.com/DFKI-NLP/gevalm)\n* [German ELMo Model](https://github.com/t-systems-on-site-services-gmbh/german-elmo-model)\n* [german-transformer-training](https://github.com/PhilipMay/german-transformer-training)\n* [GermLM](https://github.com/tonianelope/Multilingual-BERT) (NER exploration)\n* 
[GerPT2](https://github.com/bminixhofer/gerpt2)\n* [Sentence Transformers](https://github.com/UKPLab/sentence-transformers)\n\n\n### Annotation\n\n* [cora](https://github.com/comphist/cora)\n* [CorefAnnotator](https://www.ims.uni-stuttgart.de/en/research/resources/tools/corefannotator/)\n* [corpus-tools.org (HU Berlin)](http://corpus-tools.org/home/)\n* [INCEpTION](https://inception-project.github.io/)\n* [satzify](https://github.com/michdr/satzify)\n* [TreeAnno](https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/treeanno/)\n* [WebAnno](https://webanno.github.io/webanno/)\n\n\n### Standards\n\n* [DTA Basisformat](http://www.deutschestextarchiv.de/doku/basisformat/)\n* [ISO TC 37 SC 4](https://www.iso.org/committee/297592.html)\n* [UIMA](http://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html)\n* [UIMA CAS XMI](https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xmi)\n\n\n## Linguistic processing\n\n### Preprocessing\n\n* [clean-text](https://github.com/jfilter/clean-text)\n* [german-preprocessing](https://github.com/jfilter/german-preprocessing)\n* [german_transliterate](https://github.com/repodiac/german_transliterate)\n\n\n### Tokenization / Sentence boundary detection\n\n* [Cutter](https://pub.cl.uzh.ch/wiki/public/cutter/start)\n* [Datok](https://github.com/korap/datok)\n* [deep-eos](https://github.com/dbmdz/deep-eos) (sentence boundary detection only)\n* [FullStop](https://github.com/oliverguhr/fullstop-deep-punctuation-prediction) (sentence boundary detection only)\n* [JTok](https://github.com/DFKI-MLT/JTok)\n* [KorAP-Tokenizer](https://github.com/KorAP/KorAP-Tokenizer)\n* [nnsplit](https://github.com/bminixhofer/nnsplit) (sentence boundary detection only)\n* [SoMaJo](https://github.com/tsproisl/SoMaJo)\n* [syntok](https://github.com/fnl/syntok)\n* [waste](http://kaskade.dwds.de/waste/)\n* [german-abbreviations](https://github.com/jfilter/german-abbreviations) (resource)\n\n\n### Stemming\n\n* 
[CISTEM](https://github.com/LeonieWeissweiler/CISTEM)\n* [german-go-stemmer](https://github.com/antonbaumann/german-go-stemmer)\n* [Snowball](http://snowballstem.org)\n\n\n### Lemmatization\n\n* [cstlemma](https://github.com/kuhumcst/cstlemma)\n* [germalemma](https://github.com/WZBSocialScienceCenter/germalemma)\n* [GermaLemma++ (ensemble)](https://github.com/rubcompling/germalemmaplusplus)\n* [german-lemmatizer](https://github.com/jfilter/german-lemmatizer)\n* [HanTa](https://github.com/wartaal/HanTa)\n* [IWNLP](https://github.com/Liebeck/IWNLP)\n   * [spacy-iwnlp](https://github.com/Liebeck/spacy-iwnlp)\n* [LemmaTag](https://github.com/Hyperparticle/LemmaTag)\n* [simplemma](https://github.com/adbar/simplemma)\n\n\n### Morphological analysis\n\n* [CharSplit](https://github.com/dtuggener/CharSplit)\n* [DEMorphy](https://github.com/DuyguA/DEMorphy)\n* [dehyphen](https://github.com/jfilter/dehyphen)\n* [deep-german](https://github.com/aakhundov/deep-german) (classification of nouns by genders)\n* [Durm Lemmatizer](http://www.semanticsoftware.info/durm-german-lemmatizer)\n* [german_compound_splitter](https://github.com/repodiac/german_compound_splitter)\n* [GermanNumerus](https://github.com/ulrischa/GermanNumerus)\n* [HypheNN-de](https://github.com/msiemens/HypheNN-de)\n* [jwordsplitter](https://github.com/danielnaber/jwordsplitter)\n* [lang-deu](https://github.com/giellalt/lang-deu)\n* [Low German morphology and tools](https://github.com/giellalt/lang-nds)\n* [MarMoT](http://cistern.cis.lmu.de/marmot/)\n* [MOP Compound Splitter](https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/mcs/)\n* [Morphy](http://morphy.wolfganglezius.de/)\n* [morphisto](https://code.google.com/archive/p/morphisto/)\n* [nnsplit](https://github.com/bminixhofer/nnsplit)\n* [SECOS (unsupervised compound splitter)](https://github.com/riedlma/SECOS)\n* [SFST](http://www.cis.uni-muenchen.de/~schmid/tools/SFST/)\n* [SMOR](http://www.cis.uni-muenchen.de/~schmid/tools/SMOR/), webservice 
via [WebLicht](https://weblicht.sfs.uni-tuebingen.de/)\n* [timur](https://github.com/wrznr/timur)\n* [zmorge](https://github.com/rsennrich/zmorge)\n\n\n### Normalization\n\n* [CAB](http://www.deutschestextarchiv.de/cab)\n* [dehyphen](https://github.com/pd3f/dehyphen)\n* [norma](https://github.com/comphist/norma)\n* [transnormer](https://github.com/ybracke/transnormer)\n\n\n### Phonology\n\n* [gramophone](https://github.com/wrznr/gramophone)\n\n\n### POS-tagging\n\n* [clevertagger](https://github.com/rsennrich/clevertagger)\n* [HanTa](https://github.com/wartaal/HanTa/)\n* [hunpos](https://github.com/mivoq/hunpos)\n* [LemmaTag](https://github.com/Hyperparticle/LemmaTag)\n* [moot](http://kaskade.dwds.de/~jurish/projects/moot)\n* [pattern.de](https://www.clips.uantwerpen.be/pages/pattern-de)\n* [RFTagger](http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/), webservice via [WebLicht](https://weblicht.sfs.uni-tuebingen.de/)\n  * [Java interface](http://sifnos.sfs.uni-tuebingen.de/resource/A4/rftj/)\n* [RNNTagger](https://www.cis.uni-muenchen.de/~schmid/tools/RNNTagger/)\n* [SoMeWeTa](https://github.com/tsproisl/SoMeWeTa)\n* [TnT](http://www.coli.uni-saarland.de/~thorsten/tnt/)\n* [TreeTagger (including models)](http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)\n  * [Demo for Middle High German](http://clarin05.ims.uni-stuttgart.de/mhdtt/index.html)\n\n\n### Syntactical parsing\n\n* [Berkeley Parser](https://github.com/slavpetrov/berkeleyparser)\n* [BitPar](http://www.cis.uni-muenchen.de/~schmid/tools/BitPar/), webservice via [WebLicht](https://weblicht.sfs.uni-tuebingen.de/)\n* [CDG](https://nats-www.informatik.uni-hamburg.de/CDG/DownloadPage)\n   * [jwcdg](https://gitlab.com/nats/jwcdg)\n* [IMSTrans (dependency parser)](http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/imstrans.en.html)\n* [ParZu](https://github.com/rsennrich/parzu)\n* [Stanford Parser](https://nlp.stanford.edu/software/lex-parser.shtml)\n* [STEPS 
Parser](https://github.com/boschresearch/steps-parser)\n\n\n### Named Entity Recognition\n\n* [AmbiverseNLU KnowNER](https://github.com/ambiverse-nlu/ambiverse-nlu)\n* [flair](https://github.com/zalandoresearch/flair)\n* [GermaNER](https://github.com/tudarmstadt-lt/GermaNER)\n* [GERNERMED](https://github.com/frankkramer-lab/GERNERMED)\n* [historic-ner](https://github.com/dbmdz/historic-ner)\n* [LSTM+CRF+FastText with models for (historic) German](https://github.com/riedlma/sequence_tagging)\n* [microNER](https://uhh-lt.github.io/microNER/)\n* [Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German](https://github.com/riedlma/sequence_tagging)\n* [ner-corpora](https://github.com/EuropeanaNewspapers/ner-corpora)\n* [NER-datasets](https://github.com/davidsbatista/NER-datasets)\n* [(Faruqui \u0026 Pado 2010) Components and evaluation data](https://nlpado.de/~sebastian/software/ner_german.shtml)\n* [Towards Robust Named Entity Recognition for Historic German](https://github.com/dbmdz/historic-ner)\n\n\n### Misc\n\n* [German Preprocessing](https://github.com/jfilter/german-preprocessing)\n* [ICARUS (query tool)](http://hdl.handle.net/11022/1007-0000-0000-8E56-0)\n   * [ICARUS2](http://hdl.handle.net/11022/1007-0000-0007-C635-E)\n* [nlprule (error correction)](https://github.com/bminixhofer/nlprule)\n\n\n### Text generation\n\n* [fake_text](https://github.com/fhswf/fake_text)\n* [ngen](https://github.com/Fedjmike/ngen)\n* [pypolibox](https://github.com/arne-cl/pypolibox)\n\n\n### Industry/Applications\n\n* [German Decompounder for Apache Lucene / Apache Solr / Elasticsearch](https://github.com/uschindler/german-decompounder)\n* [holmes-extractor](https://github.com/msg-systems/holmes-extractor)\n* [LanguageTool](https://languagetool.org)\n* [Plenum First Said](https://github.com/ungeschneuer/plenum_first_said)\n\n\n### Evaluation\n\n* [DKPro Statistics](https://dkpro.github.io/dkpro-statistics)\n* [Evaluating Off-the-Shelf NLP Tools for 
German](https://github.com/rubcompling/konvens2019)\n* [Evaluation of different NLP toolkits](https://github.com/goerlitz/nlp-german)\n\n\n## Semantic analysis\n\n### Datasets\n\n* [Complex Word Identification (DE, EN, ES, FR)](https://sites.google.com/view/cwisharedtask2018/home)\n* Distributional memories: [DM.de](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/dm-de.html) [TransDM.de](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/transdmde.html)\n* [Distributional thesauri (includes German)](https://sourceforge.net/projects/jobimtext/files/data/models/)\n* [Downloads page](https://sites.google.com/site/iggsahome/home) of the Interest Group on German Sentiment Analysis\n* [Lexical Chains](https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/lexical-chains.html)\n* [Logical metonymy database](http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/GLMD.html)\n* [schulteimwalde.de/resources.html](http://www.schulteimwalde.de/resources.html)\n* [Semantic Relations in Context](https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/semreldata.html)\n* [UKP Darmstadt data list](https://www.informatik.tu-darmstadt.de/ukp/research_6/data/index.en.jsp)\n\n\n### Word embeddings and senses\n\n* [disco (semantic similarity)](https://github.com/linguatools/disco)\n* [GermaNet](http://www.sfs.uni-tuebingen.de/GermaNet/)\n   * [pygermanet](https://github.com/wroberts/pygermanet)\n* [german2vec](https://github.com/Bachfischer/german2vec)\n* [GermanWordEmbeddings](https://github.com/devmount/GermanWordEmbeddings)\n* [German ELMO model](https://github.com/t-systems-on-site-services-gmbh/german-elmo-model)\n* [Open German WordNet](https://github.com/hdaSprachtechnologie/odenet)\n* [sensegram](https://github.com/tudarmstadt-lt/sensegram)\n* [SpinningBytes word embeddings (tweets)](https://www.spinningbytes.com/resources/wordembeddings/)\n* [UBY Linked Lexical Resource](https://dkpro.github.io/dkpro-uby/)\n* [WECHSEL (subword 
embeddings)](https://github.com/CPJKU/wechsel)\n\n\n### Sentiment analysis datasets / polarity clues\n\n* [Affective norms: abstractness, arousal, imageability and valence ratings](http://www.ims.uni-stuttgart.de/data/affective_norms)\n* [German Sentiment Classification Model for Dialog Systems](https://github.com/oliverguhr/german-sentiment)\n* [GermanPolarityClues](http://www.ulliwaltinger.de/sentiment/)\n* [HeiST – Heidelberg Sentiment Treebank](http://www.cl.uni-heidelberg.de/~versley/HeiST/)\n* [(Non-)Literalness Ratings for complex verbs](http://www.ims.uni-stuttgart.de/data/pv_nonlit)\n* [Potsdam Twitter Sentiment Corpus (PotTS)](https://github.com/WladimirSidorenko/PotTS)\n* [Sentiment dictionary for German political language](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BKBXWD)\n* [Sentiment Lexicon (Univ. Zurich)](http://bics.sentimental.li/files/8614/2462/8150/german.lex)\n* [SentimentWortschatz](http://wortschatz.uni-leipzig.de/en/download/)\n* [SpinningBytes Swiss German Sentiment Corpus](https://github.com/spinningbytes/SB-CH)\n\n\n### Sentiment detection\n\n* [3x8emotions](https://github.com/tweedmann/3x8emotions)\n* [EmotiKLUE](https://github.com/tsproisl/EmotiKLUE)\n* [germansentiment: A simple python package for sentiment classification](https://github.com/oliverguhr/german-sentiment-lib)\n* [LT-ABSA: Aspect-based Sentiment Analysis](https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/software/lt-absa.html)\n* [sentiment-analyser](https://github.com/syzer/sentiment-analyser)\n* [spacy-sentiws](https://github.com/Liebeck/spacy-sentiws)\n\n\n### GermEval\n\n*(category to improve)*\n* [Official GermEval tools list](https://projects.fzai.h-da.de/iggsa/resources-tools-and-literature/)\n* [GermEval 2015 data (Lexical Substitution)](https://sites.google.com/site/germeval2015/)\n* [Germeval Task 2017](https://sites.google.com/view/germeval2017-absa/home)\n* [GermEval-2018 
data](https://github.com/uds-lsv/GermEval-2018-Data)\n* [germeval-rug](https://github.com/malvinanissim/germeval-rug)\n* [IWG_hatespeech_public](https://github.com/UCSM-DUE/IWG_hatespeech_public)\n* [jpadillamontani/germeval2018](https://github.com/jpadillamontani/germeval2018)\n* [uhh-lt/GermEval2017-Baseline](https://github.com/uhh-lt/GermEval2017-Baseline)\n* [UKP embeddings for GermEval 2017](https://github.com/UKPLab/germeval2017-sentiment-detection)\n\n\n### Discourse\n\n* [Bilingual formality (T/V) corpus (EN/DE)](https://nlpado.de/~sebastian/data/tv_data.shtml)\n* [Bilingual FrameNet frame embeddings (EN/DE)](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/XLFrameEmbed.html)\n* [Bilingual parallel frame-semantic annotation (EN/DE)](https://nlpado.de/~sebastian/data/srl_data.shtml)\n* [Coreferee](https://github.com/msg-systems/coreferee)\n* [CorZu (coreference resolution)](https://github.com/dtuggener/CorZu)\n* [Discourse Segmenter](https://github.com/WladimirSidorenko/DiscourseSegmenter)\n* [Frame Identification](https://github.com/UKPLab/naacl18-multimodal-frame-identification)\n* [German social media textual entailment dataset](https://www.cl.uni-heidelberg.de/~zeller/res/te-ger/index.mhtml)\n* [HotCoref DE (coreference resolution)](https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/HotCorefDe)\n* [PropS-DE (proposition structures)](https://github.com/UKPLab/props-de)\n* [Tense-mood-voice annotation system](https://github.com/aniramm/tmv-annotator)\n\n\n### Summarization and Simplification\n\n* [DEPlain](https://github.com/rstodden/DEPlain)\n* [Klexikon](https://github.com/dennlinger/klexikon) (Joint Summarization and Simplification)\n* [Tools and corpora for summarization of German texts](https://github.com/AIPHES)\n\n\n### Psycholinguistics\n\n* [Noun Associations for German](http://www.psycholing.es.uni-tuebingen.de/nag/index.php)\n\n\n## Speech NLP\n\n* [Archiv für gesprochenes 
Deutsch](http://agd.ids-mannheim.de/korpus_index.shtml)\n* [BAS resources](http://www.bas.uni-muenchen.de/Bas/BasSpeechresourceseng.html)\n* [Bochumer Korpus der gesprochenen Sprache im Ruhrgebiet](https://www.ruhr-uni-bochum.de/kgsr/)\n* [Database for Spoken German (IDS Mannheim)](https://dgd.ids-mannheim.de/dgd/pragdb.dgd_extern.welcome)\n* [deepspeech-german](https://github.com/AASHISHAG/deepspeech-german)\n* [(D)iscourse (I)nformation (R)adio (N)ews (D)atabase for (L)inguistic Analysis](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html)\n* [Hamburger Zentrum für Sprachkorpora](https://corpora.uni-hamburg.de/hzsk/)\n* [kaldi-tuda-de](https://github.com/uhh-lt/kaldi-tuda-de)\n* [Open Speech Data Corpus](http://voxforge.org/home/forums/other-languages/german/open-speech-data-corpus-for-german)\n* [Thorsten (Emotional) - Open German Voice Dataset](https://github.com/thorstenMueller/deep-learning-german-tts/#emotional-dataset-information-and-samples-microphone)\n* [Thorsten (Neutral) - Open German Voice Dataset](https://github.com/thorstenMueller/deep-learning-german-tts/#samples-of-my-neutral-voice)\n\n\n## Machine Translation\n\n*(category to improve)*\n* [Tensorflow NMT DE-EN](https://github.com/tensorflow/nmt)\n   * [NMT English to German](https://github.com/thomasschmied/Neural_Machine_Translation_Tensorflow)\n* [Unsupervised Word Segmentation for NMT](https://github.com/rsennrich/subword-nmt)\n\n\n#### Parallel corpora\n\n* [Linguatools Webcrawl German-English 2015](http://linguatools.org/tools/corpora/webcrawl-parallel-corpus-german-english-2015/)\n* [MuchMore Springer Bilingual Corpus](http://muchmore.dfki.de/resources1.htm)\n* [OPUS collection](http://opus.nlpl.eu/)\n\n\n## Large Language Models\n\n* [EM_German](https://github.com/jphme/EM_German)\n* [German Alpaca Dataset](https://github.com/LEL-A/GerAlpacaDataCleaned)\n* [German Benchmark Datasets](https://github.com/bjoernpl/GermanBenchmark)\n* [German Language 
Models](https://github.com/malteos/german-language-models)\n* [GermanRAG](https://github.com/rasdani/germanrag)\n* [German Text Embedding Clustering Benchmark](https://github.com/ClimSocAna/tecb-de)\n* [Swiss German Text Encoders](https://github.com/ZurichNLP/swiss-german-text-encoders)\n* [Vox Populi, Vox AI](https://github.com/leahvdh/Vox-Populi-Vox-AI)\n\n\n## Teaching resources and tutorials\n\n* [bubenhofer.com/korpuslinguistik/kurs/](http://www.bubenhofer.com/korpuslinguistik/kurs/)\n* [CorpusExplorer v2.0 – Seminartauglich in einem halben Tag](https://lernen-mit.jan-oliver-ruediger.de/)\n* [deeplearning4nlp-tutorial](https://github.com/UKPLab/deeplearning4nlp-tutorial)\n* [deutsch-nlp (text classification)](https://github.com/taylorhawks/deutsch-nlp)\n* [German Text Classification Tutorial Series](https://github.com/realjanpaulus/german_text_classification_nlp)\n* [Statistics for linguists (S. Vasishth)](https://github.com/vasishth/Statistics-lecture-notes-Potsdam)\n* [Stilometrie](https://github.com/realjanpaulus/stylometry)\n* Uni Zürich: Sprachtechnologie in den Digital Humanities – MOOC [YouTube](https://www.youtube.com/channel/UChb3Rd5vo3WEgMSy99VInaw) \u0026 [Coursera](http://www.coursera.org/learn/digital-humanities)\n\n\n## More lists\n\n### German\n\n* [CLARIN VLO (DE+public)](https://vlo.clarin.eu/search?2\u0026fq=languageCode:code:deu\u0026fq=licenseType:PUB)\n* [computerlinguistik.org](http://www.computerlinguistik.org/portal/portal.html?s=Ressourcen)\n* [Learn German as a foreign language](https://github.com/willianpaixao/awesome-german)\n* [LRE Map](http://lremap.elra.info/?\u0026selected_facets=languageFilter_exact%3AGerman)\n* [MetaShare Language Resources](http://metashare.ilsp.gr:8080/repository/search/?q=\u0026selected_facets=languageNameFilter_exact%3AGerman)\n* [Peter Kolb's list](http://www.ling.uni-potsdam.de/~kolb/nlp-tools.html)\n* [Swiss German Language Processing](http://kitt.cl.uzh.ch/kitt/noah/resources)\n\n\n### General\n\n* 
GitHub topics [corpus-linguistics](https://github.com/topics/corpus-linguistics) \u0026 [nlp](https://github.com/topics/nlp)\n* [nlp-datasets](https://github.com/niderhoff/nlp-datasets)\n* [NLP-progress](https://github.com/sebastianruder/NLP-progress)\n* [/r/LanguageTechnology/](https://www.reddit.com/r/LanguageTechnology/)\n\n\n### Comparable lists\n\n* [awesome-nlp](https://github.com/keon/awesome-nlp)\n* [Awesome Community-Curated NLP List](https://github.com/alvations/awesome-community-curated-nlp)\n* [awesome-chinese-nlp](https://github.com/crownpku/Awesome-Chinese-NLP)\n* [awesome-danish](https://github.com/fnielsen/awesome-danish)\n* [awesome-hungarian-nlp](https://github.com/oroszgy/awesome-hungarian-nlp)\n* [awesome Information Retrieval](https://github.com/harpribot/awesome-information-retrieval)\n* [Indonesian NLP](https://github.com/kmkurn/id-nlp-resource)\n* [Norwegian NLP resources](https://github.com/web64/norwegian-nlp-resources)\n* [awesome-nlp-polish](https://github.com/ksopyla/awesome-nlp-polish)\n* [awesome-spanish-nlp](https://github.com/dav009/awesome-spanish-nlp)\n* [NLP-Pandect](https://github.com/ivan-bilan/The-NLP-Pandect)\n* [M. Weisser's list of NLP/Computational Linguistics Resources](http://martinweisser.org/corpora_site/comp_ling_resources.html)\n* [NLP tools (Saarland University)](http://www.coli.uni-saarland.de/~csporled/page.php?id=tools)\n* [W. 
Roberts' Computational Linguistics Links](http://amor.cms.hu-berlin.de/~robertsw/links.html)\n\n\n### Larger institutional GitHub groups\n\n\n* [DFKI-NLP](https://github.com/DFKI-NLP)\n* [Language Technology Group, Universität Hamburg](https://github.com/uhh-lt)\n* [Saarland University Spoken Language Systems Group](https://github.com/uds-lsv)\n* [Ubiquitous Knowledge Processing Lab, TU Darmstadt](https://github.com/ukplab)\n* [Webis](https://github.com/webis-de)\n\n\n## Contributors\n\nSee the [list of contributors](https://github.com/adbar/German-NLP/graphs/contributors).\n\n\n## License\n\n[![CC-BY](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by.svg)](https://creativecommons.org/licenses/by/4.0/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadbar%2FGerman-NLP","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadbar%2FGerman-NLP","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadbar%2FGerman-NLP/lists"}