Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/theimpossibleastronaut/awesome-linguistics

A curated list of anything remotely related to linguistics

List: awesome-linguistics

awesome-list language linguistics resources

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/theimpossibleastronaut/awesome-linguistics
Owner: theimpossibleastronaut
License: cc0-1.0
Created: 2014-10-18T12:20:07.000Z (over 9 years ago)
Default Branch: main
Last Pushed: 2024-01-01T18:02:01.000Z (5 months ago)
Last Synced: 2024-04-14T12:38:43.272Z (about 2 months ago)
Topics: awesome-list, language, linguistics, resources
Homepage:
Size: 64.5 KB
Stars: 353
Watchers: 27
Forks: 29
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Lists

awesome - Linguistics
more-awesome - Linguistics - Anything remotely related to linguistics. (Science)
awesome-projects - Linguistics
lists - awesome-linguistics
awesome-cn - 语言学
collection - awesome-linguistics
collection - awesome-linguistics
awesome - Linguistics
awesome-possum - Linguistics
Awesome-Web3 - Linguistics
awesome-Stuff - Linguistics
awesome - Linguistics
fucking-awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesomelist - awesome-linguistics
awesome - Linguistics
awesome-of-awesome-lists - Linguistics
awesome-copy - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome-lists-awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
fucking-lists - awesome-linguistics
awesome - Linguistics
awesome-list - Linguistics
awesome - Linguistics
sindresorhus-awesome - Linguistics
awesome - Linguistics
awesome-lists-Definative-Lists - awesome-linguistics
awesome - Linguistics
awesome-awesome - Linguistics
awesome - Linguistics
awesome - Linguistics
uva-awesome-search-old - **theimpossibleastronaut/awesome-linguistics**
awesome - Linguistics
awesome-digital-scholarship - Awesome Linguistics - Curated list of linguistics and natural language processing resources (Related Awesome Lists)
awesome - Linguistics
awesome-cn - Linguistics

README

### Awesome Linguistics

[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

A curated list of anything remotely related to linguistics, sorted in alphabetical order.

- [Programming](#programming)

    - [Platforms and toolkits](#platforms-and-toolkits)

    - [Algorithms](#algorithms)

    - [Data sets](#data-sets)

- [Resources](#resources)

    - [Deep learning models and transformers](#deep-learning-models-and-transformers)

    - [On Wikipedia](#on-wikipedia)

    - [On YouTube](#on-youtube)

    - [Books](#books)

        - [Free](#free)

        - [Non free](#non-free)

- [Standards](#standards)

- [Lists](#lists)

- [Communities](#communities)

### Programming

*Libraries, frameworks and tools useful for developing applications.*

### Platforms and toolkits

* [CLARIN-D web tools](https://www.clarin-d.net/en/analysing) - Tools for analysing research data.

* [CorpusExplorer](https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/) - Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface.

* [Haxe-linguistics](https://github.com/sexybiggetje/haxe-linguistics) - Early-stage linguistic analysis and natural language processing library for Haxe.

* [Natural](https://github.com/NaturalNode/natural) - General natural language tools for Node.js.

* [Natural Language Toolkit (NLTK)](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data.

* [Snowball](https://snowballstem.org/) - Snowball is a language in which stemming algorithms can be easily represented.

* [spaCy](https://spacy.io/) - Industrial-strength Natural Language Processing in Python.

* [Mate Tools](http://hdl.handle.net/11022/1007-0000-0000-8E4E-A) - Available as a web service via [WebLicht](https://weblicht.sfs.uni-tuebingen.de/).

* [UBIAI](https://ubiai.tools/) - Text annotation tool for teams with auto-annotation features. Supports NER, relations, and document classification, as well as OCR annotation for invoice labeling.

* [textblob-de](https://github.com/markuskiller/textblob-de) - German language support for TextBlob; an alternative to spaCy (see above).

* [UralicNLP](https://github.com/mikahama/uralicNLP) - An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It can do morphological analysis, generation, lemmatization, disambiguation and lexical lookup for a great many Uralic languages.

### Algorithms

* [Stemming algorithms for various European languages](http://snowball.tartarus.org/texts/stemmersoverview.html) - Overview of the stemming algorithms available in Snowball.

* [The Porter Stemmer Algorithm](http://tartarus.org/martin/PorterStemmer/) - The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter.
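
To give a feel for what a suffix-stripping stemmer does, here is a minimal sketch of step 1a of the Porter algorithm only (the plural rules SSES -> SS, IES -> I, SS -> SS, S -> nothing). The function name `porter_step_1a` is made up for this illustration; the sketch is not a full stemmer and does not replace the reference implementations linked above.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer (plural suffixes)."""
    # Rules are tried in order; the first (longest) matching suffix wins.
    rules = (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", ""))
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ("caresses", "ponies", "caress", "cats"):
    print(w, "->", porter_step_1a(w))
```

The full algorithm applies several further steps that depend on the consonant-vowel structure of the remaining stem, so real output differs for many words.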

### Data sets

* [EuroRomCom Data](https://github.com/kirkins/euroromcom) - JSON-formatted Pan-Romance word lists.

* [Araneum Germanicum](http://aranea.juls.savba.sk/aranea_about/_germanicum.html)

* [CEHugeWebCorpus](https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638) - German corpus based on CommonCrawl.

* [Digitales Wörterbuch der deutschen Sprache (DWDS)](https://dwds.de)

* [GC4 Corpus](https://german-nlp-group.github.io/projects/gc4-corpus.html) (CommonCrawl)

* [IDS Corpora](https://www1.ids-mannheim.de/kl/projekte/korpora) - German Reference Corpus

* [Leipzig Corpora Collection](https://wortschatz.uni-leipzig.de/en/download/) - Sampled sentences in different languages.

* [SdeWaC](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html) - Large German web corpus.

* [C-WEP](http://lingured.info/linguistic-resources/cwep/)

* [DysList (list of dyslexic errors)](https://github.com/Rauschii/DysListGerman)

* [Falko](https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko)

* [Litkey](https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/)

* [OpinionSpam](https://github.com/hdaSprachtechnologie/OpinionSpam)

### Resources

* [How To Label Data](https://www.lighttag.io/how-to-label-data/) - Guide to managing large-scale linguistic annotation projects.

* [Low Resource Languages](https://github.com/RIchardLitt/low-resource-languages) - A list of resources for the conservation, development, and documentation of low-resource (human) languages.

* [Language Science Press](https://langsci-press.org/) - A born-digital, scholar-led open access publisher in linguistics.

### Deep learning models and transformers

* [dbmdz BERT models](https://github.com/dbmdz/berts)

* [Deepset German BERT model](https://deepset.ai/german-bert)

* [Evaluating German Transformer Language Models with Syntactic Agreement Tests](https://github.com/DFKI-NLP/gevalm)

* [German ELMo Model](https://github.com/t-systems-on-site-services-gmbh/german-elmo-model)

* [german-transformer-training](https://github.com/PhilipMay/german-transformer-training)

* [GermLM](https://github.com/tonianelope/Multilingual-BERT) (NER exploration)

* [GerPT2](https://github.com/bminixhofer/gerpt2)

* [Sentence Transformers](https://github.com/UKPLab/sentence-transformers)

### On Wikipedia

* [Bag-of-words model](https://en.wikipedia.org/wiki/Bag-of-words_model)

* [Document classification](https://en.wikipedia.org/wiki/Document_classification)

* [Language models](https://en.wikipedia.org/wiki/Language_model)

* [Naive Bayes classification](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)

* [Natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing)

* [Outline of natural language processing](https://en.wikipedia.org/wiki/Outline_of_natural_language_processing)

* [Part-of-speech tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging)

* [Sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis)

* [Term frequency - inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)

* [Vector space model](https://en.wikipedia.org/wiki/Vector_space_model)
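
As a rough illustration of two of the concepts linked above, the following sketch builds bag-of-words counts and unsmoothed tf-idf weights using only the Python standard library. The helper names `bag_of_words` and `tf_idf`, the whitespace tokenization, and the weighting variant (relative term frequency times `log(N / df)`) are assumptions chosen for brevity; libraries typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Naive tokenization: lowercase and split on whitespace.
    return Counter(text.lower().split())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


docs = ["the cat sat", "the dog barked"]
scores = tf_idf(docs)
print(scores[0])
```

A term that occurs in every document, such as "the" here, gets a weight of zero under this variant, while terms unique to one document receive the highest weights.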

### On YouTube

* [Computational Linguistics Lecture Playlist (YouTube)](https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i) - Lectures from a University of Maryland class on computational linguistics.

* [The Virtual Linguistics Campus](https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g) - CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name.

### Books

*Some of the more interesting and complete books.*

#### Free

* [Essentials of Linguistics, 2nd edition](https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/) - An introductory textbook.

* [Introduction to Linguistics](https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf)

* [Natural Language Processing with Python](https://www.nltk.org/book/) - The book that accompanies the NLTK package.

* [Text Mining with R](https://www.tidytextmining.com)

#### Non free

* [Foundations of Computational Linguistics](https://books.google.com/books?id=o9iGAgAAQBAJ&dq=Foundations+of+Computational+Linguistics&hl=nl&source=gbs_navlinks_s)

* [Foundations of Statistical Natural Language Processing](https://books.google.nl/books?id=YiFDxbEX3SUC)

* [Semisupervised Learning for Computational Linguistics](https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&redir_esc=y)

* [Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition](https://books.google.nl/books?id=fZmj5UNK8AQC)

* [The Oxford Handbook of Computational Linguistics](https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349)

### Standards

* [DTA Basisformat](https://www.deutschestextarchiv.de/doku/basisformat/)

* [ISO TC 37 SC 4](https://www.iso.org/committee/297592.html)

* [UIMA](https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html)

### Lists

* [15 most popular natural language processing books on Goodreads](https://www.goodreads.com/shelf/show/natural-language-processing)

* GitHub topics [corpus-linguistics](https://github.com/topics/corpus-linguistics) & [nlp](https://github.com/topics/nlp)

* [nlp-datasets](https://github.com/niderhoff/nlp-datasets)

* [NLP-progress](https://github.com/sebastianruder/NLP-progress)

* [/r/LanguageTechnology/](https://www.reddit.com/r/LanguageTechnology/)

* [awesome-nlp](https://github.com/keon/awesome-nlp)

* [Awesome Community-Curated NLP List](https://github.com/alvations/awesome-community-curated-nlp)

* [awesome-chinese-nlp](https://github.com/crownpku/Awesome-Chinese-NLP)

* [awesome-danish](https://github.com/fnielsen/awesome-danish)

* [awesome-hungarian-nlp](https://github.com/oroszgy/awesome-hungarian-nlp)

* [awesome Information Retrieval](https://github.com/harpribot/awesome-information-retrieval)

* [Indonesian NLP](https://github.com/kmkurn/id-nlp-resource)

* [Norwegian NLP resources](https://github.com/web64/norwegian-nlp-resources)

* [German NLP resources](https://github.com/adbar/German-NLP/)

* [awesome-nlp-polish](https://github.com/ksopyla/awesome-nlp-polish)

* [awesome-spanish-nlp](https://github.com/dav009/awesome-spanish-nlp)

* [M. Weisser's list of NLP/Computational Linguistics Resources](https://martinweisser.org/corpora_site/comp_ling_resources.html)

* [NLP tools (Saarland University)](https://www.coli.uni-saarland.de/~csporled/page.php?id=tools)

### Communities

* [Linguistics Stack Exchange](https://linguistics.stackexchange.com/)

* [Untranslatable.co](https://untranslatable.co/) - Multilingual urban dictionary.