Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kili-technology/awesome-datasets
A comprehensive list of annotated training datasets classified by use case.
annotation awesome-data-science awesome-datasets awesome-public-datasets corpora data dataset datasets document-processing entity-extraction entity-recognition ner nlp ocr open-datasets opendata opendatasets public-data public-dataset public-datasets
Last synced: 25 Jun 2024
![](https://github.com/kili-technology.png)
https://github.com/maidis/turkish-parallel-corpora
Turkish Parallel Corpora
corpora corpus english machine-translation nlp parallel-texts turkish
Last synced: 20 Jun 2024
![](https://github.com/maidis.png)
https://github.com/M4t1ss/parallel-corpora-tools
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
cleaning corpora corpus-tools data-processing data-science filtering language language-processing machine machine-translation natural-language natural-language-processing neural neural-machine-translation nlp nmt translation
Last synced: 20 Jun 2024
![](https://github.com/M4t1ss.png)
https://github.com/kgjerde/corporaexplorer
An R package for dynamic exploration of text collections
corpora corpus r shiny text-analysis
Last synced: 20 May 2024
![](https://github.com/kgjerde.png)
https://github.com/dkalpakchi/awesome-swedish-nlp
A curated list of resources for natural language processing (NLP) in Swedish
awesome-list corpora corpus dataset datasets natural-language-generation natural-language-processing nlp resource-list swedish swedish-language
Last synced: 14 May 2024
![](https://github.com/dkalpakchi.png)
https://github.com/PlanTL-GOB-ES/lm-spanish
Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
benchmarks corpora embeddings language-model nlp transformers
Last synced: 13 May 2024
![](https://github.com/PlanTL-GOB-ES.png)
https://github.com/JuliaText/CorpusLoaders.jl
A variety of loaders for various NLP corpora.
Last synced: 11 May 2024
![](https://github.com/JuliaText.png)
https://kgjerde.github.io/corporaexplorer/
An R package for dynamic exploration of text collections
corpora corpus r shiny text-analysis
Last synced: 02 May 2024
![](https://github.com/kgjerde.png)
https://github.com/josecannete/spanish-corpora
Unannotated Spanish 3 Billion Words Corpora
corpora linguistics natural-language-processing nlp spanish spanish-language
Last synced: 27 Apr 2024
![](https://github.com/josecannete.png)
https://github.com/alexeykosh/lingcorpora.py
API for corpora
api corpora corpus national-corpus package
Last synced: 22 Apr 2024
![](https://github.com/alexeykosh.png)
https://github.com/saidziani/Arabic-News-Article-Classification
Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow several supervised machine learning approaches.
arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization
Last synced: 22 Apr 2024
![](https://github.com/saidziani.png)
https://github.com/piskvorky/gensim-data
Data repository for pretrained NLP models and NLP corpora.
corpora dataset gensim glove-model lda-model lsi-model pretrained-models word2vec-model
Last synced: 22 Apr 2024
![](https://github.com/piskvorky.png)
https://github.com/digitallinguistics/data-format
The Data Format for Digital Linguistics (DaFoDiL)
corpora corpus-linguistics daffodil digital-humanities digital-linguistics dlx dlx-format json json-schema language languages linguistics natural-language schema
Last synced: 17 Apr 2024
![](https://github.com/digitallinguistics.png)
https://github.com/AI4Bharat/indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
awesome-list corpora indian-languages libraries models
Last synced: 28 Mar 2024
![](https://github.com/AI4Bharat.png)
https://github.com/nonamestreet/weixin_public_corpus
WeChat official accounts corpus
chinese-nlp corpora corpus linguistics natural-language-processing nlp wei-xin weixin weixin-data yu-liao yu-liao-ku
Last synced: 21 Mar 2024
![](https://github.com/nonamestreet.png)