Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/M4t1ss/parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

cleaning corpora corpus-tools data-processing data-science filtering language language-processing machine machine-translation natural-language natural-language-processing neural neural-machine-translation nlp nmt translation

Last synced: 20 Jun 2024

https://github.com/kgjerde/corporaexplorer

An R package for dynamic exploration of text collections

corpora corpus r shiny text-analysis

Last synced: 20 May 2024

https://github.com/PlanTL-GOB-ES/lm-spanish

Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

benchmarks corpora embeddings language-model nlp transformers

Last synced: 13 May 2024

https://github.com/JuliaText/CorpusLoaders.jl

A variety of loaders for various NLP corpora.

corpora nlp

Last synced: 11 May 2024

https://kgjerde.github.io/corporaexplorer/

An R package for dynamic exploration of text collections

corpora corpus r shiny text-analysis

Last synced: 02 May 2024

https://github.com/saidziani/Arabic-News-Article-Classification

Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow different supervised machine learning approaches.

arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization

Last synced: 22 Apr 2024

https://github.com/piskvorky/gensim-data

Data repository for pretrained NLP models and NLP corpora.

corpora dataset gensim glove-model lda-model lsi-model pretrained-models word2vec-model

Last synced: 22 Apr 2024

https://github.com/natasha/corus

Links to Russian corpora, plus Python functions for loading and parsing them

corpora datasets nlp python russian

Last synced: 13 Apr 2024

https://github.com/AI4Bharat/indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages

awesome-list corpora indian-languages libraries models

Last synced: 28 Mar 2024

https://github.com/mozillasecurity/fuzzdata

Fuzzing resources for feeding various fuzzers with input. 🔧

browser corpora corpus firefox fuzzing seeds settings

Last synced: 16 Mar 2024