Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with corpora

A curated list of projects in awesome lists tagged with corpora.

https://github.com/juand-r/entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

annotations corpora datasets entity-extraction entity-recognition named-entity-recognition natural-language-processing ner nlp nlp-resources

Last synced: 19 Dec 2024

https://github.com/piskvorky/gensim-data

Data repository for pretrained NLP models and NLP corpora.

corpora dataset gensim glove-model lda-model lsi-model pretrained-models word2vec-model

Last synced: 18 Dec 2024

https://github.com/natasha/corus

Links to Russian corpora + Python functions for loading and parsing

corpora datasets nlp python russian

Last synced: 20 Dec 2024

https://github.com/PlanTL-GOB-ES/lm-spanish

Official source for Spanish language models and resources made at BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

benchmarks corpora embeddings language-model nlp transformers

Last synced: 22 Nov 2024

https://github.com/saidziani/arabic-news-article-classification

Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow different supervised machine learning approaches.

arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization

Last synced: 28 Oct 2024

https://github.com/saidziani/Arabic-News-Article-Classification

Automatic document categorization consists of assigning a category to a text based on the information it contains. We'll follow different supervised machine learning approaches.

arabic-language arabic-nlp corpora machine-learning nlp nltk python3 text-categorization

Last synced: 14 Nov 2024

https://github.com/kgjerde/corporaexplorer

An R package for dynamic exploration of text collections

corpora corpus r shiny text-analysis

Last synced: 22 Nov 2024

https://github.com/writecrow/corpus_text_processor

A desktop application for preparing files for use in a corpus

corpora corpus-linguistics desktop-app text-processing

Last synced: 26 Nov 2024

https://github.com/richardlitt/gaelic-resources

A list of computational resources for Gaelic

corpora corpus gaelic irish language nlp resources scots scottish scottish-gaelic

Last synced: 09 Dec 2024

https://github.com/khashashin/chechen_corpora

This repository contains the source code for the Chechen Language Corpora website.

chechen corpora corpus nlp

Last synced: 02 Nov 2024

https://github.com/made2591/cognitive-system-postagger

A POS-tagging library with Viterbi, CYK, and an SVO -> XSV translator, made as part of my final exam for the Cognitive System course in the Department of Computer Science.

cky cognitive-services cognitive-systems computer-science corpora cyk department lemmatizer nlp nlp-library nlp-parsing nlp-stemming nltk nltk-grammar nlu postagger postagging sentence stemmer viterbi

Last synced: 13 Nov 2024

https://github.com/litee/tts-asr-corpora

Catalogue of TTS and ASR corpora that can be used for machine learning

asr corpora corpus corpus-linguistics machine-learning text-to-speech tts

Last synced: 21 Nov 2024

https://github.com/fostroll/corpuscula

Toolkit that simplifies corpus processing

conllu corpora natural-language-processing nlp universal-dependencies

Last synced: 21 Dec 2024

https://github.com/zsxkib/ttds-g35-cw3

TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad.

corpora data-science neural-ranking-models pagerank query search-engine technologies text text-analysis text-classification ttds web-search

Last synced: 30 Oct 2024

https://github.com/digitallinguistics/concordance

A Node.js library for performing concordance-related tasks on a corpus in DLx JSON format

corpora corpus corpus-linguistics digital-linguistics dlx linguistics

Last synced: 30 Nov 2024

https://github.com/yash22222/terrorist-activity-forecasting-and-risk-assessment-system

In an era marked by global security challenges, the "TAFRAS" emerges as a cutting-edge solution to tackle the ever-evolving threat of terrorism. The project is grounded in the urgent need for predictive systems that can anticipate, assess, and mitigate potential terrorist activities.

corpora data-vizualisation folium-maps gensim global-terrorism-database lda machine-learning matplotlib networkx nltk nmf numpy pandas python random-forest-classifier seaborn sklearn spacy textblob vader-sentiment-analysis

Last synced: 09 Nov 2024

https://github.com/alhadis/silos

Dumping ground of search results collected for GitHub Linguist.

corpora file-formats github-linguist harvester linguist

Last synced: 20 Dec 2024

https://github.com/tanaikech/corporaapp

This is a Google Apps Script library for managing the corpora of Gemini API.

corpora gemini gemini-api google-apps-script google-apps-script-library semantic-search

Last synced: 11 Nov 2024

https://github.com/dohliam/corpus-tools

A collection of scripts for working with multilingual text corpora

corpora corpus corpus-linguistics frequency language linguistics ngram ngrams ruby salience stoplist stopwords

Last synced: 27 Nov 2024

https://github.com/neroist/nimcorpora

A Nim interface for Darius Kazemi's Corpora project.

corpora nim

Last synced: 12 Dec 2024

https://github.com/zlib-ng/corpora

Common corpora used for lossless compression testing and benchmarking.

compression corpora testing

Last synced: 07 Nov 2024

https://github.com/writecrow/crow_frontend

The user interface for the Corpus & Repository of Writing, built in Angular

angular corpora corpus corpus-builder corpus-linguistics natural-language-processing

Last synced: 26 Nov 2024

https://github.com/dwhieb/dissertation

My Ph.D. dissertation in linguistics at the University of California, Santa Barbara

corpora corpus-linguistics functionalism language lexical-categories lexical-flexibility lexicography linguistics parts-of-speech typology word-classes

Last synced: 08 Dec 2024

https://github.com/skyl/corpora

Corpora is a self-building corpus that can help build other arbitrary corpora

agpl ai api cli corpora corpus django markdown monorepo openapi pgvector postgresql python

Last synced: 10 Nov 2024

https://github.com/digitallinguistics/tags2dlx

A JavaScript (Node.js) library that converts a tagged (monolinear) text to DLx JSON format

corpora corpus corpus-linguistics digital-linguistics dlx linguistics

Last synced: 30 Nov 2024

https://github.com/jamnicki/split-corpus

Split-corpus is a package that divides text corpora into meaningful parts as close to a specified size as possible.

corpora corpus-processing large-files natural-language-processing nlp processing

Last synced: 21 Dec 2024

https://github.com/ololobus/slavic_text_scht

St. Petersburg corpus of hagiographic texts

corpora hagiographic-texts linguistics slavic-languages

Last synced: 21 Dec 2024

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 12 Nov 2024

https://github.com/qanastek/ner-mmtd

Named-entity recognition corpora for multilingual voice recognition in the music industry based on the Million Musical Tweets dataset

corpora dataset english french million-musical-tweets mmtd music named-entity-recognition ner neural-network recognition voice

Last synced: 17 Nov 2024

https://github.com/miweru/vrt_generator

Python class for creating vrt-annotated corpora

corpora linguistic-corpora linguistics vrt wrapper

Last synced: 17 Nov 2024

https://github.com/richardlitt/fortune-cookie-corpus

A growing corpus of fortune cookies (for NLP and fun). Add your fortunes!

corpora corpus corpus-linguistics fortune fortune-cookie fortune-cookies

Last synced: 05 Dec 2024

https://github.com/dohliam/aligned-corpus-search

Simple aligned corpus search tool

corpora corpus

Last synced: 27 Nov 2024