Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with linguistics
A curated list of projects in awesome lists tagged with linguistics.
https://github.com/psychopy/psychopy
For running psychology and neuroscience experiments
experiment experiment-control experimental-design linguistics neuroscience psycholinguistics psychology psychophysics psychopy python science
Last synced: 25 Oct 2024
https://github.com/nltk/nltk_data
NLTK Data
corpora linguistics natural-language-processing nlp nltk
Last synced: 17 Dec 2024
https://github.com/LexPredict/lexpredict-lexnlp
LexNLP by LexPredict
analytics contracts data law legal legaltech linguistics ml nlp
Last synced: 27 Oct 2024
https://github.com/Tatoeba/tatoeba2
Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
Last synced: 01 Nov 2024
https://github.com/rime/rime-cantonese
Rime Cantonese input schema
cantonese cantonese-dictionary cantonese-language chinese chinese-language chinese-nlp input-method jyutping linguistics rime rime-schema
Last synced: 21 Dec 2024
https://github.com/open-dict-data/ipa-dict
Monolingual wordlists with pronunciation information in IPA
dictionaries g2p grapheme-to-phoneme ipa ipa-data ipa-dictionary language linguistics phonemic-transcription phonetic-transcriptions wordlist
Last synced: 29 Nov 2024
https://github.com/proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing
Last synced: 15 Dec 2024
https://github.com/jacksonllee/pycantonese
Cantonese Linguistics and NLP
cantonese computational-linguistics jyutping linguistics natural-language-processing nlp part-of-speech-tagging pycantonese python stop-words word-segmentation
Last synced: 21 Nov 2024
https://github.com/CUNY-CL/wikipron
Massively multilingual pronunciation mining
computational-linguistics g2p language linguistics nlp phonetics phonology pronunciation python-api scraped-data speech
Last synced: 04 Nov 2024
https://github.com/tshatrov/ichiran
Linguistic tools for texts in Japanese language
common-lisp dictionary grammar japanese japanese-language language linguistics
Last synced: 19 Nov 2024
https://github.com/quadrismegistus/prosodic
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
finnish-language-analysis linguistics metrical-parser nlp poetry rhythm
Last synced: 18 Dec 2024
https://github.com/hangulize/hangulize
Hangulize transcribes non-Korean words into Hangul
korean linguistics transcription
Last synced: 14 Nov 2024
https://github.com/MaxBittker/nyt-first-said
Tweets when words are published for the first time in the NYT
civic-tech journalism linguistics newsroom politics python scraper twitter
Last synced: 02 Dec 2024
https://github.com/sublee/hangulize
Korean Alphabet Transcription
hangul korean linguistics localization python transcription translation
Last synced: 17 Nov 2024
https://github.com/what-studio/tossi
Chooses correct Korean particle morphs for arbitrary words.
korean linguistics localization python
Last synced: 17 Nov 2024
https://github.com/google/corpuscrawler
Crawler for linguistic corpora
corpus-builder corpus-linguistics crawling linguistics minority-language
Last synced: 26 Oct 2024
https://github.com/CoEDL/elpis
🙊 software for creating speech recognition models.
automatic-speech-recognition computational-linguistics docker kaldi linguistics python transcription
Last synced: 15 Nov 2024
https://github.com/pyconll/pyconll
A minimal, pure Python library to interface with CoNLL-U format files.
annotation conllu dependency-parsing linguistics minimal python universal-dependencies
Last synced: 27 Nov 2024
https://github.com/hbuschme/TextGridTools
Read, write, and manipulate Praat TextGrid files with Python
annotation data-analysis elan linguistics praat python textgrid
Last synced: 27 Nov 2024
https://github.com/proycon/colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.
c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing
Last synced: 17 Dec 2024
https://github.com/proycon/flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
annotation-tool clariah clarin computational-linguistics folia javascript linguistic-annotation-framework linguistics nlp python web-application
Last synced: 17 Dec 2024
https://github.com/yohasebe/rsyntaxtree
Syntax tree generator for linguistic research
linguistics ruby rubynlp svg syntax-tree visualization
Last synced: 16 Dec 2024
https://github.com/eliranwong/opengnt
Open Greek New Testament Project; NA28 / NA27 Equivalent Text & Resources
bdag bible biblebento biblical-language chinese english free gk-number greek greek-new-testament lexicon linguistics louw-nida morphology na27 na28 scripture spanish strong-number variant
Last synced: 06 Dec 2024
https://github.com/ars-linguistica/mlconjug3
A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.
conjugation conjugator devops linguistics machine-learning nlp nlp-library nlp-machine-learning python3 test-driven-development
Last synced: 20 Dec 2024
https://github.com/bretttolbert/verbecc
Complete Conjugation of any Verb(e) in Catalan, French, Italian, Portuguese, Romanian or Spanish and conjugate unknown verbs using Machine Learning
catalan catalan-language conjugation conjugator french french-language french-nlp linguistics machine-learning natural-language-processing nlp portuguese-language portuguese-verbs romanian romanian-language scikit-learn spanish-language spanish-verbs verb-conjugation verbs
Last synced: 19 Dec 2024
https://github.com/koskenni/beta
An open source reimplementation of Benny Brodda's BETA in Python
benny-brodda beta corpus-tools hyphenation linguistics open-source string-manipulation string-rewriting
Last synced: 12 Nov 2024
https://github.com/proycon/folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions.
computational-linguistics corpus file-format folia language library linguistic-annotation-framework linguistics nlp python xml
Last synced: 14 Oct 2024
https://github.com/maxent-ai/zeroshot_topics
Topic Inference with Zeroshot models
bert data-science huggingface hypernymy-extraction keybert keyword-extraction knowledge-graph labelled-data labelling linguistics machine-learning nli nlp taxonomy text text-classification transformers weak-supervision weakly-supervised-learning zeroshot-learning
Last synced: 07 Nov 2024
https://github.com/kyubyong/koparadigm
KoParadigm: Korean Inflectional Paradigm Generator
inflection korean linguistics morphology nlp paradigm
Last synced: 10 Nov 2024
https://github.com/knadh/indic.page
A directory of Indic (Indian) language computing resources.
datasets indian-language indic-languages language linguistics nlp
Last synced: 16 Dec 2024
https://github.com/ropensci/lingtypology
R package for linguistic cartography and typological databases search
abvd afbo atlas autotype bivaltyp clld glottolog-database linguistic-maps linguistics phoible r r-package sails typology wals
Last synced: 22 Nov 2024
https://github.com/anna-hope/phonemes
Jason Riggle's chart of phonological features in JSON format + extras
computational-linguistics ipa-symbols linguistics phonemes phonetics phonological-features phonology
Last synced: 19 Dec 2024
https://github.com/sillsdev/libpalaso
Palaso Library: A set of .NET libraries useful for developers of Language Software.
hacktoberfest languages linguistics linux windows
Last synced: 21 Dec 2024
https://github.com/zamgi/lingvo--ner-ru
Named entity recognition (NER) in Russian texts
linguistics lingvo named-entity-recognition natural-language-processing ner nlp nlp-machine-learning
Last synced: 05 Nov 2024
https://github.com/yuhr/langue
A modern platform for conlanging. Currently in the planning stage.
conlang conlinguistics conscript conworld dictionary language langue linguistics ontology speech-recognition speech-synthesis translation
Last synced: 14 Oct 2024
https://github.com/dialpad/inclusive-language
Inclusive language guide as developed by Dialpad linguists
inclusion inclusive-language linguistics
Last synced: 09 Dec 2024
https://github.com/quadrismegistus/poesy
Poetic processing, for Python.
linguistics literary-studies metrical-parser metrics natural-language-processing poetry
Last synced: 31 Oct 2024
https://github.com/proiel/proiel-treebank
Official releases of the PROIEL treebank of ancient Indo-European languages
ancient-greek ancient-languages armenian corpus gothic2 language latin linguistics new-testament old-church-slavonic treebank
Last synced: 28 Oct 2024
https://github.com/milangritta/Pragmatic-Guide-to-Geoparsing-Evaluation
Full resources supporting the publication "A Pragmatic Guide to Geoparsing Evaluation."
analysis data evaluation geocoder geocoding geography geoparser geoparsing google-cloud linguistics location machine-learning named-entity-recognition places spacy-nlp taxonomy toponym-resolution toponyms toponymy training-data
Last synced: 06 Nov 2024
https://github.com/kdelwat/onset
A language evolution simulator, using realistic phonetic changes.
flask linguistics phonetics phonology python vue
Last synced: 19 Nov 2024
https://github.com/omarsar/clinical_nlp_elastic
Clinical NLP Analysis with Elasticsearch and Kibana
elastic elasticsearch kibana linguistics machine-learning mental-health nlp
Last synced: 28 Oct 2024
https://github.com/dveselov/mystem
CGo bindings to Yandex.Mystem
cgo-bindings linguistics mystem russian-specific
Last synced: 26 Oct 2024
https://github.com/agmmnn/syn
🌾 Get synonyms and antonyms of words from Thesaurus.com and other sources in your terminal, with rich output.
cli command-line datamuse dictionary linguistics python rich synonyms terminal thesaurus wordsearch
Last synced: 27 Oct 2024
https://github.com/liulalemx/felig-toolkit
A toolset for Amharic language pre-processing. Includes an Amharic stemmer, transliterator, stopword remover, lexical analyzer, corpus indexer and term weighter.
amharic amharic-corpus amharic-nlp amharic-stemmer corpus lexical-analyzer linguistics stopword-removal transliterator
Last synced: 18 Nov 2024
https://github.com/orgtre/google-books-ngram-frequency
Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code
google language-learning linguistics ngrams wordlist
Last synced: 14 Oct 2024
https://github.com/dbklim/stressrnn
Modified version of RusStress (https://github.com/MashaPo/russtress), a Python package for placing stress in Russian text using an RNN (BiLSTM) and the "Grammatical Dictionary" by A. A. Zaliznyak (from http://odict.ru/).
accent bilstm emphasis linguistic linguistics lstm nlp rnn russian russian-accent russian-stress russtress rustress stress
Last synced: 11 Nov 2024
https://github.com/korpling/pepper
A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.
annotations converter format java linguistic-formats linguistics nlp pepper
Last synced: 15 Nov 2024
https://github.com/bretttolbert/verbecc-svc
Dockerized Python microservice with REST API for verb conjugation in French, Spanish and Portuguese
conjugation conjugator french french-language french-nlp linguistics machine-learning natural-language natural-language-processing nlp portuguese-language portuguese-verbs romanian romanian-language scikit-learn spanish-language spanish-verbs verb-conjugation
Last synced: 18 Oct 2024
https://github.com/derintelligence/en-az-parallel-corpus
English-Azerbaijani parallel language corpus
azerbaijan azerbaijani-translation corpus language linguistics nlp parallel translation
Last synced: 13 Nov 2024
https://github.com/bramvanroy/astred
An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For instance useful for comparing a translation with the original text, to find differences and similarities between two different translations, or to see how a machine translation differs from a reference translation.
alignment linguistics nlp parallel-corpus parsing spacy stanza translation
Last synced: 14 Oct 2024
https://github.com/digitallinguistics/data-format
The Data Format for Digital Linguistics (DaFoDiL)
corpora corpus-linguistics daffodil digital-humanities digital-linguistics dlx dlx-format json json-schema language languages linguistics natural-language schema
Last synced: 14 Oct 2024
https://github.com/orgtre/top-open-subtitles-sentences
Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code
language-learning linguistics opensubtitles wordlist
Last synced: 14 Oct 2024
https://github.com/DevelopersTree/KurdishResources
A repository for resources in Kurdish Language
bot kurdish kurdish-oss linguistics wordlist
Last synced: 14 Nov 2024
https://github.com/willianantunes/transcriber-wrapper
Wrapper of well-known transcribers that transform text into phoneme codes
arpabet espeak-ng festival-speech-synthesis international-phonetic-alphabet ipa linguistics mypy pytest transcriber transcription
Last synced: 22 Nov 2024
https://github.com/delph-in/pydmrs
A library for manipulating DMRS structures
computational-linguistics delph-in dependency-graph dmrs formal-semantics hpsg linguistics minimal-recursion-semantics mrs natural-language natural-language-processing nlp python semantics
Last synced: 26 Nov 2024
https://github.com/xylous/grzegorz
A command-line phonetics tool for finding minimal pairs
anki cli command-line language-learning linguistics minimal-pairs phonology python utility
Last synced: 14 Oct 2024
https://github.com/shujian2015/neural-net-linguistics
Papers about NN and linguistics
Last synced: 09 Nov 2024
https://github.com/alvations/expletives
Expletives vomiting library...
bad-words expletives linguistics nlp python vulgarities
Last synced: 29 Nov 2024
https://github.com/dativebase/dative
Dative: software for linguistic fieldwork
Last synced: 15 Nov 2024
https://gitlab.com/smc/mlmorph
Malayalam Morphological Analyzer using Finite State Transducer https://morph.smc.org.in
Malayalam fst hfst linguistics morphology analyser sfst
Last synced: 19 Nov 2024
https://github.com/zamgi/lingvo--classify
Automatic classification of Russian-language text
classification linguistics lingvo natural-language-processing nlp nlp-machine-learning text-classification
Last synced: 05 Nov 2024
https://github.com/tallguyjenks/runes
🧙♀️ ᚱᚢᚾᛖᛋ in your R Documents!
bryan-jenks cran elder-futhark-runes futhark futhark-runes linguistics nordic r rstats rstudio rune runes
Last synced: 04 Dec 2024
https://github.com/rshrc/varnamala
A personal app to teach oneself any language, Duolingo style.
flutter-apps kannada language learning linguistics
Last synced: 25 Nov 2024
https://github.com/eliranwong/etcbc-recycle
ETCBC (version 4c) data on Hebrew bible (csv and SQLite3)
bhs bible biblia-hebraica-stuttgartensia csv eliran eliranwong etcbc etcbc-data etcbc-recycle etcbc-remix hebrew linguistics morphology parsing scripture shebanq sqlite sqlite3 tanakh text-fabric
Last synced: 06 Dec 2024
https://github.com/digitallinguistics/transliterate
A small JavaScript library for transliterating strings between different orthographies
digital-humanities digital-linguistics dlx linguistics transliteration
Last synced: 30 Nov 2024
https://github.com/derintelligence/az-summarization
Abstractive summarization for Azerbaijani language
azerbaijan dataset language linguistics nlp summarization
Last synced: 13 Nov 2024
https://github.com/adamliter/latex-workshop
Materials for workshop on LaTeX aimed at linguists
latex latex-examples linguistics tutorial
Last synced: 11 Oct 2024
https://github.com/PaddiM8/GlossVisualiser
Displays interlinear gloss in a more readable way with HTML.
Last synced: 13 Nov 2024
https://github.com/zamgi/lingvo--ner-en
Named entity recognition (NER) in English texts
linguistics lingvo named-entity-recognition natural-language-processing ner nlp nlp-machine-learning
Last synced: 05 Nov 2024
https://github.com/zamgi/lingvo--textsegmenter
Text segmentation into separate words using a simple unigram model and the Viterbi algorithm
linguistics lingvo natural-language-processing nlp text-segmentation viterbi-algorithm
Last synced: 05 Nov 2024
https://github.com/dativebase/old-pyramid
Online Linguistic Database (OLD)
linguistics linguistics-databases pyramid-framework python3
Last synced: 15 Nov 2024
https://github.com/kostaspt/go-datamuse
Go library for Datamuse API
api-wrapper datamuse datamuse-api dictionary go golang linguistics rhymes suggestions words wordsearch
Last synced: 02 Nov 2024
https://github.com/rexshijaku/chatgpt-generated-text-detection-corpus
ChatGPT Generated Text Detection Corpus
chatgpt corpus dataset linguistics text-classification text-detection
Last synced: 15 Oct 2024
https://github.com/groverburger/sapling
An intuitive graphical linguistics syntax tree editor that runs in your browser.
editor linguistics sapling syntax tree
Last synced: 14 Oct 2024
https://github.com/matthias-stemmler/annimate
Annimate - Your Friendly ANNIS Match Exporter
application desktop linguistics react rust tauri typescript
Last synced: 18 Nov 2024
https://github.com/alicerunsonfedora/sniglet
Generate sniglets with machine learning!
abysima linguistics machine-learning word-generation
Last synced: 23 Oct 2024
https://github.com/ggteixeira/plural-generator
Linguistic algorithm whose main goal is to generate plurals for Brazilian Portuguese.
linguistics morphology nlp plural python
Last synced: 12 Nov 2024
https://github.com/tmalsburg/selfhost_ling_expts
A guide and templates for self-hosted experiments designed with jsPsych and served using Python
behavioral-research crowdsourcing jspsych linguistics psycholinguistics
Last synced: 28 Oct 2024
https://github.com/mounta11n/vowelreconstruct
An easy-to-use, easy-to-understand method for the average user to test various aspects of an LLM's intelligence in a single run.
guanaco linguistics llama llamacpp llm
Last synced: 07 Nov 2024
https://github.com/zamgi/lingvo--syntax-ru
Determining the syntactic roles of words in a sentence in Russian-language text
linguistics lingvo natural-language-processing nlp nlp-machine-learning pos-tagging syntax syntax-analysis
Last synced: 05 Nov 2024
https://github.com/zamgi/lingvo--postagger-ru
Part-of-speech tagging / text normalization: reducing all words to their dictionary form in Russian-language text
linguistics lingvo morphological-analysis morphologies morphology natural-language-processing nlp nlp-machine-learning part-of-speech-tagging pos-tagger pos-tagging
Last synced: 05 Nov 2024
https://github.com/zamgi/lingvo--languagedetector
Implementation of language detection for a few languages
language-detection linguistics lingvo natural-language-processing nlp nlp-machine-learning
Last synced: 05 Nov 2024
https://github.com/digitallinguistics/scription
A specification for formatting interlinear glossed texts in a way that is computationally parseable
digital-humanities digital-linguistics dlx documentary-linguistics glosses language language-documentation linguistics scription scription-files
Last synced: 30 Nov 2024
https://github.com/rshrc/words625
A personal app to teach oneself any language, Duolingo style.
flutter-apps kannada language learning linguistics
Last synced: 10 Nov 2024
https://github.com/davidfoerster/synesketch
Software library with synesthetic abilities, made for Processing digital artists. Its code serves as a medium between words, emotions, and images.
affective-computing java linguistics processing-library synesthesia
Last synced: 10 Nov 2024
https://github.com/boltomli/MyShinyApps
R apps that run on shinyapps.io or RStudio Connect
audio linguistics python r rstudio rstudio-connect shinyapps speech
Last synced: 22 Nov 2024
https://github.com/arjo129/langcluster
A visualization of cognates in various languages and how they spread
artificial-intelligence azure-functions clustering d3-visualization linguistic-analysis linguistics
Last synced: 10 Nov 2024
https://github.com/bretttolbert/verbecc-web
Web front-end for verbecc-svc
conjugation conjugator french french-language french-nlp french-verbs italian-language italian-verbs italiano linguistics machine-learning nlp portuguese-language portuguese-verbs romanian romanian-language spanish spanish-language verb-conjugation verb-conjugations
Last synced: 18 Oct 2024
https://github.com/bluebie/nzsl-training-data-generator
Tool for reading NZSL-Dictionary dataset, and using PoseNet ML model to extract information and images from video of NZSL sign performances, to generate datasets to train CNNs to recognise traits of visual signed languages
linguistics ml nzsl posenet sign-language
Last synced: 22 Oct 2024
https://github.com/jweinst1/corplet
A binary-corpus system for word tagging
corpus-linguistics database linguistics nlp nlp-library
Last synced: 08 Nov 2024
https://github.com/dcavar/geoling
GeoLing: GIS app for mailing list announcements via LINGUIST List
django gis linguist-list linguistics listserver python
Last synced: 07 Nov 2024
https://github.com/orgtre/google-books-words
Words in the Google Books Ngram Corpus (v3, all languages) with metadata and Python code
dictionary google language-learning linguistics wordlist words
Last synced: 29 Nov 2024
https://github.com/nanxstats/tea-sea-cha-land
Spatial-temporal dataset on how the word "tea" spread around the globe: tea if by sea, cha if by land.
dataset linguistics map-visualization
Last synced: 16 Nov 2024
https://github.com/astariul/sentencize.jl
Smallish library for sentence splitting in Julia
english hacktoberfest julia language linguistics nlp regex sentence sentence-splitter sentence-splitting sentences
Last synced: 28 Oct 2024
https://github.com/wmentor/lang
language detection Go library
armenian-language english-language go go-library golang golang-library language language-detection language-model languages linguistics nlp nlp-libraries nlp-library russian russian-language
Last synced: 14 Nov 2024
https://github.com/davidfoerster/kaleidok-examples
KaleidOk invites participants to use a new kind of interactive media tool and take part in an emerging experience which explores speech recognition, media retrieval and the generation of visuals in a collaborative context (between people, and between people and machines).
affective-computing art java linguistics processing-library speech-processing synesthesia
Last synced: 10 Nov 2024
https://github.com/sergeyt/scraper
Declarative web scraper in JavaScript primarily designed to extract linguistics data
Last synced: 02 Nov 2024
https://github.com/stdlib-js/nlp
Standard library natural language processing.
javascript language lib library linguistics modeling natural nlp node node-js nodejs standard stdlib
Last synced: 20 Nov 2024
https://github.com/zamgi/lingvo--postagger-ner-ru-dnn
Part-of-speech tagging and named-entity recognition for Russian using a deep neural network in C# for .NET
csharp deep-learning linguistics lingvo machine-learning morphology named-entity-recognition natural-language-processing ner net neural-network nlp nlp-machine-learning pos-tagger pos-tagging russian
Last synced: 05 Nov 2024
https://github.com/zamgi/lingvo--sentsplitter
Detection of sentence boundaries
detection-borders linguistics lingvo natural-language-processing sentences
Last synced: 05 Nov 2024