Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/LanguageMachines/PICCL

A set of workflows for corpus building through OCR, post-correction and normalisation.

computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow

Last synced: 30 Jun 2024

https://github.com/M4t1ss/parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

cleaning corpora corpus-tools data-processing data-science filtering language language-processing machine machine-translation natural-language natural-language-processing neural neural-machine-translation nlp nmt translation

Last synced: 20 Jun 2024

https://github.com/lennes/spect

SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/

analysis annotation conversational-speech corpus-linguistics corpus-tools praat spect speech speech-analysis speech-corpus spoken-language transcript transcription

Last synced: 07 Jun 2024

https://github.com/jaytimm/corpuslingr

A library of functions enabling complex corpus search in context (KWIC), search aggregation, bag-of-words building & keyphrase extraction.

corpus-processing corpus-search corpus-tools

Last synced: 20 May 2024

https://github.com/koskenni/beta

An open-source reimplementation of Benny Brodda's BETA in Python.

benny-brodda beta corpus-tools hyphenation linguistics open-source string-manipulation string-rewriting

Last synced: 01 Apr 2024