Projects in Awesome Lists by LanguageMachines
A curated list of projects in awesome lists by LanguageMachines.
https://github.com/LanguageMachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing
Last synced: 27 Mar 2025
https://github.com/languagemachines/frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing
Last synced: 09 Apr 2025
https://github.com/LanguageMachines/ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
computational-linguistics folia language natural-language-processing nlp punctuation tokeniser
Last synced: 27 Mar 2025
https://github.com/languagemachines/ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
computational-linguistics folia language natural-language-processing nlp punctuation tokeniser
Last synced: 06 Apr 2025
https://languagemachines.github.io/timbl/
TiMBL implements several memory-based learning algorithms.
c-plus-plus classification decision-tree ib1 ib1-ig igtree k-nearest-neighbours knn learning-algorithm learning-algorithms machine-learning nearest-neighbours timbl
Last synced: 08 May 2025
https://github.com/languagemachines/timbl
TiMBL implements several memory-based learning algorithms.
c-plus-plus classification decision-tree ib1 ib1-ig igtree k-nearest-neighbours knn learning-algorithm learning-algorithms machine-learning nearest-neighbours timbl
Last synced: 07 Apr 2025
https://github.com/languagemachines/piccl
A set of workflows for corpus building through OCR, post-correction and normalisation
computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow
Last synced: 12 Jun 2025
https://github.com/LanguageMachines/PICCL
A set of workflows for corpus building through OCR, post-correction and normalisation
computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow
Last synced: 02 Apr 2025
https://github.com/languagemachines/luiginlp
A workflow system for Natural Language Processing.
natural-language-processing nlp workflow-management-system
Last synced: 30 Jul 2025
https://github.com/LanguageMachines/libfolia
FoLiA library for C++
folia library natural-language-processing nlp
Last synced: 27 Mar 2025
https://github.com/languagemachines/libfolia
FoLiA library for C++
folia library natural-language-processing nlp
Last synced: 30 Jul 2025
https://github.com/languagemachines/clin28_st_spelling_correction
Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction
Last synced: 25 Jul 2025
https://github.com/languagemachines/lamaevents
Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.
django event-detection html-css python-3 twitter
Last synced: 30 Jul 2025
https://github.com/languagemachines/mbt
MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.
c-plus-plus machine-learning natural-language-processing nlp tagger timbl
Last synced: 30 Jul 2025
https://github.com/languagemachines/uctodata
Data files for the tokenizer ucto.
Last synced: 25 Jun 2025
https://github.com/languagemachines/ticcutils
Ticcutils, a generic utility library shared by our software.
Last synced: 08 Oct 2025
https://github.com/languagemachines/wopr
Memory Based Word Predictor/Language Model http://ilk.uvt.nl/wopr/
Last synced: 16 Jul 2025
https://github.com/languagemachines/foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
computational-linguistics folia nlp
Last synced: 30 Jul 2025
https://github.com/languagemachines/timblserver
TiMBL implements several memory-based learning algorithms. This is the server part.
Last synced: 11 Sep 2025
https://github.com/languagemachines/dimbl
Distributed Tilburg Memory Based Learner
cpp memory-based-learning multithreading
Last synced: 30 Jul 2025
https://github.com/languagemachines/icdar2017-postocr-ticcl
Wrapper scripts for processing ICDAR2017 PostOCR data given a TICCL ranked input list
Last synced: 27 Jul 2025
https://github.com/languagemachines/dialect2keywords
Web interface designed to convert words in Dutch dialects ("dialectopgaven") into standard Dutch keywords ("vernederlandste trefwoorden").
Last synced: 23 Jun 2025
https://github.com/languagemachines/toad
Toad: Trainer Of All Data, the Frog training collection
Last synced: 30 Jul 2025
https://github.com/languagemachines/bp-som
BP-SOM: A hybrid of back-propagation learning in multi-layered perceptrons and self-organizing maps
Last synced: 17 Sep 2025
https://github.com/languagemachines/paramsearch
Automated parameter optimisation for Timbl
Last synced: 26 Mar 2025
https://github.com/languagemachines/tadpole
The good old predecessor of Frog
Last synced: 28 Jan 2026
https://github.com/languagemachines/fambl
Family Memory Based Learning (original in ILK SVN)
Last synced: 26 Jul 2025
https://github.com/languagemachines/clariah-plus-tasks
An overview of CLARIAH-PLUS tasks at CLST, Radboud University, Nijmegen
Last synced: 03 Feb 2026
https://github.com/languagemachines/actiontests
Small program to test Travis issues, such as OSX and Clang OpenMP support.
Last synced: 05 Jan 2026
https://github.com/languagemachines/svn-timblmanual
Copy from the old ILK SVN
Last synced: 05 Jan 2026
https://github.com/languagemachines/jasmin-bliss-negation
Documentation of a corpus sample of Dutch human-computer dialogues annotated with negation cues.
Last synced: 04 Feb 2026
https://github.com/languagemachines/knngraph
KNN graph software originally in TiCC SVN
Last synced: 26 Mar 2025
https://github.com/languagemachines/svn-ticclopstools
Old ticclopstools from the TiCC SVN
Last synced: 26 Mar 2025
https://github.com/languagemachines/svn-sonar
Old Sonar stuff from the TiCC svn
Last synced: 26 Mar 2025
https://github.com/languagemachines/bioport
Scrape pages about persons ('biographies') from Wikipedia.
Last synced: 26 Mar 2025
https://github.com/languagemachines/clst-webservices-meta
CLST webservices software metadata, only for those webservices/webapplications that are not included in LaMachine
Last synced: 17 Oct 2025
https://github.com/languagemachines/ticcactions
Collection of GitHub actions for use in TiCC software
Last synced: 26 Oct 2025