An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by LanguageMachines

A curated list of projects in awesome lists by LanguageMachines.

https://github.com/LanguageMachines/frog

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

computational-linguistics dependency-parser dutch folia lemmatiser morphological-analyser morphology named-entity-recognition natural-language-processing nlp pos-tagger syntax text-processing

Last synced: 27 Mar 2025

https://github.com/LanguageMachines/ucto

Unicode tokeniser. Ucto tokenises text files: it separates words from punctuation and splits sentences. It also offers other basic preprocessing steps, such as changing case, all of which you can use to prepare your text for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can easily be extended to other languages. It has been incorporated for tokenising Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto

computational-linguistics folia language natural-language-processing nlp punctuation tokeniser

Last synced: 27 Mar 2025

https://github.com/languagemachines/piccl

A set of workflows for corpus building through OCR, post-correction and normalisation

computational-linguistics corpus-linguistics corpus-tools folia nlp ocr workflow

Last synced: 12 Jun 2025

https://github.com/languagemachines/luiginlp

A workflow system for Natural Language Processing.

natural-language-processing nlp workflow-management-system

Last synced: 30 Jul 2025

https://github.com/languagemachines/ticcltools

Tools for TICCL

Last synced: 30 Jul 2025

https://github.com/languagemachines/clin28_st_spelling_correction

Scripts that were used for preparing and converting the Wikipedia documents that are part of the CLIN28 shared task on spelling correction

Last synced: 25 Jul 2025

https://github.com/languagemachines/lamaevents

Lama Events is a calendar application listing events in the near future. The events are detected and selected by a fully automatic procedure in the Dutch Twitter stream.

django event-detection html-css python-3 twitter

Last synced: 30 Jul 2025

https://github.com/languagemachines/mbt

MBT: Memory-based tagger generation and tagging. MBT is a memory-based tagger-generator and tagger in one.

c-plus-plus machine-learning natural-language-processing nlp tagger timbl

Last synced: 30 Jul 2025

https://github.com/languagemachines/uctodata

Datafiles for the tokenizer ucto.

Last synced: 25 Jun 2025

https://github.com/languagemachines/ticcutils

Ticcutils, a generic utility library shared by our software.

Last synced: 08 Oct 2025

https://github.com/languagemachines/wopr

Memory Based Word Predictor/Language Model http://ilk.uvt.nl/wopr/

language-modelling lm nlp

Last synced: 16 Jul 2025

https://github.com/languagemachines/foliautils

Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)

computational-linguistics folia nlp

Last synced: 30 Jul 2025

https://github.com/languagemachines/timblserver

TiMBL implements several memory-based learning algorithms. This is the server part.

Last synced: 11 Sep 2025

https://github.com/languagemachines/dimbl

Distributed Tilburg Memory Based Learner

cpp memory-based-learning multithreading

Last synced: 30 Jul 2025

https://github.com/languagemachines/icdar2017-postocr-ticcl

Wrapper scripts for processing ICDAR2017 PostOCR data given a TICCL ranked input list

Last synced: 27 Jul 2025

https://github.com/languagemachines/frogdata

Data for Frog, mandatory

Last synced: 06 Jan 2026

https://github.com/languagemachines/dialect2keywords

Web interface designed to convert words in Dutch dialects ("dialectopgaven") into standard Dutch keywords ("vernederlandste trefwoorden").

Last synced: 23 Jun 2025

https://github.com/languagemachines/toad

Toad: Trainer Of All Data, the Frog training collection

Last synced: 30 Jul 2025

https://github.com/languagemachines/bp-som

BP-SOM: A hybrid of back-propagation learning in multi-layered perceptrons and self-organizing maps

Last synced: 17 Sep 2025

https://github.com/languagemachines/paramsearch

Automated parameter optimisation for Timbl

Last synced: 26 Mar 2025

https://github.com/languagemachines/tadpole

The good old predecessor of Frog

Last synced: 28 Jan 2026

https://github.com/languagemachines/fambl

Family Memory Based Learning (original in ILK SVN)

Last synced: 26 Jul 2025

https://github.com/languagemachines/timbltests

Unit tests for Timbl

Last synced: 21 Sep 2025

https://github.com/languagemachines/clariah-plus-tasks

An overview of CLARIAH-PLUS tasks at CLST, Radboud University, Nijmegen

Last synced: 03 Feb 2026

https://github.com/languagemachines/actiontests

Small program to test Travis issues, such as OSX and Clang OpenMP support.

Last synced: 05 Jan 2026

https://github.com/languagemachines/svn-timblmanual

Copy from the old ILK SVN

Last synced: 05 Jan 2026

https://github.com/languagemachines/jasmin-bliss-negation

Documentation of a corpus sample of Dutch human-computer dialogues annotated with negation cues.

Last synced: 04 Feb 2026

https://github.com/languagemachines/mbttests

Unit tests for Mbt

Last synced: 06 Jan 2026

https://github.com/languagemachines/knngraph

KNN graph software originally in TiCC SVN

Last synced: 26 Mar 2025

https://github.com/languagemachines/svn-ticclopstools

Old ticclopstools from the TiCC SVN

Last synced: 26 Mar 2025

https://github.com/languagemachines/svn-sonar

Old Sonar stuff from the TiCC svn

Last synced: 26 Mar 2025

https://github.com/languagemachines/bioport

Scrape pages about persons ('biographies') from Wikipedia.

Last synced: 26 Mar 2025

https://github.com/languagemachines/frogtests

Unit tests for Frog

Last synced: 13 Oct 2025

https://github.com/languagemachines/foliatest

Test suite for libfolia

cpp folia linguistic-analysis

Last synced: 14 Oct 2025

https://github.com/languagemachines/clst-webservices-meta

CLST webservices software metadata, only for those webservices/webapplications that are not included in LaMachine

Last synced: 17 Oct 2025

https://github.com/languagemachines/ticcactions

Collection of GitHub Actions for use in TiCC software

Last synced: 26 Oct 2025