Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by proycon
A curated list of projects in awesome lists by proycon.
https://github.com/proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing
Last synced: 19 Oct 2024
https://github.com/proycon/vocage
A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal
anki command-line flashcards language-learning leitner terminal-based tsv vocabulary
Last synced: 14 Oct 2024
https://github.com/proycon/clam
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command-line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
nlp python rest webservice wrapper
Last synced: 30 Oct 2024
https://github.com/proycon/colibri-core
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` which allows you to build, view, manipulate and query pattern models.
c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing
Last synced: 12 Oct 2024
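To make "n-grams and skipgrams" concrete, here is a minimal pure-Python sketch. It assumes a skipgram is an n-gram in which one or more inner tokens are replaced by a gap marker, with each skipped token getting its own marker (fixed-size gaps only). The names `ngrams`, `skipgrams` and `GAP` are hypothetical and are not colibri-core's API; colibri-core itself works on compactly encoded pattern models in C++.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker, chosen for this sketch only


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n):
    """Return the n-grams of tokens with one or more inner tokens replaced
    by GAP. The first and last token are always kept, so every skipgram is
    anchored on both sides and only has fixed-size gaps."""
    result = []
    for gram in ngrams(tokens, n):
        inner = range(1, n - 1)  # positions that may become gaps
        for k in range(1, n - 1):  # number of gapped positions
            for positions in combinations(inner, k):
                result.append(tuple(GAP if i in positions else tok
                                    for i, tok in enumerate(gram)))
    return result
```

For example, `skipgrams("to be or not to be".split(), 3)` starts with `("to", "{*}", "or")`. Bigrams yield no skipgrams, because they have no inner positions to skip.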
https://github.com/proycon/flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
annotation-tool clariah clarin computational-linguistics folia javascript linguistic-annotation-framework linguistics nlp python web-application
Last synced: 31 Oct 2024
https://github.com/proycon/lamachine
LaMachine - A software distribution of our in-house NLP software as well as some third-party NLP software, available as a Virtual Machine, a Docker image, or a local compilation/installation script
clam computational-linguistics docker-image flat folia frog installer linux linux-distribution natural-language-processing nlp python software-distribution vagrant virtual-machine webservices
Last synced: 12 Oct 2024
https://github.com/proycon/folia
FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions.
computational-linguistics corpus file-format folia language library linguistic-annotation-framework linguistics nlp python xml
Last synced: 14 Oct 2024
https://github.com/proycon/python-frog
Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
Last synced: 19 Oct 2024
https://github.com/proycon/analiticcl
An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
approximate-string-matching fuzzy-matching nlp normalization spelling-correction
Last synced: 30 Oct 2024
https://github.com/proycon/python-ucto
This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
computational-linguistics folia nlp nlp-library python text-processing tokenizer
Last synced: 31 Oct 2024
https://github.com/proycon/codemetapy
A Python package for generating and working with codemeta
codemeta linked-data metadata metadata-extractor schema-org scientific
Last synced: 31 Oct 2024
https://github.com/proycon/gecco
Generic Environment for Context-Aware Correction of Orthography
nlp python spelling-correction
Last synced: 08 Nov 2024
https://github.com/proycon/homeassistant-config
My elaborate home automation configuration + scripts
domotica home-assistant home-assistant-config home-automation
Last synced: 08 Nov 2024
https://github.com/proycon/hanzigrid
Hanzi grids for studying Mandarin Chinese (tool & output data)
chinese chinese-characters chinese-language hanzi hsk hsk-vocabulary learning-chinese mandarin
Last synced: 19 Oct 2024
https://github.com/proycon/deepfrog
An NLP suite powered by deep learning
deep-learning deep-neural-networks dutch folia frog nlp transformers
Last synced: 08 Nov 2024
https://github.com/proycon/foliapy
An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
clariah clarin computational-linguistics folia nlp pynlpl xml
Last synced: 31 Oct 2024
https://github.com/proycon/python-timbl
python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
k-nearest-neighbours knn machine-learning python timbl
Last synced: 08 Nov 2024
https://github.com/proycon/procmapgen
A small toy project written in Rust: procedural generation of various kinds of grid-based maps.
gamedev maps pipes procedural-generation rust
Last synced: 08 Nov 2024
https://github.com/proycon/spacy2folia
Use spaCy for NLP and output to the FoLiA XML format.
Last synced: 08 Nov 2024
https://github.com/proycon/pbmbmt
Phrase-based Memory-based Machine Translation
Last synced: 19 Oct 2024
https://github.com/proycon/foliatools
A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
clariah clarin computational-linguistics conllu converters folia nlp
Last synced: 01 Nov 2024
https://github.com/proycon/codemeta-harvester
Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
Last synced: 01 Nov 2024
https://github.com/proycon/valkuil-gecco
Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco
Last synced: 19 Oct 2024
https://github.com/proycon/nederlab-pipeline
Linguistic enrichment pipeline for historical Dutch, as used in the Nederlab project
dutch historical-dutch historical-linguistics natural-language-processing nederlab nextflow nlp workflow
Last synced: 19 Oct 2024
https://github.com/proycon/colibri
This project is being rendered obsolete by its newer successors, colibri-core and colibri-mt!
Last synced: 19 Oct 2024
https://github.com/proycon/lingua-cli
A very small, simple command-line interface for language detection using lingua-rs
Last synced: 08 Nov 2024
https://github.com/proycon/anavec
Proof-of-concept spelling correction/normalisation system based on anagram vectors
Last synced: 19 Oct 2024
https://github.com/proycon/foliadocserve
FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
document-server folia nlp python
Last synced: 08 Nov 2024
https://github.com/proycon/colibri-mt
A Machine Translation framework that wraps around the Moses Decoder and enables k-NN classifier techniques to be used for modelling source-side context
Last synced: 19 Oct 2024
https://github.com/proycon/piereling
Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.
Last synced: 19 Oct 2024
https://github.com/proycon/semeval2014task5
This is the official repository for SemEval 2014 Task 5: L2 Translation Assistant. It contains the gold standard learner corpus, evaluation results and the Python program library needed for the task. It does not contain a full translation assistance system.
Last synced: 30 Oct 2024
https://github.com/proycon/labirinto
A web front-end portal for a virtual laboratory of NLP tools
codemeta lamachine portal scientific-software
Last synced: 19 Oct 2024
https://github.com/proycon/babelente
BabelEnte: Entity Extractor and Translator using BabelFy and Babelnet.org
babelfy babelnet computational-linguistics nlp
Last synced: 19 Oct 2024
https://github.com/proycon/sesdiff
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
diff levenshtein nlp shortest-edit-script
Last synced: 08 Nov 2024
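The Levenshtein edit distance mentioned above can be illustrated with the classic Wagner-Fischer dynamic-programming algorithm. This is a minimal sketch, not sesdiff's code. Myers' algorithm, which sesdiff uses for the edit script, is a different technique that finds a shortest script of insertions and deletions; the function name `levenshtein` here is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For instance, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion. Keeping only two rows brings memory down to O(min(len(a), len(b))), at the cost of not being able to reconstruct the edit script itself.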
https://github.com/proycon/clamservices
A collection of CLAM webservices for several of our Natural Language Processing tools
Last synced: 30 Oct 2024
https://github.com/proycon/codemeta-server
Server for codemeta: an in-memory triple store, a SPARQL endpoint, and a simple web-based visualisation for end users
codemeta json-ld schema-org software-metadata
Last synced: 19 Oct 2024
https://github.com/proycon/sxmo-docs
My fork of https://git.sr.ht/~mil/sxmo-docs
Last synced: 30 Oct 2024
https://github.com/proycon/spreek2schrijf
Scripts for Spreek2Schrijf, a project with the Tweede Kamer (the Dutch House of Representatives)
Last synced: 19 Oct 2024
https://github.com/proycon/parseme-support
FoLiA & FLAT support for PARSEME
Last synced: 30 Oct 2024
https://github.com/proycon/alpino_clam_webservice
A CLAM-powered webservice for Alpino, a dependency parser for Dutch
Last synced: 19 Oct 2024
https://github.com/proycon/svkbd
My fork of suckless' simple virtual keyboard: https://tools.suckless.org/x/svkbd/
Last synced: 19 Oct 2024
https://github.com/proycon/wikiente
A named entity recogniser and linker based on DBpedia Spotlight, with support for the FoLiA format
Last synced: 30 Oct 2024
https://github.com/proycon/antilope
A collection of NLP pipelines powered by Nextflow
Last synced: 30 Oct 2024
https://github.com/proycon/colibri-apps
Contains NLP applications using Colibri Core, suited for end-users. The applications are generally web-based.
Last synced: 30 Oct 2024
https://github.com/proycon/nlpsandbox
Natural Language Processing Sandbox - An experimental playground for all kinds of NLP tasks
Last synced: 30 Oct 2024
https://github.com/proycon/colibri-utils
NLP utilities that rely on Colibri Core: currently only language identification
computational-linguistics historical-linguistics language-detection language-identification nlp
Last synced: 30 Oct 2024
https://github.com/proycon/wrexp
Experiment Wrapper - A framework for launching and keeping track of experiments. Wrexp takes care of storing all stdout/stderr logs and mails you when experiments are completed.
Last synced: 30 Oct 2024
https://github.com/proycon/sxmo-utils
My fork of https://git.sr.ht/~mil/sxmo-utils/
Last synced: 30 Oct 2024
https://github.com/proycon/ssam
split sampler: split your data into multiple sets (e.g. train/test/development)
Last synced: 19 Oct 2024
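Splitting data into train/test/development sets, as ssam does, comes down to a seeded shuffle followed by slicing according to the requested proportions. This is a minimal sketch of that idea under stated assumptions. The function `split_sample` and its fraction dictionary are hypothetical and do not reflect ssam's actual command-line options.

```python
import random


def split_sample(items, fractions, seed=0):
    """Randomly partition items into consecutive sets whose sizes follow
    fractions, e.g. {"train": 0.8, "test": 0.1, "dev": 0.1}.

    Sizes are truncated to whole items; any rounding leftovers go to the
    last set so that no item is lost or duplicated.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so splits are reproducible
    names = list(fractions)
    result = {}
    start = 0
    for index, name in enumerate(names):
        if index == len(names) - 1:
            end = len(shuffled)  # last set takes whatever remains
        else:
            end = start + int(len(shuffled) * fractions[name])
        result[name] = shuffled[start:end]
        start = end
    return result
```

Using a dedicated `random.Random(seed)` instance instead of the global generator keeps the split reproducible without affecting other code that relies on `random`.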
https://github.com/proycon/colloquery
Web application for searching for phrases/collocations/synonyms in phrase translation tables
computational-linguistics machine-translation mt natural-language-processing nlp
Last synced: 30 Oct 2024
https://github.com/proycon/oersetter-models
Models for Oersetter, a Frisian<->Dutch Machine Translation system
fries frisian frysk machine-translation moses
Last synced: 30 Oct 2024
https://github.com/proycon/hyphertool
Command-line tool for syllabification and hyphenation for multiple languages
hyphenation nlp syllabification
Last synced: 30 Oct 2024
https://github.com/proycon/colibrita
Colibrita is a proof-of-concept translation assistance system, translating L1 fragments in an L2 context, using machine learning and statistical machine translation techniques
Last synced: 30 Oct 2024
https://github.com/proycon/campyon
Campyon is both a command-line tool and a Python library for viewing and manipulating columned data files. It supports various filters, statistics, visualisations, and plotting.
Last synced: 30 Oct 2024
https://github.com/proycon/valkuil
Valkuil.net is an automatic spelling corrector for Dutch that detects not only ordinary typos but also grammatical errors and confusions between existing words.
Last synced: 30 Oct 2024
https://github.com/proycon/charfreq
A very simple command-line tool that counts (Unicode) character frequencies from standard input
Last synced: 30 Oct 2024
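Counting character frequencies, the job charfreq performs, is a short exercise with Python's `collections.Counter`. This is a minimal sketch, not charfreq's implementation. The function name `frequency_table` is hypothetical. The sketch counts Unicode code points, so a base letter followed by a combining accent counts as two characters.

```python
from collections import Counter


def frequency_table(text):
    """Return (character, count, relative frequency) tuples for every code
    point in text, most frequent first (ties keep first-seen order)."""
    counts = Counter(text)
    total = sum(counts.values())
    return [(char, n, n / total) for char, n in counts.most_common()]
```

To read from standard input as charfreq does, call `frequency_table(sys.stdin.read())` after importing `sys`. An empty input simply yields an empty table.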
https://github.com/proycon/lamachine-docker-test
Meta repository for docker testing of LaMachine on Travis-CI
Last synced: 30 Oct 2024
https://github.com/proycon/cwrap
Small C wrapper to turn a C function into a very simple webservice
Last synced: 30 Oct 2024
https://github.com/proycon/chira
Chinese Reading Assistant, pop-up translations for Linux
Last synced: 30 Oct 2024
https://github.com/proycon/aur-packages
Arch User Repository packages I maintain
Last synced: 30 Oct 2024
https://github.com/proycon/sxmo-svkbd
My fork of https://git.sr.ht/~mil/sxmo-svkbd
Last synced: 30 Oct 2024
https://github.com/proycon/unilang_ulr
Collection of open language resources from UniLang; containing mostly phrasebooks and stories
Last synced: 30 Oct 2024
https://github.com/proycon/lamastats
Generates statistical reports on the usage of our software and webservices
Last synced: 30 Oct 2024
https://github.com/proycon/kaldi-installer
Script to install Kaldi (and optionally kaldi_nl), geared towards the ponyland servers at Radboud University Nijmegen
Last synced: 30 Oct 2024
https://github.com/proycon/counttriples
Small HackerRank toy project in Rust
Last synced: 30 Oct 2024
https://github.com/proycon/lisgd
Libinput synthetic gesture daemon: my fork of https://git.sr.ht/~mil/lisgd
Last synced: 30 Oct 2024
https://github.com/proycon/lrswitchboard
Code Repository for the Language Resources Switchboard of CLARIN
Last synced: 30 Oct 2024
https://github.com/proycon/cixue
词学 - Chinese Word Trainer in the terminal, using a spaced-repetition system
Last synced: 30 Oct 2024
https://github.com/proycon/textshift
a terminal gadget to let text emerge from noise
Last synced: 30 Oct 2024
https://github.com/proycon/ner112
Scripts and evaluation for Named Entity Recognition for Dutch emergency calls
Last synced: 30 Oct 2024
https://github.com/proycon/codemeta2html
Convert software metadata descriptions in codemeta to html
Last synced: 19 Oct 2024
https://github.com/proycon/hascl
Computational Linguistics/Natural Language Processing library for Haskell (just a small toy learning project until further notice!)
Last synced: 30 Oct 2024
https://github.com/proycon/fowlt-gecco
English Spelling Correction system, powered by Gecco
Last synced: 30 Oct 2024
https://github.com/proycon/foliaindexer
Creates an index over one or more FoLiA XML documents; can produce SQL output for use in relational databases
Last synced: 30 Oct 2024
https://github.com/proycon/flat_configuration_radboud
Configuration for the FLAT instance at CLST, Radboud University, Nijmegen
Last synced: 30 Oct 2024
https://github.com/proycon/phd-thesis
PhD dissertation: Context as Linguistic Bridges
Last synced: 30 Oct 2024
https://github.com/proycon/oersetter-webservice
CLAM Webservice for Oersetter (Middleware of a Frisian-Dutch Translation System)
clam fries frisian frysk machine-translation webservice
Last synced: 30 Oct 2024
https://github.com/proycon/ucto_webservice
Webservice for ucto, a rule-based tokeniser for multiple languages
Last synced: 30 Oct 2024
https://github.com/proycon/frog_webservice
Webservice and web interface for Frog, a Dutch NLP suite
dutch folia frog named-entity-recognition nlp part-of-speech-tagger webservice
Last synced: 30 Oct 2024