Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists by proycon

A curated list of projects in awesome lists by proycon.

https://github.com/proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

computational-linguistics evaluation-metrics folia language-modelling library linguistics machine-learning natural-language-processing nlp nlp-library python search-algorithms text-processing

Last synced: 19 Oct 2024
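The n-gram and frequency-list extraction mentioned for PyNLPl can be illustrated with a minimal sketch in plain Python. This is not PyNLPl's API: the function names `ngrams` and `frequency_list` are hypothetical, and the input is assumed to be already tokenised.

```python
from collections import Counter
from typing import Iterable, Iterator, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Iterator[tuple]:
    """Yield every contiguous n-gram (as a tuple) from a token sequence."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(sentences: Iterable[Sequence[str]], n: int) -> Counter:
    """Count n-gram occurrences over a collection of tokenised sentences."""
    freq: Counter = Counter()
    for tokens in sentences:
        freq.update(ngrams(tokens, n))
    return freq


if __name__ == "__main__":
    corpus = [["to", "be", "or", "not", "to", "be"]]
    print(frequency_list(corpus, 2).most_common(1))  # [(('to', 'be'), 2)]
```

A `Counter` keeps the sketch short; a real library would typically add tokenisation, sentence boundary markers, and memory-efficient storage on top of this.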

https://github.com/proycon/vocage

A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal

anki command-line flashcards language-learning leitner terminal-based tsv vocabulary

Last synced: 14 Oct 2024
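The tags for vocage mention the Leitner system, a classic spaced-repetition scheme. A minimal sketch of the generic Leitner rule follows; it is not vocage's implementation, and the number of boxes (`NUM_BOXES`) and the review schedule in `due` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

NUM_BOXES = 5  # assumed number of boxes; real trainers choose their own


@dataclass
class Card:
    front: str
    back: str
    box: int = 1  # every new card starts in box 1


def review(card: Card, correct: bool) -> None:
    """Apply the Leitner rule to one card after it has been reviewed."""
    if correct:
        card.box = min(card.box + 1, NUM_BOXES)  # promote, capped at the last box
    else:
        card.box = 1  # a forgotten card goes back to the first box


def due(cards: List[Card], session: int) -> List[Card]:
    """Return cards to review: box b is shown every 2**(b-1) sessions (assumed)."""
    return [c for c in cards if session % (2 ** (c.box - 1)) == 0]
```

Cards that are answered correctly drift into higher boxes and are therefore shown less often, which is the core idea of spaced repetition.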

https://github.com/proycon/clam

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

nlp python rest webservice wrapper

Last synced: 30 Oct 2024

https://github.com/proycon/colibri-core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

c-plus-plus computational-linguistics corpus library linguistics ngram ngrams nlp pattern-recognition python skipgram text-processing

Last synced: 12 Oct 2024
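To make the notion of a fixed-size-gap skipgram concrete, here is a small sketch that derives all such skipgrams from one n-gram. It does not use Colibri Core's API or its gap notation: the `GAP` placeholder and the `skipgrams` function are hypothetical, and keeping the first and last tokens fixed is an assumption of this sketch.

```python
from itertools import combinations
from typing import Iterator, Sequence

GAP = "_"  # hypothetical placeholder for exactly one skipped token


def skipgrams(ngram: Sequence[str]) -> Iterator[tuple]:
    """Yield fixed-size-gap skipgrams derived from a single n-gram.

    Only interior positions (never the first or last token) are replaced
    by GAP, and each replaced position stands for one skipped token.
    """
    interior = range(1, len(ngram) - 1)
    for k in range(1, len(ngram) - 1):  # number of gapped positions
        for gap_positions in combinations(interior, k):
            yield tuple(GAP if i in gap_positions else tok
                        for i, tok in enumerate(ngram))
```

For example, `list(skipgrams(["a", "b", "c", "d"]))` yields `("a", "_", "c", "d")`, `("a", "b", "_", "d")` and `("a", "_", "_", "d")`. Counting such patterns naively over a large corpus explodes combinatorially, which is why a memory-efficient pattern model matters.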

https://github.com/proycon/flat

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.

annotation-tool clariah clarin computational-linguistics folia javascript linguistic-annotation-framework linguistics nlp python web-application

Last synced: 31 Oct 2024

https://github.com/proycon/lamachine

LaMachine - A software distribution of our in-house NLP software as well as some third-party NLP software, available as a virtual machine, a Docker image, or a local compilation/installation script

clam computational-linguistics docker-image flat folia frog installer linux linux-distribution natural-language-processing nlp python software-distribution vagrant virtual-machine webservices

Last synced: 12 Oct 2024

https://github.com/proycon/folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.

computational-linguistics corpus file-format folia language library linguistic-annotation-framework linguistics nlp python xml

Last synced: 14 Oct 2024

https://github.com/proycon/python-frog

Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Last synced: 19 Oct 2024

https://github.com/proycon/analiticcl

An approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction

approximate-string-matching fuzzy-matching nlp normalization spelling-correction

Last synced: 30 Oct 2024

https://github.com/proycon/python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

computational-linguistics folia nlp nlp-library python text-processing tokenizer

Last synced: 31 Oct 2024

https://github.com/proycon/codemetapy

A Python package for generating and working with codemeta

codemeta linked-data metadata metadata-extractor schema-org scientific

Last synced: 31 Oct 2024

https://github.com/proycon/gecco

Generic Environment for Context-Aware Correction of Orthography

nlp python spelling-correction

Last synced: 08 Nov 2024

https://github.com/proycon/homeassistant-config

My elaborate home automation configuration + scripts

domotica home-assistant home-assistant-config home-automation

Last synced: 08 Nov 2024

https://github.com/proycon/hanzigrid

Hanzi grids for studying Mandarin Chinese (tool & output data)

chinese chinese-characters chinese-language hanzi hsk hsk-vocabulary learning-chinese mandarin

Last synced: 19 Oct 2024

https://github.com/proycon/deepfrog

An NLP-suite powered by deep learning

deep-learning deep-neural-networks dutch folia frog nlp transformers

Last synced: 08 Nov 2024

https://github.com/proycon/foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.

clariah clarin computational-linguistics folia nlp pynlpl xml

Last synced: 31 Oct 2024

https://github.com/proycon/python-timbl

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.

k-nearest-neighbours knn machine-learning python timbl

Last synced: 08 Nov 2024

https://github.com/proycon/tuir

Browse Reddit from your terminal

Last synced: 22 Aug 2024

https://github.com/proycon/procmapgen

A small toy project written in Rust: procedural generation of various kinds of grid-based maps.

gamedev maps pipes procedural-generation rust

Last synced: 08 Nov 2024

https://github.com/proycon/spacy2folia

Use spaCy for NLP and output to the FoLiA XML format.

Last synced: 08 Nov 2024

https://github.com/proycon/pbmbmt

Phrase-based Memory-based Machine Translation

Last synced: 19 Oct 2024

https://github.com/proycon/foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.

clariah clarin computational-linguistics conllu converters folia nlp

Last synced: 01 Nov 2024

https://github.com/proycon/codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process

Last synced: 01 Nov 2024

https://github.com/proycon/unilangforum

UniLang Language Community - Forum

Last synced: 08 Nov 2024

https://github.com/proycon/valkuil-gecco

Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco

Last synced: 19 Oct 2024

https://github.com/proycon/nederlab-pipeline

Linguistic enrichment pipeline for historical Dutch, as used in the Nederlab project

dutch historical-dutch historical-linguistics natural-language-processing nederlab nextflow nlp workflow

Last synced: 19 Oct 2024

https://github.com/proycon/colibri

This project is being rendered obsolete by its newer successors, colibri-core and colibri-mt.

Last synced: 19 Oct 2024

https://github.com/proycon/lingua-cli

Very small, simple command-line interface for language detection using lingua-rs

Last synced: 08 Nov 2024

https://github.com/proycon/anavec

Proof-of-concept spelling correction/normalisation system based on anagram vectors

Last synced: 19 Oct 2024

https://github.com/proycon/foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.

document-server folia nlp python

Last synced: 08 Nov 2024

https://github.com/proycon/colibri-mt

A Machine Translation framework that wraps around the Moses Decoder and enables k-NN classifier techniques to be used for modelling source-side context

Last synced: 19 Oct 2024

https://github.com/proycon/piereling

Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.

Last synced: 19 Oct 2024

https://github.com/proycon/semeval2014task5

This is the official repository for SemEval 2014 Task 5: L2 Translation Assistant. It contains the gold standard learner corpus, evaluation results and the Python program library needed for the task. It does not contain a full translation assistance system.

Last synced: 30 Oct 2024

https://github.com/proycon/labirinto

A web front-end portal for a virtual laboratory of NLP tools

codemeta lamachine portal scientific-software

Last synced: 19 Oct 2024

https://github.com/proycon/babelente

BabelEnte: Entity Extractor and Translator using Babelfy and BabelNet

babelfy babelnet computational-linguistics nlp

Last synced: 19 Oct 2024

https://github.com/proycon/sesdiff

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).

diff levenshtein nlp shortest-edit-script

Last synced: 08 Nov 2024
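As an illustration of the Levenshtein edit distance that sesdiff reports, here is the textbook dynamic-programming (Wagner-Fischer) computation. This is a different technique from the Myers' diff algorithm that sesdiff uses for its edit scripts, and it is not sesdiff's code; it only computes the distance, not the script.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # At the start of iteration i, prev[j] is the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]
```

For instance, `levenshtein("kitten", "sitting")` is 3. Keeping only two rows makes memory linear in the length of `b`; recovering an actual edit script would require keeping the full table or using a divide-and-conquer method such as Myers'.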

https://github.com/proycon/folia-rust

FoLiA library for rust (alpha)

folia nlp rust

Last synced: 08 Nov 2024

https://github.com/proycon/clamservices

A collection of CLAM webservices for several of our Natural Language Processing tools

Last synced: 30 Oct 2024

https://github.com/proycon/codemeta-server

Server for codemeta, with an in-memory triple store, a SPARQL endpoint, and a simple web-based visualisation for end users

codemeta json-ld schema-org software-metadata

Last synced: 19 Oct 2024

https://github.com/proycon/sxmo-docs

my fork of https://git.sr.ht/~mil/sxmo-docs

Last synced: 30 Oct 2024

https://github.com/proycon/spreek2schrijf

Scripts for Spreek2Schrijf, a project with the Tweede Kamer (the Dutch House of Representatives)

Last synced: 19 Oct 2024

https://github.com/proycon/parseme-support

FoLiA & FLAT support for PARSEME

Last synced: 30 Oct 2024

https://github.com/proycon/vocadata

Data for vocabulary learning

Last synced: 19 Oct 2024

https://github.com/proycon/alpino_clam_webservice

A CLAM-powered webservice for Alpino, a dependency parser for Dutch

Last synced: 19 Oct 2024

https://github.com/proycon/svkbd

my fork of suckless' simple virtual keyboard: https://tools.suckless.org/x/svkbd/

Last synced: 19 Oct 2024

https://github.com/proycon/wikiente

A named entity recogniser and linker based on DBpedia Spotlight, with support for the FoLiA format

Last synced: 30 Oct 2024

https://github.com/proycon/wsd2

Last synced: 30 Oct 2024

https://github.com/proycon/antilope

A collection of NLP pipelines powered by Nextflow

Last synced: 30 Oct 2024

https://github.com/proycon/colibri-apps

Contains NLP applications using Colibri Core, suited for end-users. The applications are generally web-based.

Last synced: 30 Oct 2024

https://github.com/proycon/nlpsandbox

Natural Language Processing Sandbox - An experimental playground for all kinds of NLP tasks

Last synced: 30 Oct 2024

https://github.com/proycon/colibri-utils

NLP utilities that rely on Colibri Core: currently only language identification

computational-linguistics historical-linguistics language-detection language-identification nlp

Last synced: 30 Oct 2024

https://github.com/proycon/wrexp

Experiment Wrapper - A framework for launching and keeping track of experiments. Wrexp takes care of storing all stdout/stderr logs and mails you when experiments are completed.

Last synced: 30 Oct 2024

https://github.com/proycon/sxmo-utils

my fork of https://git.sr.ht/~mil/sxmo-utils/

Last synced: 30 Oct 2024

https://github.com/proycon/lexmatch

Simple lexicon matcher against a text

lexical-search nlp

Last synced: 08 Nov 2024

https://github.com/proycon/ssam

Split sampler: split your data into multiple sets (e.g. train/test/development)

Last synced: 19 Oct 2024
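The idea behind a split sampler can be sketched as a seeded shuffle followed by cutting the data at cumulative ratio boundaries. This is not ssam's implementation or interface: the function `split_sample`, its `ratios` and `seed` parameters, and the rule that rounding is done on cumulative boundaries are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_sample(items: Sequence[T], ratios: Sequence[float],
                 seed: Optional[int] = None) -> List[List[T]]:
    """Shuffle items and cut them into consecutive, non-overlapping sets.

    Boundaries are rounded on the cumulative ratio, so the sets always
    cover every item exactly once; the last set takes the remainder.
    """
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be non-negative and sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes splits reproducible
    n = len(shuffled)
    sets, start, cumulative = [], 0, 0.0
    for ratio in ratios[:-1]:
        cumulative += ratio
        end = min(n, round(cumulative * n))
        sets.append(shuffled[start:end])
        start = end
    sets.append(shuffled[start:])
    return sets
```

For example, `split_sample(lines, [0.8, 0.1, 0.1], seed=1)` would give a train/test/development split of roughly 80/10/10. Rounding cumulative boundaries rather than each set size separately avoids sets that overlap or overshoot the data.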

https://github.com/proycon/colloquery

Web application for searching for phrases/collocations/synonyms in phrase translation tables

computational-linguistics machine-translation mt natural-language-processing nlp

Last synced: 30 Oct 2024

https://github.com/proycon/oersetter-models

Models for Oersetter, a Frisian<->Dutch Machine Translation system

fries frisian frysk machine-translation moses

Last synced: 30 Oct 2024

https://github.com/proycon/dwm

my patched fork of dwm

Last synced: 30 Oct 2024

https://github.com/proycon/hyphertool

Command-line tool for syllabification and hyphenation for multiple languages

hyphenation nlp syllabification

Last synced: 30 Oct 2024

https://github.com/proycon/colibrita

Colibrita is a proof-of-concept translation assistance system, translating L1 fragments in an L2 context, using machine learning and statistical machine translation techniques

Last synced: 30 Oct 2024

https://github.com/proycon/campyon

Campyon is both a command-line tool as well as Python library for viewing and manipulating columned data files. It supports various filters, statistics, visualisations, and plotting.

Last synced: 30 Oct 2024

https://github.com/proycon/valkuil

Valkuil.net is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.

Last synced: 30 Oct 2024

https://github.com/proycon/charfreq

Very simple command-line tool that counts (Unicode) character frequencies from standard input

Last synced: 30 Oct 2024
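Counting Unicode character frequencies, as charfreq does, can be sketched in a few lines of Python. This is not charfreq's source: the functions `char_frequencies` and `print_frequencies` are hypothetical, and the sketch counts code points, not user-perceived characters (grapheme clusters), so a letter with a combining accent counts as two items.

```python
import sys
from collections import Counter
from typing import List, TextIO, Tuple


def char_frequencies(text: str) -> List[Tuple[str, int]]:
    """Count Unicode code points in text, most frequent first."""
    return Counter(text).most_common()


def print_frequencies(stream: TextIO = sys.stdin) -> None:
    """Read a text stream to the end and print one 'char<TAB>count' line per code point."""
    for char, count in char_frequencies(stream.read()):
        print(f"{char!r}\t{count}")  # repr() keeps whitespace such as '\n' visible
```

Calling `print_frequencies()` from a script turns it into a filter over standard input. Because Python decodes stdin into `str` before counting, multi-byte UTF-8 characters such as 日 are counted as single characters rather than as separate bytes.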

https://github.com/proycon/lamachine-docker-test

Meta repository for docker testing of LaMachine on Travis-CI

Last synced: 30 Oct 2024

https://github.com/proycon/lst-chat

Last synced: 30 Oct 2024

https://github.com/proycon/cwrap

Small C wrapper to turn a C function into a very simple webservice

Last synced: 30 Oct 2024

https://github.com/proycon/chira

Chinese Reading Assistant, pop-up translations for Linux

Last synced: 30 Oct 2024

https://github.com/proycon/aur-packages

Arch User Repository packages I maintain

Last synced: 30 Oct 2024

https://github.com/proycon/sxmo-svkbd

My fork of https://git.sr.ht/~mil/sxmo-svkbd

Last synced: 30 Oct 2024

https://github.com/proycon/unilang_ulr

Collection of open language resources from UniLang, containing mostly phrasebooks and stories

Last synced: 30 Oct 2024

https://github.com/proycon/vocavue

A vocabulary trainer with a view

Last synced: 30 Oct 2024

https://github.com/proycon/lamastats

Generates statistical reports on the usage of our software and webservices

Last synced: 30 Oct 2024

https://github.com/proycon/kaldi-installer

Script to install kaldi (and optionally kaldi_nl), geared towards ponyland servers at Radboud University Nijmegen

Last synced: 30 Oct 2024

https://github.com/proycon/counttriples

Small HackerRank toy project in Rust

Last synced: 30 Oct 2024

https://github.com/proycon/lisgd

Libinput synthetic gesture daemon: my fork of https://git.sr.ht/~mil/lisgd

Last synced: 30 Oct 2024

https://github.com/proycon/lrswitchboard

Code Repository for the Language Resources Switchboard of CLARIN

Last synced: 30 Oct 2024

https://github.com/proycon/cixue

词学 - Chinese Word Trainer in the terminal, using a spaced-repetition system

Last synced: 30 Oct 2024

https://github.com/proycon/battlenode

Last synced: 30 Oct 2024

https://github.com/proycon/textshift

A terminal gadget to let text emerge from noise

Last synced: 30 Oct 2024

https://github.com/proycon/slstatus

my fork of slstatus

Last synced: 30 Oct 2024

https://github.com/proycon/ner112

Scripts and evaluation for Named Entity Recognition for Dutch emergency calls

Last synced: 30 Oct 2024

https://github.com/proycon/proycon.github.io

My blog

Last synced: 30 Oct 2024

https://github.com/proycon/codemeta2html

Convert software metadata descriptions in codemeta to html

Last synced: 19 Oct 2024

https://github.com/proycon/hascl

Computational Linguistics/Natural Language Processing library for Haskell (just a small toy learning project until further notice!)

Last synced: 30 Oct 2024

https://github.com/proycon/fowlt-gecco

English Spelling Correction system, powered by Gecco

Last synced: 30 Oct 2024

https://github.com/proycon/sxmo-dwm

my fork of sxmo-dwm

Last synced: 30 Oct 2024

https://github.com/proycon/foliaindexer

Create an index over one or more FoLiA XML documents, can produce SQL output for use in relational databases

Last synced: 30 Oct 2024

https://github.com/proycon/flat_configuration_radboud

Configuration for the FLAT instance at CLST, Radboud University, Nijmegen

Last synced: 30 Oct 2024

https://github.com/proycon/creeps-ai

Last synced: 30 Oct 2024

https://github.com/proycon/colibri-net

Last synced: 30 Oct 2024

https://github.com/proycon/contemplativecoding

Blog sources

Last synced: 30 Oct 2024

https://github.com/proycon/phd-thesis

PhD dissertation: Context as Linguistic Bridges

Last synced: 30 Oct 2024

https://github.com/proycon/oersetter-webservice

CLAM Webservice for Oersetter (Middleware of a Frisian-Dutch Translation System)

clam fries frisian frysk machine-translation webservice

Last synced: 30 Oct 2024

https://github.com/proycon/ucto_webservice

Webservice for Ucto, a rule-based tokeniser for multiple languages

Last synced: 30 Oct 2024

https://github.com/proycon/vocajeux

Last synced: 30 Oct 2024

https://github.com/proycon/homepage

My website

Last synced: 30 Oct 2024

https://github.com/proycon/frog_webservice

Webservice and web interface for Frog, a Dutch NLP suite

dutch folia frog named-entity-recognition nlp part-of-speech-tagger webservice

Last synced: 30 Oct 2024