https://github.com/frankier/lextract

# lextract - Dictionary based lexical item extractor

## Overview

### `lextract.aho_corasick`

Find multiwords in text using an Aho-Corasick automaton. Works for Mandarin and
Finnish.
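
As a rough illustration of the technique (not lextract's actual code or API), here is a minimal pure-Python sketch of an Aho-Corasick automaton over token sequences. The names `build_automaton` and `find_all` are hypothetical, and the example phrases are chosen only for illustration.

```python
from collections import deque


def build_automaton(phrases):
    """Build a trie with failure links over token sequences (hypothetical helper)."""
    goto = [{}]  # goto[state][token] -> next state
    fail = [0]   # failure link of each state
    out = [[]]   # phrases that end at each state
    for phrase in phrases:
        state = 0
        for token in phrase:
            if token not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][token] = len(goto) - 1
            state = goto[state][token]
        out[state].append(tuple(phrase))
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for token, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and token not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(token, 0)
            # Also report shorter phrases that end at the same position.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(tokens, automaton):
    """Return (start_index, phrase) pairs for every phrase occurrence in tokens."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, token in enumerate(tokens):
        while state and token not in goto[state]:
            state = fail[state]
        state = goto[state].get(token, 0)
        for phrase in out[state]:
            matches.append((i - len(phrase) + 1, phrase))
    return matches


# Finnish: whitespace-separated tokens (illustrative phrase).
fi = build_automaton([("pitää", "huolta")])
print(find_all("hän pitää huolta lapsista".split(), fi))

# Mandarin: each character is treated as a token, so no segmentation is needed.
zh = build_automaton(["一下", "一下子"])
print(find_all("等一下子", zh))
```

Treating characters as tokens is one way such a matcher can handle unsegmented Mandarin text, while Finnish would more plausibly be matched on word forms or lemmas; which of these lextract does is not stated here.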

### `lextract.keyed_db`

Find multiwords in text using the rarest lemma as a key. Can find contiguous
multiwords in tokenized text or discontinuous ones from a dependency tree.
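
The idea of keying on the rarest lemma can be sketched as follows. This is an assumption-laden illustration, not the module's real interface: `build_index`, `find_contiguous`, and the frequency counts are all hypothetical, and only the contiguous case is covered (the dependency-tree case is omitted).

```python
from collections import defaultdict


def build_index(multiwords, lemma_freq):
    """Index each multiword under its rarest lemma (hypothetical helper)."""
    index = defaultdict(list)
    for mwe in multiwords:
        # Lemmas missing from the frequency table count as frequency 0.
        key = min(mwe, key=lambda lemma: lemma_freq.get(lemma, 0))
        index[key].append(tuple(mwe))
    return index


def find_contiguous(lemmas, index):
    """Return (start_index, multiword) pairs for contiguous matches."""
    matches = []
    for i, lemma in enumerate(lemmas):
        # Only multiwords keyed on this lemma are candidates at position i.
        for mwe in index.get(lemma, ()):
            # Align the text on the first occurrence of the key in the multiword.
            start = i - mwe.index(lemma)
            if start >= 0 and tuple(lemmas[start:start + len(mwe)]) == mwe:
                matches.append((start, mwe))
    return matches


# Made-up frequency counts, for illustration only.
lemma_freq = {"pitää": 5000, "huoli": 300}
index = build_index([("pitää", "huoli")], lemma_freq)

# Lemmatized form of "hän pitää huolta lapsista".
print(find_contiguous(["hän", "pitää", "huoli", "lapsi"], index))
```

Keying on the rarest lemma keeps the candidate list short: a frequent lemma such as "pitää" would otherwise trigger a lookup for every multiword that contains it.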

### `lextract.mweproc`

Processing pipeline for [FinnMWE](https://github.com/frankier/finnmwe).

## Documentation

There is no separate documentation yet. For now, the tests and a few docstrings are the only reference.