https://github.com/frankier/lextract
Dictionary based lexical item extractor
- Host: GitHub
- URL: https://github.com/frankier/lextract
- Owner: frankier
- Created: 2019-05-22T18:35:44.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-22T23:29:47.000Z (over 1 year ago)
- Last Synced: 2024-04-17T12:20:27.701Z (7 months ago)
- Language: Python
- Size: 1.74 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
README
# lextract - Dictionary based lexical item extractor
## Overview
### `lextract.aho_corasick`
Find multiwords in text using an Aho Corasick automaton. Works for Mandarin and
Finnish.
### `lextract.keyed_db`
Find multiwords in text using the rarest lemma as a key. Can find contiguous
multiwords in tokenized text or discontinuous ones from a dependency tree.
### `lextract.mweproc`
Processing pipeline for [FinnMWE](https://github.com/frankier/finnmwe).
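
As a rough illustration of the technique named for `lextract.aho_corasick`, here is a minimal, dependency-free Aho-Corasick matcher. It is a sketch only and not lextract's actual API: the function names `build_automaton` and `find_all` are hypothetical. It assumes dictionary entries are sequences of hashable symbols, so the same code works on characters (e.g. unsegmented Mandarin text) or on token lists (e.g. Finnish word forms).

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links over the given patterns.

    Each pattern is a sequence of hashable symbols (a str or a tuple of tokens).
    Hypothetical helper for illustration; not part of lextract.
    """
    goto = [{}]   # goto[node][symbol] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node
    for pat in patterns:
        node = 0
        for sym in pat:
            if sym not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][sym] = len(goto) - 1
            node = goto[node][sym]
        out[node].append(pat)

    # Breadth-first pass: depth-1 nodes fail to the root, deeper nodes
    # follow their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for sym, child in goto[node].items():
            f = fail[node]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(sym, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(sequence, automaton):
    """Return (start_index, pattern) for every dictionary match in sequence."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, sym in enumerate(sequence):
        while node and sym not in goto[node]:
            node = fail[node]
        node = goto[node].get(sym, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    automaton = build_automaton(["北京", "北京大学", "大学"])
    print(find_all("我在北京大学学习", automaton))
```

The automaton scans the input once and reports overlapping entries, which is what makes it a natural fit for dictionary lookup over running text.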
## Documentation
There are only tests and a few docstrings for now.
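
In the meantime, the idea behind `lextract.keyed_db` (indexing each multiword under its rarest lemma) can be sketched as follows. This is an assumption-laden illustration, not the module's real interface: `build_index`, `find_contiguous`, and the `lemma_freq` frequency table are hypothetical names, the input is assumed to be already lemmatized, and only the contiguous case is covered (matching discontinuous multiwords against a dependency tree is not shown).

```python
from collections import defaultdict


def build_index(multiwords, lemma_freq):
    """Index each multiword (a tuple of lemmas) under its rarest lemma.

    lemma_freq maps lemma -> corpus frequency; lemmas missing from the
    table are treated as frequency 0, i.e. as the rarest.
    Hypothetical helper for illustration; not part of lextract.
    """
    index = defaultdict(list)
    for mw in multiwords:
        key_pos = min(range(len(mw)), key=lambda i: lemma_freq.get(mw[i], 0))
        # Store the key's position so the match can be aligned later.
        index[mw[key_pos]].append((key_pos, mw))
    return index


def find_contiguous(lemmas, index):
    """Return (start_index, multiword) for each contiguous match in lemmas."""
    hits = []
    for i, lemma in enumerate(lemmas):
        # Only multiwords keyed on this lemma are candidates here.
        for key_pos, mw in index.get(lemma, ()):
            start = i - key_pos
            if start >= 0 and tuple(lemmas[start:start + len(mw)]) == mw:
                hits.append((start, mw))
    return hits


if __name__ == "__main__":
    freq = {"take": 500, "into": 900, "account": 40,
            "kick": 30, "the": 10000, "bucket": 20}
    index = build_index(
        [("take", "into", "account"), ("kick", "the", "bucket")], freq
    )
    print(find_contiguous(
        ["we", "take", "into", "account", "the", "bucket"], index
    ))
```

Keying on the rarest lemma keeps the candidate list short: a frequent lemma such as "the" never triggers a lookup, and full comparison only happens when the rare key is present.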