https://github.com/danieldk/ohnomore
Explorations in lemmatization
- Host: GitHub
- URL: https://github.com/danieldk/ohnomore
- Owner: danieldk
- Created: 2017-12-08T13:25:51.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-05-15T06:59:00.000Z (about 3 years ago)
- Last Synced: 2025-02-11T09:29:07.360Z (3 months ago)
- Language: Rust
- Size: 297 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# Oh No! More Lemmas
ohnomore consists of two tools for incorporating TüBa-D/Z-style lemmas
into language processing pipelines. The first tool, `ohnomore-preproc`,
takes TüBa-D/Z lemmas and transforms them into lemmas that are better
suited to machine learning pipelines. For example:

* Alternative lemmatizations are removed.
* Separable prefix markers are removed.
* Separable prefixes are removed when they are separated.
* The special reflexive lemma *#refl* is replaced by the lowercased form.
* Lemmas of truncations are replaced by their forms.

The second tool, `ohnomore`, performs the opposite transformation (as
much as is feasible).
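
The `ohnomore-preproc` transformations listed above can be illustrated with a minimal sketch. This is not the tool's actual code: the function `simplify_lemma`, its parameters, and the `prefix_separated` flag are hypothetical. The sketch assumes that `#` marks separable prefixes, that `|` separates alternative lemmatizations (keeping the first one is an arbitrary choice made here), and that truncations carry the STTS tag `TRUNC`.

```rust
/// Hypothetical helper, not part of ohnomore: simplifies one
/// TüBa-D/Z-style lemma given the token's form and part-of-speech tag.
/// `prefix_separated` is an assumed input saying whether the separable
/// prefix occurs as a separate token in the sentence.
fn simplify_lemma(form: &str, lemma: &str, tag: &str, prefix_separated: bool) -> String {
    // Lemmas of truncations are replaced by their forms
    // (assumption: truncations are tagged TRUNC).
    if tag == "TRUNC" {
        return form.to_string();
    }

    // The special reflexive lemma is replaced by the lowercased form.
    if lemma == "#refl" {
        return form.to_lowercase();
    }

    // Alternative lemmatizations are removed
    // (assumption: '|' separates them; the first one is kept).
    let first = lemma.split('|').next().unwrap_or(lemma);

    if prefix_separated {
        // Separated prefix: keep only the part after the last marker.
        // Simplification: all marked prefixes are treated as separated.
        first.rsplit('#').next().unwrap_or(first).to_string()
    } else {
        // Prefix still attached: only the marker itself is removed.
        first.replace('#', "")
    }
}

fn main() {
    // "zeichnet ... ab": the prefix is a separate token.
    println!("{}", simplify_lemma("zeichnet", "ab#zeichnen", "VVFIN", true));
    // "abzeichnen": the prefix is attached to the verb.
    println!("{}", simplify_lemma("abzeichnen", "ab#zeichnen", "VVINF", false));
    // Reflexive pronoun at the start of a sentence.
    println!("{}", simplify_lemma("Sich", "#refl", "PRF", false));
}
```

How the real tool decides that a prefix is separated is not described here, which is why the sketch takes that decision as an explicit argument rather than guessing it from the form.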