https://github.com/ancatmara/early-irish-lemmatizer
A DIL-based lemmatizer for Early Irish data.
- Host: GitHub
- URL: https://github.com/ancatmara/early-irish-lemmatizer
- Owner: ancatmara
- License: gpl-3.0
- Created: 2016-04-19T20:48:36.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2021-07-21T10:41:09.000Z (almost 4 years ago)
- Last Synced: 2025-03-27T01:51:30.634Z (about 1 month ago)
- Topics: early-irish, irish, lemmatization, lemmatizer, morphological-analysis, natural-language-processing, nlp, python, seq2seq
- Language: Jupyter Notebook
- Homepage:
- Size: 44 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Early Irish Lemmatizer
There are two "editions" of the lemmatizer:
* rule-based (as in the good old days)
* seq2seq (fancy neural network stuff)

They are completely independent, so you are free to choose whichever you like; both are ready to use.
### Rule-based
This version is based on the [eDIL](http://dil.ie/). It shows ca. 80% accuracy on texts that follow classical Old Irish orthography, but performs poorly on Middle and Early Modern Irish texts. The algorithm does not resolve all possible cases of spelling variation and the lexicon does not cover all the inflectional forms. Please keep in mind that it is just a study project.
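To make the idea of lexicon lookup with spelling normalization concrete, here is a minimal sketch. It is not the code from this repository: the toy lexicon, the function names `strip_length_marks` and `lemmatize`, and the single normalization rule (dropping length marks) are all assumptions made for illustration.

```python
import unicodedata

# Toy lexicon mapping inflected forms to lemmas. The entries are illustrative
# only and are not taken from the project's data files.
TOY_LEXICON = {
    "fir": "fer",
    "feraib": "fer",
    "túaithe": "túath",
}


def strip_length_marks(form):
    """Hypothetical normalization rule: lowercase and drop combining accents."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Index the lexicon by normalized form so that spelling variants can match.
NORMALIZED_LEXICON = {
    strip_length_marks(form): lemma for form, lemma in TOY_LEXICON.items()
}


def lemmatize(form):
    """Return (lemma, found). Forms missing from the lexicon come back unchanged."""
    key = strip_length_marks(form)
    if key in NORMALIZED_LEXICON:
        return NORMALIZED_LEXICON[key], True
    return form, False
```

A form that is absent from the lexicon, or whose spelling differs in a way no rule covers, falls through to the `found = False` branch. That is one way to picture the coverage limits described above.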
### Seq2seq
This version is also trained on the eDIL, but it yields more promising results: 99.2% accuracy for known words and 64.9% for unknown words. A detailed English description of the neural network architecture and its evaluation can be found in my MA thesis (full text [here](https://www.academia.edu/35596032/Morphological_analysis_of_Old_Irish_data_with_neural_networks)).
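If you want to report accuracy separately for known and unknown words on your own data, a sketch along these lines may help. It assumes that "known" means the form occurred in the training data, which is an assumption here rather than the thesis's exact definition. `accuracy_by_vocabulary` and its parameters are hypothetical names, and `predict` stands for any function that maps a form to a lemma.

```python
def accuracy_by_vocabulary(examples, predict, training_forms):
    """Compute accuracy separately for forms seen and not seen in training.

    examples: iterable of (form, gold_lemma) pairs
    predict: callable mapping a form to a predicted lemma
    training_forms: set of forms assumed to count as "known"
    """
    # Each bucket holds [correct, total].
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for form, gold in examples:
        bucket = "known" if form in training_forms else "unknown"
        counts[bucket][1] += 1
        if predict(form) == gold:
            counts[bucket][0] += 1
    # An empty bucket yields None instead of dividing by zero.
    return {
        name: (correct / total if total else None)
        for name, (correct, total) in counts.items()
    }
```

Reporting the two figures separately shows how well a model generalizes to forms it has never seen, which a single overall accuracy would hide.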