Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Evaluating accuracy of Finnish part-of-speech taggers
- Host: GitHub
- URL: https://github.com/aajanki/finnish-pos-accuracy
- Owner: aajanki
- License: mit
- Created: 2019-10-27T17:35:53.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-07-22T19:55:09.000Z (over 1 year ago)
- Last Synced: 2024-04-24T03:01:44.431Z (9 months ago)
- Topics: finnish, lemmatization, nlp, part-of-speech-tagger
- Language: Python
- Size: 711 KB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 6
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
README
# Benchmarking Finnish POS taggers and lemmatizers
This repository contains an evaluation of the accuracy of open source Finnish part-of-speech taggers and lemmatization algorithms.

### Tested algorithms
* [spaCy](https://spacy.io/) 3.3.0
* [Experimental Finnish model for spaCy](https://github.com/aajanki/spacy-fi) 0.10.0
* [FinnPos](https://github.com/mpsilfve/FinnPos/wiki) git commit 81c1f735 (Oct 2019)
* [Simplemma](https://github.com/adbar/simplemma/) 0.6.0
* [Stanza](https://stanfordnlp.github.io/stanza/) 1.4.0
* [Trankit](https://trankit.readthedocs.io/en/latest/) 1.1.1
* [Turku neural parser pipeline](https://turkunlp.org/Turku-neural-parser-pipeline/) git commit 8c9425dd (Jan 2022)
* [UDPipe](http://ufal.mff.cuni.cz/udpipe) (through spacy-udpipe 1.0.0)
* [UralicNLP](https://github.com/mikahama/uralicNLP) 1.3.0
* [libvoikko](https://voikko.puimula.org/) 4.3.1 and Python voikko module 0.5
* [Raudikko](https://github.com/EvidentSolutions/raudikko) git commit 572b8104 (Jan 2022)
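As a concrete illustration of what is being evaluated, here is a minimal sketch (not taken from this repository's code) that runs one of the benchmarked tools, Stanza, to produce the universal POS tags and lemmas that the scores are computed over:

```
import stanza

# Download the Finnish models on first use (requires network access).
stanza.download("fi")

# Tokenization, POS tagging and lemmatization pipeline.
nlp = stanza.Pipeline("fi", processors="tokenize,pos,lemma")

doc = nlp("Kissa istuu puussa.")
for sentence in doc.sentences:
    for word in sentence.words:
        # Surface form, universal POS tag and lemma for each token.
        print(word.text, word.upos, word.lemma)
```

Each tool exposes a different API, but all of them end up producing per-token POS tags and/or lemmas that are compared against the gold annotations in the datasets below.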
### Test datasets

* [FinnTreeBank 1](https://github.com/UniversalDependencies/UD_Finnish-FTB/blob/master/README.md) v1: randomly sampled subset of about 1000 sentences
* [FinnTreeBank 2](http://urn.fi/urn:nbn:fi:lb-201407163): news, Sofie and Wikipedia subsets
* [UD_Finnish-TDT](https://github.com/UniversalDependencies/UD_Finnish-TDT) r2.9: the test set
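The UD treebanks are distributed in the CoNLL-U format, where each token row carries the surface form, gold lemma and universal POS tag in tab-separated columns. A minimal reader sketch for such files (the path is hypothetical; the real data is fetched by download_data.sh):

```
def read_conllu(path):
    """Yield (form, lemma, upos) triples from a CoNLL-U file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip sentence boundaries and comments
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multi-word token ranges and empty nodes
            # CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, ...
            yield cols[1], cols[2], cols[3]

# Hypothetical location; the actual files are placed by download_data.sh.
gold = list(read_conllu("data/UD_Finnish-TDT/fi_tdt-ud-test.conllu"))
print(len(gold), gold[:3])
```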
## Setup

Install dependencies:
* Python 3.9
* libvoikko with Finnish morphology data files
* clang (or other C++ compiler)
* Dependencies needed to compile [FinnPos](https://github.com/mpsilfve/FinnPos) and [cg3](https://github.com/GrammarSoft/cg3)
* Java 11

Set up git submodules, create a Python 3.9 virtual environment (it must be 3.9 because the Turku parser is incompatible with more recent Python versions), and download test data and models by running the following commands:
```
git submodule init
git submodule update

python3.9 -m venv venv
source venv/bin/activate
pip install wheel
pip install -r requirements.txt

# Compile FinnPos
(cd models/FinnPos/src && make -j 4)

# Compile cg3 in models/cg3
# See https://visl.sdu.dk/cg3/chunked/installation.html

# Compile Raudikko
(cd models/raudikko && ./gradlew shadowJar)

./download_data.sh
./download_models.sh
```

## Run
```
./run.sh
```

The numerical results will be saved in results/evaluation.csv, POS and lemma errors made by each model will be saved in results/errorcases, and plots will be saved in results/images.
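The CSV can also be inspected outside of the generated plots, for example with pandas. The exact column layout depends on what the evaluation script writes, so this is only a quick-look sketch:

```
import pandas as pd

# Load the benchmark summary written by run.sh and show the first rows.
df = pd.read_csv("results/evaluation.csv")
print(df.columns.tolist())
print(df.head())
```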
## Results

### Lemmatization
![Lemmatization speed](images/lemma_f1_speed.png)
Execution duration as a function of the F1 score on the concatenated data. Larger values are better on both axes. Notice that the Y-axis is on a log scale.

The execution duration is measured as a batched evaluation (a batch contains all sentences from one dataset) on a 4-core CPU. Some methods can be run on a GPU, which would most likely improve their performance, but I haven't tested that.

![Lemmatization error rates](images/lemma_f1_by_dataset.png)
Lemmatization F1 scores for the benchmarked algorithms on the test datasets.
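The F1 score is the harmonic mean of precision and recall over token-level matches, a common choice when a tool's tokenization does not exactly match the gold segmentation. The sketch below shows the general form of the metric; it is illustrative and not necessarily the exact computation used by the evaluation script:

```
def f1_score(n_correct: int, n_predicted: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall over token-level matches."""
    if n_predicted == 0 or n_gold == 0:
        return 0.0
    precision = n_correct / n_predicted
    recall = n_correct / n_gold
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 950 correct lemmas out of 990 predicted tokens and 1000 gold tokens.
print(round(f1_score(950, 990, 1000), 3))  # 0.955
```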
### Part-of-speech tagging

![Part-of-speech speed](images/pos_f1_speed.png)
Execution duration as a function of the POS F1 score on the concatenated data.
Note that FinnPos and Voikko do not make a distinction between auxiliary and main verbs, and therefore their performance suffers by 4-5% in this evaluation as they mislabel all AUX tags as VERBs.

![Part-of-speech error rates](images/pos_f1_by_dataset.png)
Part-of-speech F1 scores for the benchmarked algorithms.
Simplemma does not include a POS tagging feature.
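As noted above, FinnPos and Voikko never emit the AUX tag. To compare them on an equal footing, one option is to collapse the AUX/VERB distinction in both the gold and predicted tags before scoring; a hypothetical helper, not part of this repository:

```
def normalize_upos(tag: str) -> str:
    """Collapse the AUX/VERB distinction for taggers that never emit AUX."""
    return "VERB" if tag == "AUX" else tag

gold = ["AUX", "VERB", "NOUN", "PUNCT"]
predicted = ["VERB", "VERB", "NOUN", "PUNCT"]
matches = sum(normalize_upos(g) == normalize_upos(p) for g, p in zip(gold, predicted))
print(matches, "of", len(gold))  # 4 of 4
```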