https://github.com/alexeyev/apertium2ud

Tag parser and converter between two tagsets: Apertium (enhanced Leipzig?) and the one used in UD
apertium morphology natural-language-processing universal-dependencies

# apertium2ud

A mapping between the Apertium and Universal Dependencies (UD) tagsets, built from the
[list of symbols on the Apertium Wiki](https://wiki.apertium.org/w/index.php?title=List_of_symbols).

Loosely based on [this code](https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py),
hence the GPLv3 license.

To install, run

```bash
python -m pip install apertium2ud
```

The latest uploaded version is 0.0.8.

**NB!**

1. The tool is far from perfect.
2. It was originally developed for working with `apertium-kir`, i.e. with the Kyrgyz language.
3. The latest version on PyPI ships with the rules from the [apertium-kir](https://github.com/apertium/apertium-kir/blob/main/apertium-kir.kir.udx) `.udx` file. For other languages, you may need to adapt them.

To build the machine-readable mapping, run

```bash
python apertium_wiki_parser.py
```

## Apertium to Universal tags

```python
>>> from apertium2ud.convert import a2ud
>>> tags = ["n", "pl", "acc"]
>>> a2ud(tags)
(['NOUN'], ['Number=Plur', 'Case=Acc'])
>>> tags_sophisticated = ["v", "tv", "ger", "nom", "cop", "aor", "p3", "pl"]
>>> a2ud(tags_sophisticated)
(['VERB', 'AUX'], ['Subcat=Tran', 'VerbForm=Vnoun', 'Case=Nom', 'Tense=Past', 'Person=3', 'Number=Plur'])
```
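
Apertium analysers emit each analysis as a lemma followed by tags in angle brackets (for example `кат<n><pl><acc>`), while `a2ud` in the session above takes a plain list of tag names. A minimal sketch of that preprocessing step follows; the helper name `split_analysis` is hypothetical and not part of `apertium2ud`.

```python
import re

# Matches one tag in angle brackets, e.g. "<n>" or "<acc>".
TAG_RE = re.compile(r"<([^<>]+)>")


def split_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split an Apertium analysis such as 'кат<n><pl><acc>' into (lemma, tags).

    Hypothetical helper for illustration; not part of apertium2ud.
    """
    lemma = analysis.split("<", 1)[0]  # everything before the first tag
    tags = TAG_RE.findall(analysis)    # tag names without the brackets
    return lemma, tags


lemma, tags = split_analysis("кат<n><pl><acc>")
print(lemma, tags)  # кат ['n', 'pl', 'acc']
```

The resulting list has the same shape as `tags` in the session above, so it can be passed to `a2ud(tags)`.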

## Universal tags to Apertium

So far, the conversion is far from perfect:

```
Кыз NOUN {'Number[psor]=Sing', 'Number=Sing', 'Case=Nom', 'Person[psor]=3', 'Person=3'} ->

досуна NOUN {'Number[psor]=Sing', 'Number=Sing', 'Person[psor]=3', 'Case=Dat', 'Person=3'} ->

кат NOUN {'Case=Nom', 'Person=3', 'Number=Sing'} ->

жазган VERB {'Aspect=Perf', 'Polarity=Pos', 'Number=Sing', 'Tense=Past', 'Person=3', 'Evident=Fh'} ->

. PUNCT set() ->

```
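
The feature sets in the listing above have the shape one gets by splitting the FEATS column of a CoNLL-U file, where features are joined with `|` and `_` marks an empty column. A minimal sketch of that parsing step, assuming CoNLL-U input; `parse_feats` is a hypothetical helper, not part of `apertium2ud`:

```python
def parse_feats(feats_column: str) -> set[str]:
    """Turn a CoNLL-U FEATS value into a set of 'Feature=Value' strings.

    Hypothetical helper for illustration; not part of apertium2ud.
    """
    if feats_column in ("", "_"):  # "_" means "no features" in CoNLL-U
        return set()
    return set(feats_column.split("|"))


print(sorted(parse_feats("Case=Nom|Number=Sing|Person=3")))
# ['Case=Nom', 'Number=Sing', 'Person=3']
print(parse_feats("_"))
# set()
```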

## TODO

* Should sections `chunks` and [XML tags](https://wiki.apertium.org/w/index.php?title=List_of_symbols#XML_tags) be added? [No](https://github.com/apertium/apertium/issues/185).
* Round-trip tests: Apertium -> UD -> Apertium and UD -> Apertium -> UD (some losses are inevitable)
* Allow adding rules from a `.udx` file, which usually describes custom tags
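
The round-trip idea from the list above can be sketched with a toy table. This is only an illustration under stated assumptions: the three pairs are taken from the example earlier in this README, the table is not the package's real mapping, and the names `TOY_A2UD`, `round_trip` and `lost_tags` are hypothetical.

```python
# Toy one-to-one table built from the README example; NOT the real mapping.
TOY_A2UD = {"n": "NOUN", "pl": "Number=Plur", "acc": "Case=Acc"}
TOY_UD2A = {ud: ap for ap, ud in TOY_A2UD.items()}


def round_trip(tags: list[str]) -> list[str]:
    """Apertium -> UD -> Apertium with the toy table; unmapped tags are dropped."""
    ud = [TOY_A2UD[t] for t in tags if t in TOY_A2UD]
    return [TOY_UD2A[u] for u in ud]


def lost_tags(tags: list[str]) -> list[str]:
    """Return the tags that do not survive the round trip."""
    restored = set(round_trip(tags))
    return [t for t in tags if t not in restored]


print(round_trip(["n", "pl", "acc"]))    # ['n', 'pl', 'acc']
print(lost_tags(["n", "px3sp", "acc"]))  # ['px3sp'] (absent from the toy table)
```

A real test would replace the toy dictionaries with the package's converters and decide which losses are acceptable.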

## How to cite

If you use this work, a citation is greatly appreciated.

```
@misc{apertium2ud2023alekseev,
title = {{alexeyev/apertium2ud: mapping tagsets}},
year = {2023},
url = {https://github.com/alexeyev/apertium2ud}
}
```