https://github.com/alexeyev/apertium2ud
A tag parser and converter between two tagsets: the Apertium one (enhanced Leipzig?) and the one used in UD
- Host: GitHub
- URL: https://github.com/alexeyev/apertium2ud
- Owner: alexeyev
- License: gpl-3.0
- Created: 2023-05-19T08:54:48.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-10T22:34:34.000Z (2 months ago)
- Last Synced: 2025-04-04T12:51:14.265Z (about 2 months ago)
- Topics: apertium, morphology, natural-language-processing, universal-dependencies
- Language: Python
- Homepage:
- Size: 72.3 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# apertium2ud
This project derives the mapping between the two tagsets from the
[information on the Apertium Wiki](https://wiki.apertium.org/w/index.php?title=List_of_symbols).

It is loosely based on [this code](https://github.com/mr-martian/apertium-recursive-learning/blob/master/tags.py),
hence the GPLv3 license.

To install, run
```bash
python -m pip install apertium2ud
```
The latest uploaded version is 0.0.8.

**NB!**
1. The tool is far from perfect.
2. It was originally developed for working with `apertium-kir`, i.e. with the Kyrgyz language.
3. The latest version on PyPI ships with the rules from the [apertium-kir](https://github.com/apertium/apertium-kir/blob/main/apertium-kir.kir.udx) `.udx` file. For other languages, you may need to make some updates.

To build the machine-readable mapping, run
```bash
python apertium_wiki_parser.py
```
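
Conceptually, a machine-readable mapping of this kind is a lookup table from Apertium symbols to UD tags. The following is a minimal sketch of that idea, not the package's actual data format: the names `TOY_POS`, `TOY_FEATS`, and `toy_a2ud` are hypothetical, and the entries are limited to pairs that appear in the usage examples in this README.

```python
# Toy excerpt of an Apertium -> UD mapping. The entries mirror the README
# examples; the dictionary layout and all names here are hypothetical.
TOY_POS = {"n": "NOUN", "v": "VERB"}
TOY_FEATS = {"pl": "Number=Plur", "acc": "Case=Acc", "nom": "Case=Nom"}


def toy_a2ud(tags):
    """Split Apertium tags into UD part-of-speech tags and UD features."""
    pos = [TOY_POS[t] for t in tags if t in TOY_POS]
    feats = [TOY_FEATS[t] for t in tags if t in TOY_FEATS]
    # Tags absent from both tables are collected instead of silently dropped.
    unknown = [t for t in tags if t not in TOY_POS and t not in TOY_FEATS]
    return pos, feats, unknown


print(toy_a2ud(["n", "pl", "acc"]))
```

Returning the unrecognized tags separately is a design choice of this sketch; it makes gaps in the table visible when moving to a language other than Kyrgyz.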
## Apertium to Universal tags

```
>>> from apertium2ud.convert import a2ud
>>> tags = ["n", "pl", "acc"]
>>> a2ud(tags)
(['NOUN'], ['Number=Plur', 'Case=Acc'])
>>> tags_sophisticated = ["v", "tv", "ger", "nom", "cop", "aor", "p3", "pl"]
>>> a2ud(tags_sophisticated)
(['VERB', 'AUX'], ['Subcat=Tran', 'VerbForm=Vnoun', 'Case=Nom', 'Tense=Past', 'Person=3', 'Number=Plur'])
```

## Universal tags to Apertium
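
One reason this direction is harder is that the forward mapping need not be one-to-one, so inverting it can leave several Apertium candidates for a single UD feature. The sketch below illustrates this with a toy table; `FORWARD` and `invert` are hypothetical names, `aor` → `Tense=Past` is taken from the example above, and `past` → `Tense=Past` is an assumed collision added purely for illustration.

```python
from collections import defaultdict

# Forward toy mapping. "aor" -> "Tense=Past" follows the README example;
# "past" -> "Tense=Past" is a hypothetical collision added for illustration.
FORWARD = {"aor": "Tense=Past", "past": "Tense=Past", "pl": "Number=Plur"}


def invert(forward):
    """Group Apertium tags by the UD feature they map to."""
    inverse = defaultdict(list)
    for apertium_tag, ud_feature in forward.items():
        inverse[ud_feature].append(apertium_tag)
    return dict(inverse)


INVERSE = invert(FORWARD)
# Features with more than one candidate cannot be mapped back unambiguously.
ambiguous = {feat: tags for feat, tags in INVERSE.items() if len(tags) > 1}
print(ambiguous)
```
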
So far, the conversion in this direction is far from perfect:
```
Кыз NOUN {'Number[psor]=Sing', 'Number=Sing', 'Case=Nom', 'Person[psor]=3', 'Person=3'} ->
досуна NOUN {'Number[psor]=Sing', 'Number=Sing', 'Person[psor]=3', 'Case=Dat', 'Person=3'} ->
кат NOUN {'Case=Nom', 'Person=3', 'Number=Sing'} ->
жазган VERB {'Aspect=Perf', 'Polarity=Pos', 'Number=Sing', 'Tense=Past', 'Person=3', 'Evident=Fh'} ->
. PUNCT set() ->
```

## TODO
* Should sections `chunks` and [XML tags](https://wiki.apertium.org/w/index.php?title=List_of_symbols#XML_tags) be added? [No](https://github.com/apertium/apertium/issues/185).
* Tests: Apertium -> UD -> Apertium, UD -> Apertium -> UD (sometimes losses are inevitable)
* Add the option to load rules from a `.udx` file, which usually describes custom tags

## How to cite

If you use this work, a citation is greatly appreciated.
```
@misc{apertium2ud2023alekseev,
title = {{alexeyev/apertium2ud: mapping tagsets}},
year = {2023},
url = {https://github.com/alexeyev/apertium2ud}
}
```