An open API service indexing awesome lists of open source software.

https://github.com/apertium/apertium-sah

Apertium linguistic data for Sakha
https://github.com/apertium/apertium-sah

apertium-languages

Last synced: 3 months ago
JSON representation

Apertium linguistic data for Sakha

Awesome Lists containing this project

README

        

Sakha: `apertium-sah`
===============================================================================

This is an Apertium monolingual language package for Sakha. What
you can use this language package for:

* Morphological analysis of Sakha
* Morphological generation of Sakha
* Part-of-speech tagging of Sakha

Requirements
-------------------------------------------------------------------------------

You will need the following software installed:

* lttoolbox (>= 3.3.0)
* apertium (>= 3.3.0)
* vislcg3 (>= 0.9.9.10297)
* hfst (>= 3.8.2)

If this does not make any sense, we recommend you look at: apertium.org

Compiling
-------------------------------------------------------------------------------

Given the requirements being installed, you should be able to just run:

```bash
$ ./configure
$ make
```

You can use ./autogen.sh instead of ./configure if you're compiling
from source.

If you're doing development, you don't have to install the data, you
can use it directly from this directory.

If you are installing this language package as a prerequisite for an
Apertium translation pair, then do (typically as root / with sudo):

```bash
$ make install
```

You can give a `--prefix` to `./configure` to install as a non-root user,
but make sure to use the same prefix when installing the translation
pair and any other language packages.

Testing
-------------------------------------------------------------------------------

If you are in the source directory after running make, the following
commands should work:

* Morphological analysis:

```bash
$ echo "Бу сахалыы морфологическай ырытыы." | apertium -d . sah-morph
^Бу/бу/бу$ ^сахалыы/саха+лыы/сахалыы+э/сахалыы$ ^морфологическай/морфологическай+э/морфологическай$ ^ырытыы/ырытыы+э/ырытыы/ырытыы/ырыт+э/ырыт$^./.$
```

* Or online at [https://beta.apertium.org](https://beta.apertium.org/index.eng.html#analysis?aLang=sah&aQ=%D0%91%D1%83%20%D1%81%D0%B0%D1%85%D0%B0%D0%BB%D1%8B%D1%8B%20%D0%BC%D0%BE%D1%80%D1%84%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B0%D0%B9%20%D1%8B%D1%80%D1%8B%D1%82%D1%8B%D1%8B.).

* Tagging (analysis + disambiguation):

```bash
$ echo "Бу сахалыы морфологическай ырытыы." | apertium -d . sah-tagger
^Бу/Бу$ ^сахалыы/саха+лыы$ ^морфологическай/морфологическай$ ^ырытыы/ырытыы+э$^./.$
```

* Morphological generation:

```bash
$ echo "^бу$ ^саха+лыы$ ^морфологическай$ ^ырытыы+э$" | apertium -d . -f none sah-gener
бу сахалыы морфологическай ырытыы
```

* Or online at [https://beta.apertium.org](https://beta.apertium.org/index.eng.html#generation?gLang=sah&gQ=%5E%D0%B1%D1%83%3Cprn%3E%3Cdem%3E%3Cnom%3E%24%20%5E%D1%81%D0%B0%D1%85%D0%B0%3Cn%3E%2B%D0%BB%D1%8B%D1%8B%3Cpost%3E%24%20%5E%D0%BC%D0%BE%D1%80%D1%84%D0%BE%D0%BB%D0%BE%D0%B3%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B0%D0%B9%3Cadj%3E%24%20%5E%D1%8B%D1%80%D1%8B%D1%82%D1%8B%D1%8B%3Cn%3E%3Cnom%3E%2B%D1%8D%3Ccop%3E%3Caor%3E%3Cp3%3E%3Csg%3E%24).

Files and data
-------------------------------------------------------------------------------

* [`apertium-sah.sah.lexc`](apertium-sah.sah.lexc) - Morphotactic dictionary
* [`apertium-sah.sah.twol`](apertium-sah.sah.twol) - Morphophonological rules
* [`apertium-sah.sah.rlx`](apertium-sah.sah.rlx) - Constraint Grammar disambiguation rules
* [`apertium-sah.post-sah.dix`](apertium-sah.post-sah.dix) - Post-generator
* [`sah.prob`](sah.prob) - Tagger model
* [`modes.xml`](modes.xml) - Translation modes

For more information
-------------------------------------------------------------------------------

* https://wiki.apertium.org/wiki/Installation
* https://wiki.apertium.org/wiki/apertium-sah
* https://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary

Citing
-------------------------------------------------------------------------------

When referencing this work in an academic publication, we ask that you cite the following source:

* Ivanova, Sardana, Jonathan N. Washington, and Francis M. Tyers (2022). **A Free/Open-Source Morphological Analyser and Generator for Sakha**. Presented at _LREC 2022_. [Poster](https://github.com/apertium/papers/blob/master/2022-lrec-sahmorph/LREC%202022%20Sakha%20poster.pdf), [Paper](https://aclanthology.org/2022.lrec-1.550.pdf).

The transducer also appeared as the following:

* Ivanova, Sardana, Francis M. Tyers, and Jonathan N. Washington (2021). **[A Prototype Free/Open-Source Morphological Analyser and Generator for Sakha](http://www.winlp.org/wp-content/uploads/2021/11/winlp2021_43_Paper.pdf)**. Presented at _WiNLP 2021 Workshop, EMNLP 2021_.

Help and support
-------------------------------------------------------------------------------

If you need help using this language pair or data, you can contact:

* Mailing list: [email protected]
* IRC: `#apertium` on irc.oftc.net (irc://irc.oftc.net/#apertium)

See also the file [`AUTHORS`](AUTHORS), included in this distribution.