Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/apertium/apertium

Core tools (driver script, transfer, tagger, formatters) for the FOSS RBMT system Apertium
https://github.com/apertium/apertium

apertium-core

Last synced: 1 day ago
JSON representation

Core tools (driver script, transfer, tagger, formatters) for the FOSS RBMT system Apertium

Awesome Lists containing this project

README

        

# Apertium

Apertium is an open-source rule-based machine translation toolchain and ecosystem. It facilitates the creation of consistent and transparent machine translation systems by relying on deterministic linguistic rules rather than statistical or neural models. Apertium's tools are designed to be language-agnostic and platform-independent, making them suitable for a wide range of languages and applications.

## Project Overview

Apertium's framework is based on finite-state transducers, which enable efficient and accurate processing of natural languages. The language data used by Apertium is stored in XML and other human-readable text formats, organized into modular single-language packages and translation pairs. This modularity allows for the reuse of language data across multiple translation systems.

## Features

- **Rule-Based Translation**: Consistent and understandable translations based on deterministic rules.
- **Finite-State Transducers**: Efficient language processing using advanced computational models.
- **Language-Agnostic Tools**: Broad applicability across multiple languages.
- **Modular Design**: Reusable language packages simplify the development of new translation pairs.

## Installation

Apertium provides binaries for several platforms, including Debian, Ubuntu, Fedora, CentOS, OpenSUSE, Windows, and macOS. Both nightly builds and official releases are available. If you are on a supported platform, it is recommended to use the pre-built binaries.

For more information, see the [Apertium Installation Guide](https://wiki.apertium.org/wiki/Installation).

## Building from Source

If you need to modify Apertium’s behavior or are on a platform that is not officially supported, follow these steps to build from source.

### Requirements

- [lttoolbox](https://github.com/apertium/lttoolbox)
- libxml2
- ICU

### Compiling

```bash
$ autoreconf -fvi
$ ./configure
$ make
```

## Usage

Apertium can be used to translate text between supported languages.
Assuming the relevant language data (here the Spanish-Catalan translator) has been installed, translation can be achieved with the following command:

```bash
$ apertium spa-cat input.txt output.txt
```

The `apertium` executable can also use piped streams:

```bash
$ echo "La casa es roja." | apertium spa-cat
```

Language data which has been compiled but not installed can be used with the `-d` flag:

```bash
$ echo "La casa es roja." | apertium -d ./apertium-spa-cat spa-cat
```

Formats other than plaintext can be specified with the `-f` flag:

```bash
$ apertium -f html spa-cat input.html output.html
```

Data packages may provide modes besides the main translation mode. Use the `-l` flag to list them.

```bash
$ apertium -l
$ apertium -l -d ./apertium-spa-cat
```

## Additional Tools

This repository also provides the following executables:

### Pipeline Modules

- `apertium-extract-caps`, `apertium-restore-caps`: Handle capitalization
- `apertium-pretransfer`: Split compound analyses into separate words for processing by `apertium-transfer`
- `apertium-posttransfer`: Clean up repeated spaces
- `apertium-tagger`: Perform statistical part-of-speech tagging
- `apertium-transfer`, `apertium-interchunk`, `apertium-postchunk`: Structural transfer modules ([documentation](https://wiki.apertium.org/wiki/A_long_introduction_to_transfer_rules))
- `apertium-wblank-attach`, `apertium-wblank-detach`, `apertium-wblank-mode`: Handle word-bound blanks

### Build Tools

These programs are used in the process of compiling linguistic data packages.

- `apertium-compile-caps`: Compile capitalization-handling rules for use by `apertium-restore-caps` ([documentation](https://wiki.apertium.org/wiki/Capitalization_restoration))
- `apertium-gen-modes`: Process the `modes.xml` file, which specifies what translation and analysis modes a data package provides
- `apertium-preprocess-transfer`: Process structural transfer rule files for use by `apertium-transfer`
- `apertium-validate-acx`, `apertium-validate-crx`, `apertium-validate-dictionary`, `apertium-validate-interchunk`, `apertium-validate-modes`, `apertium-validate-postchunk`, `apertium-validate-tagger`, `apertium-validate-transfer`: Validators for various XML rule formats

### Format Handlers

For each supported file format, there is a deformatter named `apertium-des[NAME]` (e.g. `apertium-deshtml`) which reads formatted text from standard input and writes [Apertium stream format](https://wiki.apertium.org/wiki/Apertium_stream_format) to standard output.
There is also a corresponding set of reformatters which do the reverse and are named `apertium-re[NAME]` (e.g. `apertium-rehtml`).
These programs rarely need to be invoked directly, since they are handled by the `apertium` executable.

Most of the format handlers are currently deprecated in favor of [Transfuse](https://github.com/TinoDidriksen/transfuse).

## License

This project is licensed under the GNU General Public License v2.0. See the [COPYING](COPYING) file for details.

For more information, visit [Apertium](https://apertium.org) or the [Apertium Wiki](https://wiki.apertium.org).