
# slimt

**slimt** (_slɪm tiː_) is an inference frontend for
[tiny](https://github.com/browsermt/students/tree/master/deen/ende.student.tiny11)
[models](https://github.com/browsermt/students) trained as part of the
[Bergamot project](https://browser.mt/).

[bergamot-translator](https://github.com/browsermt/bergamot-translator/) builds
on top of [marian-dev](https://github.com/marian-nmt/marian-dev) and uses the
inference code-path from marian-dev. While marian is a capable neural-network
library with a focus on machine translation, the bells and whistles that come
with it (e.g. autograd, support for multiple sequence-to-sequence
architectures, beam search) are not necessary to run inference on client
machines. For some use cases, such as an input-method engine doing translation
(see [lemonade](https://github.com/jerinphilip/lemonade)), single-threaded
operation coexisting with other processes on the system suffices. This is the
motivation for this transplant repository: there is not much novel here except
ease of use. This repository is simply the _tiny_ part of marian, reusing code
where possible.

This effort is inspired by contemporary efforts like
[ggerganov/ggml](https://github.com/ggerganov/ggml) and
[karpathy/llama2.c](https://github.com/karpathy/llama2.c). _tiny_ models roughly
follow the [transformer architecture](https://arxiv.org/abs/1706.03762), with
[Simpler Simple Recurrent Units](https://aclanthology.org/D19-5632/) (SSRU) in
the decoder. The same models are used in Mozilla Firefox's [offline translation
addon](https://addons.mozilla.org/en-US/firefox/addon/firefox-translations/).
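As a rough illustration of the SSRU recurrence described in the paper linked above: SSRU drops the SRU's reset gate and replaces its tanh with the identity, leaving only a forget gate followed by a ReLU. The sketch below is a minimal, made-up scalar (per-dimension) version in plain Python; real implementations operate on vectors with learned weight matrices, and the weights here are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ssru_step(x, c_prev, wf=1.0, bf=0.0, w=0.5):
    """One per-dimension SSRU step: forget gate, no reset gate, no tanh."""
    f = sigmoid(wf * x + bf)              # forget gate
    c = f * c_prev + (1.0 - f) * (w * x)  # cell update (identity, not tanh)
    o = max(c, 0.0)                       # ReLU output
    return o, c

# Toy run over a short input sequence.
c, outputs = 0.0, []
for x in [0.5, -1.0, 2.0]:
    o, c = ssru_step(x, c)
    outputs.append(o)
print(outputs)
```

Because there is no matrix-matrix dependence inside the recurrence, each step is cheap, which is what makes SSRU decoders attractive for CPU inference.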

Both `tiny` and `base` models have 6 encoder layers and 2 decoder layers and,
for most language pairs, a vocabulary of 32000 (with tied embeddings). The
following table briefly summarizes some architectural differences between
`tiny` and `base` models:

| Variant | emb | ffn | params | f32 | i8 |
| ------- | --- | --- | ------ | ----- | ---- |
| `base` | 512 | 2048 | 39.0M | 149MB | 38MB |
| `tiny` | 256 | 1536 | 15.7M | 61MB | 17MB |

The `i8` models, quantized to 8-bit and as small as 17MB, are used to provide
translation for Mozilla Firefox's offline translation addon, among other
things.
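The sizes in the table are roughly parameters × bytes per weight; a quick back-of-the-envelope check, using the parameter counts from the table above (the on-disk files come out slightly different, presumably because they also bundle headers and quantization metadata):

```python
# Approximate on-disk size: parameters × bytes per weight, in MiB.
params = {"base": 39.0e6, "tiny": 15.7e6}

def size_mib(n_params, bytes_per_weight):
    return n_params * bytes_per_weight / 2**20

for name, n in params.items():
    print(f"{name}: f32 ~ {size_mib(n, 4):.0f} MiB, i8 ~ {size_mib(n, 1):.0f} MiB")
```

This is also why 8-bit quantization gives close to a 4x size reduction over `f32`.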

More information on the models is available in the following papers:

* [From Research to Production and Back: Ludicrously Fast Neural Machine Translation](https://aclanthology.org/D19-5632)
* [Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task](https://aclanthology.org/2020.ngt-1.26/)

The large list of dependencies from bergamot-translator has currently been
reduced to:

* For `int8_t` matrix multiplication - [intgemm](https://github.com/kpu/intgemm)
(`x86_64`), [ruy](https://github.com/google/ruy) (`aarch64`), or
[xsimd](https://github.com/xtensor-stack/xsimd) via
[gemmology](https://github.com/mozilla/gemmology).
* For vocabulary - [sentencepiece](https://github.com/browsermt/sentencepiece).
* For sentence-splitting using regular expressions -
[PCRE2](https://github.com/PCRE2Project/pcre2).
* For `sgemm` - whatever BLAS provider is found via CMake (OpenBLAS, Intel
oneAPI MKL, cblas). Feel free to provide
[hints](https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors).
* [CLI11](https://github.com/CLIUtils/CLI11/) (only a dependency for the command line).

Source code is made public where basic functionality (text translation) works
for English-German tiny models. Parity in features and speed with marian and
bergamot-translator (where relevant) is a work in progress. Eventual support
for `base` models is planned. Contributions are welcome and appreciated.

## Getting started

Clone with submodules.

```bash
git clone --recursive https://github.com/jerinphilip/slimt.git
```

Configure and build. `slimt` is still experimenting with CMake and
dependencies. The following, which is being prepared for Linux distribution
packaging, should work at the moment:

```bash
# Configure to use xsimd via gemmology
ARGS=(
# Use gemmology
-DWITH_GEMMOLOGY=ON

# On x86_64 machines, the following enable faster matrix-multiplication
# backends using SIMD. All of these can co-exist; the best one for the
# CPU detected at runtime is dispatched to.
-DUSE_AVX512=ON -DUSE_AVX2=ON -DUSE_SSSE3=ON -DUSE_SSE2=ON

# For aarch64 or armv7+neon, comment out the x86_64 flags above and
# uncomment the line below.
# -DUSE_NEON=ON

# Use sentencepiece installed from the system.
-DUSE_BUILTIN_SENTENCEPIECE=OFF

# Exports slimtConfig.cmake (CMake) and slimt.pc (pkg-config).
-DSLIMT_PACKAGE=ON

# Customize installation prefix if need be.
-DCMAKE_INSTALL_PREFIX=/usr/local
)

cmake -B build -S $PWD -DCMAKE_BUILD_TYPE=Release "${ARGS[@]}"
cmake --build build --target all

# Requires sudo since /usr/local is usually writable only by root.
sudo cmake --build build --target install
```

The above expects the packages `sentencepiece`, `xsimd`, and a BLAS provider
to come from the system's package manager. Examples for some distributions:

```bash
# Debian-based systems
sudo apt-get install -y libxsimd-dev libsentencepiece-dev libopenblas-dev

# ArchLinux
pacman -S openblas xsimd
yay -S sentencepiece-git
```

A successful build generates two executables, `slimt-cli` and `slimt-test`,
for command-line usage and testing respectively. Angle-bracketed values below
are placeholders for paths to your model files.

```bash
build/bin/slimt-cli \
  --root <model-directory> \
  --model <model.bin> \
  --vocabulary <vocab.spm> \
  --shortlist <shortlist.bin>

build/slimt-test
```
This is still very much a work in progress, towards making
[lemonade](https://github.com/jerinphilip/lemonade) available in distributions.
Help is much appreciated; please get in touch if you can contribute.

### Python

Python bindings to the C++ code are available. The bindings add a layer to
download models and use them via the command-line entry point `slimt` (the
core slimt library contains only the inference code).

```bash
python3 -m venv env
source env/bin/activate
python3 -m pip install wheel
python3 setup.py bdist_wheel
python3 -m pip install dist/*.whl

# Download en-de-tiny and de-en-tiny models.
slimt download -m en-de-tiny
slimt download -m de-en-tiny
```
An example of the built wheel running on Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12wFMVwOTzOyRjoeWtett2DTDhwNAbvBZ?usp=sharing)

You may pass custom CMake variables via the `CMAKE_ARGS` environment variable.

```bash
CMAKE_ARGS='-D...' python3 setup.py bdist_wheel
```