{"id":34176258,"url":"https://github.com/libretranslate/locomotive","last_synced_at":"2026-03-11T07:01:56.488Z","repository":{"id":196158352,"uuid":"694803946","full_name":"LibreTranslate/Locomotive","owner":"LibreTranslate","description":"Toolkit for training/converting LibreTranslate compatible language models 🚂","archived":false,"fork":false,"pushed_at":"2025-06-23T15:24:45.000Z","size":153,"stargazers_count":77,"open_issues_count":8,"forks_count":15,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-01-19T16:46:19.120Z","etag":null,"topics":["language-model","libretranslate","nlp","train"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LibreTranslate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-21T18:16:46.000Z","updated_at":"2026-01-16T21:57:40.000Z","dependencies_parsed_at":"2024-01-23T03:44:52.626Z","dependency_job_id":"66192ec5-089e-48ab-9b5b-be138ff4224f","html_url":"https://github.com/LibreTranslate/Locomotive","commit_stats":{"total_commits":124,"total_committers":3,"mean_commits":"41.333333333333336","dds":"0.048387096774193505","last_synced_commit":"a9c2aa193c6a5b3536fe17d723076e35e6d57b8d"},"previous_names":["libretranslate/locomotive"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/LibreTranslate/Locomotive","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LibreTranslate%2FLocomotive","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LibreTranslat
e%2FLocomotive/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LibreTranslate%2FLocomotive/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LibreTranslate%2FLocomotive/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LibreTranslate","download_url":"https://codeload.github.com/LibreTranslate/Locomotive/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LibreTranslate%2FLocomotive/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30373508,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T06:09:32.197Z","status":"ssl_error","status_checked_at":"2026-03-11T06:09:17.086Z","response_time":84,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","libretranslate","nlp","train"],"created_at":"2025-12-15T12:28:49.363Z","updated_at":"2026-03-11T07:01:56.480Z","avatar_url":"https://github.com/LibreTranslate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Locomotive\n\nEasy to use, cross-platform toolkit to train [argos-translate](https://github.com/argosopentech/argos-translate) models, which can be used by [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate) 🚂\n\nIt can also [convert pre-trained Opus-MT 
models](#convert-helsinki-nlp-opus-mt-models).\n\n## Requirements\n\n * Python \u003e= 3.8\n * NVIDIA CUDA graphics card (not required, but highly recommended)\n\n## Install\n\n```bash\ngit clone https://github.com/LibreTranslate/Locomotive --depth 1\ncd Locomotive\npip install -r requirements.txt\n```\n\n## Background\n\nLanguage models can be trained by providing lots of example translations from a source language to a target language. All you need to get started is a set of two files (`source` and `target`). The source file containing sentences written in the source language and a corresponding file with sentences written in the target language.\n\nFor example:\n\n`source.txt`:\n\n```\nHello\nI'm a train!\nGoodbye\n```\n\n`target.txt`:\n\n```\nHola\n¡Soy un tren!\nAdiós\n```\n\nYou'll need a few million sentences to train decent models, and at least ~100k sentences to get some results. [OPUS](https://opus.nlpl.eu/) has a good collection of datasets to get started. You can also use any of the data sources listed on the [argos-train index](https://github.com/argosopentech/argos-train/blob/master/data-index.json). Also check [NLLU](https://nllu.libretranslate.com).\n\n## Usage\n\nPlace `source.txt` and `target.txt` files in a folder (e.g. 
`mydataset-en_es`) of your choice:\n\n```bash\nmydataset-en_es/\n├── source.txt\n└── target.txt\n```\n\nCreate a `config.json` file specifying your sources:\n\n```json\n{\n    \"from\": {\n        \"name\": \"English\",\n        \"code\": \"en\"\n    },\n    \"to\": {\n        \"name\": \"Spanish\",\n        \"code\": \"es\"\n    },\n    \"version\": \"1.0\",\n    \"sources\": [\n        \"file://D:\\\\path\\\\to\\\\mydataset-en_es\",\n        \"opus://Ubuntu\",\n        \"http://data.argosopentech.com/data-ccaligned-en_es.argosdata\"\n    ]   \n}\n```\n\nNote that you can specify local folders (using the `file://` prefix), internet URLs to .zip archives (using the `http://` or `https://` prefix), or [OPUS](https://opus.nlpl.eu/) datasets (using the `opus://` prefix). For a complete list of OPUS datasets, see [OPUS.md](OPUS.md); note that dataset names are case-sensitive.\n\nThen run:\n\n```bash\npython train.py --config config.json\n```\n\nTraining can take a while and, depending on the size of the datasets, can require a graphics card with lots of memory.\n\nThe output will be saved in `run/[model]/translate-[from]_[to]-[version].argosmodel`.\n\n### Running out of memory\n\nIf you're running out of CUDA memory, decrease the `batch_size` parameter, which by default is set to `8192`:\n\n```json\n{\n    \"from\": {\n        \"name\": \"English\",\n        \"code\": \"en\"\n    },\n    \"to\": {\n        \"name\": \"Spanish\",\n        \"code\": \"es\"\n    },\n    \"version\": \"1.0\",\n    \"sources\": [\n        \"file://D:\\\\path\\\\to\\\\mydataset-en_es\",\n        \"http://data.argosopentech.com/data-ccaligned-en_es.argosdata\"\n    ],\n    \"batch_size\": 2048\n}\n```\n\n### Reverse Training\n\nOnce you have trained a model from `source =\u003e target`, you can easily train a reverse `target =\u003e source` model by passing `--reverse`:\n\n```bash\npython train.py --config config.json --reverse\n```\n\n### Tensorboard\n\nTensorBoard allows tracking and visualizing 
metrics such as loss and accuracy, visualizing the model graph and other features. You can enable tensorboard with the `--tensorboard` option:\n\n```bash\npython train.py --config config.json --tensorboard\n```\n\n### Tuning\n\nThe model is generated using sensible default values. You can override the [default configuration](https://github.com/LibreTranslate/Locomotive/blob/main/train.py#L276) by adding values directly to your `config.json`. For example, to use a smaller dictionary size, add a `vocab_size` key in `config.json`:\n\n```json\n{\n    \"from\": {\n        \"name\": \"English\",\n        \"code\": \"en\"\n    },\n    \"to\": {\n        \"name\": \"Spanish\",\n        \"code\": \"es\"\n    },\n    \"version\": \"1.0\",\n    \"sources\": [\n        \"file://D:\\\\path\\\\to\\\\mydataset-en_es\",\n        \"http://data.argosopentech.com/data-ccaligned-en_es.argosdata\"\n    ],\n    \"vocab_size\": 30000\n}\n```\n\n### Using Filters and Transforms\n\nLocomotive provides various [filters](https://github.com/LibreTranslate/Locomotive/blob/main/FILTERS.md), [transforms](https://github.com/LibreTranslate/Locomotive/blob/main/TRANSFORMS.md) and [augmenters](https://github.com/LibreTranslate/Locomotive/blob/main/AUGMENTERS.md)  which can be used to dynamically cleanup, modify and augment the input sources before training: \n\n```json\n{\n    \"filters\": [\n        \"duplicates\", \n        {\"source_target_ratio\": {\"min\": 0.6, \"max\": 1.5}}\n    ],\n    \"transforms\":[\n        \"remove_unpaired_quotes_and_brackets\"\n    ],\n    \"augmenters\":[\n        \"single_word_punctuation\"\n    ],\n    \"sources\": [\n        {\n            \"source\": \"file://D:\\\\path\\\\to\\\\mydataset-en_es\", \n            \"filters\": [\n                {\"char_length\": {\"min\": 20}}\n            ]\n        }\n    ]\n}\n```\n\nFilters, transforms and augmenters can be specified globally (applied to all sources) as well as per-source (applied only to the specified 
source).\n\n## Using Weights\n\nIt's possible to specify a weight for each source, for example to instruct the training to use fewer samples from certain datasets:\n\n```json\n{\n    \"sources\": [\n        {\"source\": \"file://D:\\\\path\\\\to\\\\mydataset-en_es\", \"weight\": 1},\n        {\"source\": \"http://data.argosopentech.com/data-ccaligned-en_es.argosdata\", \"weight\": 5}\n    ]\n}\n```\n\nIn the example above, 1 sample will be taken from mydataset and 5 will be taken from CCAligned.\n\nSpecifying weights disables filtering, transformations and augmentations. The datasets are used as-is. No merging or shuffling is performed either. A weight of 1 can be used to instruct Locomotive to not preprocess a source.\n\n## Evaluate\n\nYou can evaluate the model by running:\n\n```bash\npython eval.py --config config.json\nStarting interactive mode\n(en)\u003e Hello!\n(es)\u003e ¡Hola!\n(en)\u003e\n```\n\nYou can also compute [BLEU](https://en.wikipedia.org/wiki/BLEU) scores against the [flores200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md) dataset for the model by running:\n\n```bash\npython eval.py --config config.json --bleu\nBLEU score: 45.12354\n```\n\n## Convert Helsinki-NLP OPUS MT models\n\nLocomotive provides a convenient script to convert pre-trained models from [OPUS-MT](https://github.com/Helsinki-NLP/OPUS-MT-train) to make them compatible with LibreTranslate:\n\n```bash\npython opus_mt_convert.py -s en -t it\n```\n\nThis will attempt to automatically find and download the OPUS-MT model archive from https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/ or https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/. This doesn't always work, and will not always pick the best model. You can specify a model archive manually by using the `--model-url` parameter.\n\nSome models also need a beginning of sentence (BOS) token for the model to work. 
You can specify a BOS token by using the `--bos` parameter:\n\n```\npython opus_mt_convert.py -s en -t vi --model-url https://object.pouta.csc.fi/Tatoeba-MT-models/eng-vie/opus+bt-2021-04-10.zip --bos \"\u003e\u003evie\u003c\u003c\"\n```\n\nTo run evaluation:\n\n```bash\npython eval.py --config run/en_it-opus_1.0/config.json\n```\n\nThe script is experimental. If you find issues, feel free to open a pull request!\n\n### Known Limitations\n\nSome models fail to execute with int8 quantization. If you get a lot of repeated words, try setting `-q float32` to keep full precision.\n\n## Contribute\n\nWant to share your model with the world? Post it on [community.libretranslate.com](https://community.libretranslate.com) and we'll include it in future releases of LibreTranslate. Make sure to share both a forward and a reverse model (e.g. `en =\u003e es` and `es =\u003e en`), otherwise we won't be able to include it in the model repository.\n\nWe also welcome contributions to Locomotive! Just open a pull request.\n\n## Use with LibreTranslate\n\nTo install the resulting .argosmodel file, locate the `~/.local/share/argos-translate/packages` folder. On Windows this is the `%userprofile%\\.local\\share\\argos-translate\\packages` folder. Then create a `[from-code]_[to-code]` folder (e.g. `en_es`). If it already exists, delete or move it.\n\nExtract the contents of the .argosmodel file (which is just a .zip file; you might need to change the extension to .zip) into this folder. 
Then restart LibreTranslate.\n\nYou can also install .argosmodel packages from Python:\n```\nimport pathlib\nimport argostranslate.package\npackage_path = pathlib.Path(\"/root/translate-en_it-2_0.argosmodel\")\nargostranslate.package.install_from_path(package_path)\n```\n\n## Credits\n\nIn no particular order, we'd like to thank:\n\n * [OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py)\n * [SentencePiece](https://github.com/google/sentencepiece)\n * [Stanza](https://github.com/stanfordnlp/stanza)\n * [argos-train](https://github.com/argosopentech/argos-train)\n * [OPUS](https://opus.nlpl.eu)\n\nFor making Locomotive possible.\n\n## License\n\nAGPLv3\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flibretranslate%2Flocomotive","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flibretranslate%2Flocomotive","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flibretranslate%2Flocomotive/lists"}