https://github.com/amake/moses-smt
Dock You a Moses: Moses Statistical MT in a container
- Host: GitHub
- URL: https://github.com/amake/moses-smt
- Owner: amake
- License: mit
- Created: 2016-04-26T08:32:13.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2020-02-18T23:49:08.000Z (over 6 years ago)
- Last Synced: 2025-03-24T12:39:11.873Z (about 1 year ago)
- Topics: docker-image, moses, repl, tmx
- Language: Makefile
- Size: 49.8 KB
- Stars: 13
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Dock You a Moses
Want to play with the [Moses](http://www.statmt.org/moses/) Statistical Machine
Translation system, but...
- You don't have time to get a PhD in Setting Up Moses?
- You have TMX files (or structured bilingual text files easily convertible to
TMX) and want to use them with Moses without doing all the munging yourself?
Well now you don't have to, because I stuffed Moses in a Docker container for
you.
# What is this?
- A full Moses + MGIZA installation in a Docker image: `amake/moses-smt:base` on
  [Docker Hub](https://hub.docker.com/r/amake/moses-smt/)
- A [`make`](https://www.gnu.org/software/make/)-based set of commands for
  easily
  - Converting TMX files into Moses-ready corpus files: `make corpus`
  - Training and tuning Moses: `make train`
  - Building Docker images of trained Moses instances: `make build`
  - Deploying trained Moses instances to Docker Hub/Amazon Elastic Beanstalk:
    `make deploy-hub`
- Some peripheral tools:
  - A simple REPL for querying Moses over XML-RPC: `mosesxmlrpcrepl.py` or
    `make repl`
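
As a rough illustration of what the TMX-to-corpus step involves, the sketch below pulls aligned segment pairs out of a TMX document using only the Python standard library. It is not the code behind `make corpus`: the function name `tmx_to_pairs` and the sample data are made up for this example, and a real pipeline would also tokenize and clean the text before training.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data, not taken from the repository.
SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Hello,  world!</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Bonjour le monde !</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_pairs(tmx_text, src, trg):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is None:
                continue
            # Keep only the language part, so "en-US" matches "en".
            key = lang.lower().replace("_", "-").split("-")[0]
            # Inline tag contents are kept as plain text in this sketch;
            # runs of whitespace are collapsed to single spaces.
            segs[key] = " ".join("".join(seg.itertext()).split())
        if segs.get(src) and segs.get(trg):
            pairs.append((segs[src], segs[trg]))
    return pairs


pairs = tmx_to_pairs(SAMPLE, "en", "fr")
```

Writing the first element of each pair to one file and the second to another, one segment per line, gives the line-aligned parallel files that Moses training expects.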
# Requirements
- make
- Docker
- Python 3 with pip and virtualenv
- OS X? (not tested elsewhere)
- Some TMX files ([Okapi](http://okapi.opentag.com/) Rainbow is a good tool for
converting structured bilingual files to TMX)
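
If your bilingual data is already a plain list of sentence pairs, a TMX file can also be produced with a few lines of Python. The following is a minimal sketch under that assumption, not a replacement for Rainbow: `pairs_to_tmx` is a hypothetical helper, and the header values (`creationtool`, `o-tmf`, and so on) are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src, trg):
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Attributes the TMX header requires; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per pair, with one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, source), (trg, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


tmx_text = pairs_to_tmx([("Hello", "Bonjour")], "en", "fr")
```

The returned string can be written to a `.tmx` file, prefixed with an XML declaration if the consuming tool requires one.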
# Usage
First, if you are building the base image yourself, you may need to rebalance
the number of cores and the amount of memory available to Docker. For example,
8 cores with only 2 GB of memory results in compilation failures; 4 cores with
4 GB seems to work better.
1. Put most of your TMXs in `tmx-train`, and the rest in `tmx-tune`.
2. Run `make SOURCE_LANG=<src> TARGET_LANG=<trg> [LABEL=<lbl>]`.
   - `src` and `trg` (required) are the language codes (*not* language + country)
     for your source and target languages, e.g. `en` and `fr`.
   - `lbl` is an optional label for the resulting image; `myinstance` by default.
3. Wait forever.
4. When done, you will have a Docker image tagged `moses-smt:--`.
   - Run `make server SOURCE_LANG=<src> TARGET_LANG=<trg> [PORT=<port>]` to start
     [`mosesserver`](http://www.statmt.org/moses/?n=Advanced.Moses#ntoc1) which
     you can query over XML-RPC.
   - Optionally specify a port; the default is `8080`.
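
Once the server is running, any XML-RPC client can talk to it. The sketch below uses Python's standard `xmlrpc.client` and assumes the default port `8080` and the `/RPC2` endpoint path. `mosesserver` exposes a `translate` method that takes a struct with a `text` field and returns a struct whose `text` field holds the translation. The helper names `build_params` and `translate` are illustrative and are not taken from `mosesxmlrpcrepl.py`.

```python
import xmlrpc.client

# Assumed default: the server started locally on port 8080.
DEFAULT_URL = "http://localhost:8080/RPC2"


def build_params(text):
    """Build the struct that the server's `translate` method expects."""
    return {"text": text}


def translate(text, url=DEFAULT_URL):
    """Send one sentence to a running mosesserver and return its translation."""
    proxy = xmlrpc.client.ServerProxy(url)
    # The remote method name is `translate`; the result is a struct (dict).
    result = proxy.translate(build_params(text))
    return result["text"]
```

With a server running, `translate("this is a test")` returns the translated string. Input generally needs the same preprocessing (for example tokenization) that was applied to the training data.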
## What then?
- Train a new image with swapped languages or with a new set of TMXs.
- Use a trained instance for translation in OmegaT with the [omegat-moses-mt
  plugin](https://github.com/amake/omegat-moses-mt):
  - Run `make server` to run the server locally; the `moses.server.url` value is
    then `http://localhost:8080/RPC2`
  - Run `make deploy-hub` and then upload the .zip that's produced as a new EB
    environment