Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/machinalis/yalign
A sentence aligner for comparable corpora
- Host: GitHub
- URL: https://github.com/machinalis/yalign
- Owner: machinalis
- License: other
- Created: 2013-08-26T15:46:31.000Z (about 11 years ago)
- Default Branch: develop
- Last Pushed: 2016-05-19T15:55:06.000Z (over 8 years ago)
- Last Synced: 2024-07-11T23:46:21.364Z (4 months ago)
- Language: Python
- Size: 57.7 MB
- Stars: 127
- Watchers: 16
- Forks: 31
- Open Issues: 11
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-machine-translation - yalign - A sentence aligner for comparable corpora. (Aligners 🌌)
README
About
=====

Yalign is a tool for extracting parallel sentences from comparable corpora.

`Statistical Machine Translation `_ relies on `parallel corpora `_ (e.g. `europarl `_) to train translation models. However, such corpora are scarce and time-consuming to create. Yalign is designed to automate this process by finding sentences in `comparable corpora `_ that are close translation matches of each other. This opens up avenues for harvesting parallel corpora from sources such as translated documents and the web.
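
To make the idea concrete, here is a toy Python sketch of pulling sentence pairs out of two comparable texts. It is *not* Yalign's implementation: Yalign relies on trained language-pair models, whereas this sketch assumes a hypothetical tiny English-to-Spanish lexicon (``LEXICON``) as a crude similarity measure (``similarity``) and a monotonic dynamic-programming alignment (``align``) that skips sentences with no good match. All names and the 0.5 threshold are illustrative assumptions.

.. code-block:: python

    # Toy illustration only: NOT Yalign's actual algorithm or API.
    from typing import Dict, List, Tuple

    # Hypothetical tiny English->Spanish lexicon used only for this sketch.
    LEXICON: Dict[str, str] = {
        "the": "la", "cat": "gata", "sleeps": "duerme",
        "dog": "perro", "runs": "corre", "house": "casa",
    }

    def similarity(en: str, es: str) -> float:
        """Fraction of English words whose lexicon translation occurs in `es`."""
        en_words = en.lower().split()
        es_words = set(es.lower().split())
        if not en_words:
            return 0.0
        hits = sum(1 for w in en_words if LEXICON.get(w) in es_words)
        return hits / len(en_words)

    def align(src: List[str], tgt: List[str],
              threshold: float = 0.5) -> List[Tuple[int, int]]:
        """Monotonic alignment maximising total similarity.

        Pairs scoring below `threshold` are never matched; unmatched
        sentences on either side are skipped at no cost.
        """
        n, m = len(src), len(tgt)
        # best[i][j] = best total score for aligning src[i:] with tgt[j:]
        best = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                s = similarity(src[i], tgt[j])
                take = best[i + 1][j + 1] + s if s >= threshold else float("-inf")
                best[i][j] = max(take, best[i + 1][j], best[i][j + 1])
        # Walk the table forward to recover the chosen sentence pairs.
        pairs: List[Tuple[int, int]] = []
        i, j = 0, 0
        while i < n and j < m:
            s = similarity(src[i], tgt[j])
            if s >= threshold and best[i][j] == best[i + 1][j + 1] + s:
                pairs.append((i, j))
                i += 1
                j += 1
            elif best[i][j] == best[i + 1][j]:
                i += 1  # skip this source sentence
            else:
                j += 1  # skip this target sentence
        return pairs

With ``en = ["The cat sleeps", "Stock prices fell sharply", "The dog runs"]`` and ``es = ["La gata duerme", "El perro corre", "Hoy llueve mucho"]``, ``align(en, es)`` returns ``[(0, 0), (2, 1)]``: the unrelated sentence on each side is left unaligned.
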

Installation
============

Yalign requires `scikit-learn `_.

Once scikit-learn is installed, you can install Yalign from PyPI via pip:

::

    sudo pip install yalign
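
As a quick sanity check after installing, the following minimal Python sketch reports which prerequisites cannot be imported. It assumes the import names are ``sklearn`` (scikit-learn's standard import name) and ``yalign``; ``missing_modules`` is a hypothetical helper, not part of Yalign.

.. code-block:: python

    # Minimal sketch: report which required top-level modules are not importable.
    # Assumption: the import names are "sklearn" (scikit-learn) and "yalign".
    import importlib.util
    from typing import Iterable, List

    def missing_modules(names: Iterable[str]) -> List[str]:
        """Return the names that have no importable top-level module."""
        return [name for name in names if importlib.util.find_spec(name) is None]

    if __name__ == "__main__":
        missing = missing_modules(["sklearn", "yalign"])
        if missing:
            print("Missing:", ", ".join(missing))
        else:
            print("scikit-learn and yalign are importable.")
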

Usage
=====

First, we need to download and unpack the English-to-Spanish model:

::

    wget https://raw.githubusercontent.com/machinalis/yalign/develop/data/models/0.1/en-es.tar.gz
    tar -xvzf en-es.tar.gz

Now we can use the **yalign-align** script with the English-to-Spanish model to align two web pages:

::

    yalign-align en-es http://en.wikipedia.org/wiki/Antiparticle http://es.wikipedia.org/wiki/Antipart%C3%ADcula
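
Where ``wget`` or ``tar`` is unavailable, the download-and-unpack step can be sketched with the Python standard library instead. ``fetch_model`` is a hypothetical helper, not part of Yalign; it simply mirrors the two shell commands shown earlier and assumes network access to the model URL.

.. code-block:: python

    # Minimal sketch: fetch and unpack a model tarball with the standard library,
    # mirroring the wget + tar commands. fetch_model is a hypothetical helper.
    import tarfile
    import urllib.request
    from pathlib import Path

    MODEL_URL = ("https://raw.githubusercontent.com/machinalis/yalign/"
                 "develop/data/models/0.1/en-es.tar.gz")

    def fetch_model(url: str = MODEL_URL, dest: str = ".") -> Path:
        """Download the .tar.gz at `url` into `dest`, extract it there, return the archive path."""
        dest_dir = Path(dest)
        dest_dir.mkdir(parents=True, exist_ok=True)
        archive = dest_dir / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(archive))  # like `wget <url>`
        with tarfile.open(archive, "r:gz") as tar:     # like `tar -xvzf <archive>`
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject unsafe members (absolute paths, links outside dest).
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
        return archive

Calling ``fetch_model()`` with no arguments downloads the archive into the current directory and extracts it in place, as the shell commands do.
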

Yalign is not limited to any one language pair: by creating your own models, you can align any two languages. For more details on how to use Yalign and on its implementation, please `read the docs `_.

Yalign is a `Machinalis `_ project. You can view our other open source contributions `here `_.

**The Yalign Team:**

| Andrew Vine
| Gonzalo García Berrotarán
| Rafael Carrascosa
| Elías Andrawos
| Laura Alonso Alemany