https://github.com/wellecks/word_aligner
Word Aligner for Machine Translation
- Host: GitHub
- URL: https://github.com/wellecks/word_aligner
- Owner: wellecks
- Created: 2014-01-26T21:11:45.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-02-07T04:11:50.000Z (almost 11 years ago)
- Last Synced: 2024-10-29T20:12:56.642Z (2 months ago)
- Language: FORTRAN
- Homepage:
- Size: 16 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Word Aligner
#### CIS526, Machine Translation, HW1
**Sean Welleck**
This project addresses word alignment: deciding which words in a source-language sentence correspond to which words in its target-language translation.
The project contains three models:
- IBM Model 1
- IBM Model 2
- Bayesian Aligner

It also includes symmetrization, which combines the results of two models.
Run ```python run_alignment.py > output.txt``` to train the models and output alignments to output.txt.
-----
##### model.py
Contains the model implementations, ```IBMM1()```, ```IBMM2()```, ```BayesM()```. Each model extends the ```Model()``` class and must implement the ```train()``` and ```align()``` methods.
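To make the `train()` / `align()` contract concrete, here is a minimal sketch of a base class and a textbook IBM Model 1 trained with EM. It is not the code in `model.py`: the names `Model` (as written here) and `SketchIBM1`, the method signatures, and the data layout (a list of `(f_tokens, e_tokens)` pairs) are all assumptions made for illustration, and the NULL word is omitted for brevity.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Model(ABC):
    """Hypothetical base class: subclasses supply train() and align()."""

    @abstractmethod
    def train(self, data, num_iters):
        """Fit parameters on a list of (f_tokens, e_tokens) pairs."""

    @abstractmethod
    def align(self, data):
        """Return one set of (f_index, e_index) links per sentence pair."""


class SketchIBM1(Model):
    """Textbook IBM Model 1 trained with EM (no NULL word)."""

    def __init__(self):
        self.t = None  # t[(f, e)] approximates P(f | e)

    def train(self, data, num_iters=5):
        f_vocab = {f for f_sent, _ in data for f in f_sent}
        uniform = 1.0 / len(f_vocab)
        self.t = defaultdict(lambda: uniform)  # uniform initialization
        for _ in range(num_iters):
            count = defaultdict(float)  # expected count of (f, e) links
            total = defaultdict(float)  # expected count of links to e
            for f_sent, e_sent in data:
                for f in f_sent:
                    # E-step: split one unit of count for f across e_sent
                    z = sum(self.t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        p = self.t[(f, e)] / z
                        count[(f, e)] += p
                        total[e] += p
            # M-step: renormalize the expected counts per target word e
            self.t = defaultdict(
                float, {(f, e): c / total[e] for (f, e), c in count.items()}
            )
        return self

    def align(self, data):
        result = []
        for f_sent, e_sent in data:
            links = set()
            for i, f in enumerate(f_sent):
                # link each f word to its most probable e word
                j = max(range(len(e_sent)),
                        key=lambda k: self.t[(f, e_sent[k])])
                links.add((i, j))
            result.append(links)
        return result
```

IBM Model 2 would add a position-dependent alignment probability on top of `t`; that part is left out of this sketch.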
##### aligner.py
Contains top level functions for using the models:
```python
# loading data
import aligner

data = aligner.load_input(e_file, f_file, num_sents)
```
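For readers who want to see what loading a parallel corpus typically involves, the following is a rough sketch under stated assumptions. `load_parallel` is a hypothetical helper, not `aligner.load_input`: it assumes the two files are line-aligned plain text, tokenizes on whitespace, and returns `(f_tokens, e_tokens)` pairs, none of which the README confirms.

```python
from itertools import islice


def load_parallel(e_path, f_path, num_sents=None):
    """Read two line-aligned files into (f_tokens, e_tokens) pairs.

    Hypothetical helper: whitespace tokenization and the pair order are
    assumptions, not necessarily what aligner.load_input does.
    """
    with open(e_path, encoding="utf-8") as e_file, \
         open(f_path, encoding="utf-8") as f_file:
        pairs = zip(f_file, e_file)
        # islice with None as the stop value reads every sentence pair
        return [(f.split(), e.split()) for f, e in islice(pairs, num_sents)]
```

Passing `num_sents` limits training to the first few sentence pairs, which is useful for quick experiments before a full run.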
```python
# training models (the model classes are defined in model.py)
from model import IBMM1, IBMM2

ibm_model1 = aligner.train_model(IBMM1(), data, num_iters)
ibm_model2 = aligner.train_model(IBMM2(), data, num_iters)
```
```python
# getting alignments using a trained model
m1_alignments = aligner.align(ibm_model1, data)
m2_alignments = aligner.align(ibm_model2, data)
```
```python
# symmetrizing output from two models
sym_alignments = aligner.symmetrize_all(m1_alignments, m2_alignments)
```
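The README does not say which heuristic `symmetrize_all` applies. As an illustration only, the sketch below shows two common choices, intersection (higher precision) and union (higher recall). The function names `symmetrize` and `symmetrize_corpus` are hypothetical, and the code assumes each alignment is a set of `(i, j)` index pairs with the same orientation in both inputs.

```python
def symmetrize(links_a, links_b, method="intersection"):
    """Combine two alignments of one sentence pair (sets of (i, j) links)."""
    a, b = set(links_a), set(links_b)
    if method == "intersection":
        return a & b  # keep only links both models agree on
    if method == "union":
        return a | b  # keep every link proposed by either model
    raise ValueError(f"unknown method: {method}")


def symmetrize_corpus(all_a, all_b, method="intersection"):
    """Apply symmetrize() sentence by sentence over two parallel lists."""
    if len(all_a) != len(all_b):
        raise ValueError("alignment lists must have the same length")
    return [symmetrize(a, b, method) for a, b in zip(all_a, all_b)]
```

Intersection tends to drop uncertain links, while union keeps them; heuristics such as grow-diag sit between the two.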
```python
# printing alignments
print_output(sym_alignments)
```