https://github.com/tomtung/happy-tree-align
Burkett, D., & Klein, D. (2012). Transforming trees to improve syntactic convergence
- Host: GitHub
- URL: https://github.com/tomtung/happy-tree-align
- Owner: tomtung
- Created: 2013-01-15T00:48:37.000Z (almost 12 years ago)
- Default Branch: master
- Last Pushed: 2013-05-03T23:08:19.000Z (over 11 years ago)
- Last Synced: 2024-10-05T21:07:40.307Z (about 1 month ago)
- Language: Scala
- Homepage:
- Size: 266 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Happy Tree Align [![Build Status](https://api.travis-ci.org/tomtung/happy-tree-align.png)](https://travis-ci.org/tomtung/happy-tree-align)
This is an implementation of the method described in [Burkett, D., & Klein, D. (2012) Transforming trees to improve syntactic convergence](http://www.cs.berkeley.edu/~dburkett/papers/burkett12-tree_transform.pdf). Given a bilingual corpus, it learns syntax tree transformation rules that improve the trees' agreement with the corresponding word alignments. This improves the performance of syntactic machine translation systems.
The name comes from the author's keynote [Happy Trees are Better than Correct Trees](http://www.cs.berkeley.edu/~dburkett/slides/burkett12-tree_transform-slides.pdf).
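To make "agreement with word alignments" concrete, here is a minimal Python sketch. It is not taken from this project's Scala code, and it is not necessarily the exact score defined in the paper. It assumes the usual phrase-extraction notion of consistency: a constituent span agrees with the alignment if no target word in its projected range is linked to a source word outside the span. The names `is_consistent` and `agreement_count`, and the toy data, are hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # half-open source span [start, end)
Link = Tuple[int, int]  # (source word index, target word index)


def is_consistent(span: Span, links: Set[Link]) -> bool:
    """Return True if the span projects to a target range without outside links."""
    start, end = span
    targets = [t for s, t in links if start <= s < end]
    if not targets:
        # Assumption: a span with no aligned words does not count as agreeing.
        return False
    lo, hi = min(targets), max(targets)
    # Every link landing in the projected target range must start inside the span.
    return all(start <= s < end for s, t in links if lo <= t <= hi)


def agreement_count(spans: Iterable[Span], links: Set[Link]) -> int:
    """Count the constituent spans that are consistent with the alignment."""
    return sum(1 for span in spans if is_consistent(span, links))


# Toy example: four source words, with the two middle words swapped on the target side.
links = {(0, 0), (1, 2), (2, 1), (3, 3)}
tree_before = [(0, 4), (0, 2), (2, 4)]  # brackets words as (0 1) (2 3)
tree_after = [(0, 4), (0, 3), (1, 3)]   # brackets words as ((0 (1 2)) 3)

print(agreement_count(tree_before, links))
print(agreement_count(tree_after, links))
```

In this toy case the second bracketing has more constituents that line up with the alignment. A learned transformation is meant to move trees in that direction.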
## Usage
The program can be built with [sbt](http://www.scala-sbt.org/).
`sbt one-jar` will produce a stand-alone jar package callable from the command line.
```
Usage: happy-tree-align learn [options]

  -n | --n-trans
        max number of transformations to learn (default 200)
  -o | --out
        output path (print to stdout by default)
  path to training tree file
  path to training alignment file
  optional, path to dev tree file
  optional, path to dev alignment file

Usage: happy-tree-align apply [options]

  -o | --out
        output path. print to stdout by default
  -n | --n-trans
        use first n transformations (use all by default)
  transformation sequence file
  input tree file
  optional, path to alignment file. if provided, report agreement scores after each transformation

Usage: happy-tree-align score [options]

  -a | --all
        output scores for all trees
  -o | --out
        output path. print to stdout by default
  tree file
  alignment file
```

## Performance
In our experiment, with a 3000-sentence training set and a 1000-sentence development set on a dual-core 2.53 GHz machine, it took 16 minutes to learn the first 100 rules and 70 minutes to learn all 3979 rules.
The implementation is parallelized, so it should take less time with more cores available.