https://github.com/ubisoft/ubisoft-laforge-binaryalignwordalignementasbinarysequencelabeling
Repository for BinaryAlign: Word Alignment as Binary Sequence Labeling
- Host: GitHub
- URL: https://github.com/ubisoft/ubisoft-laforge-binaryalignwordalignementasbinarysequencelabeling
- Owner: ubisoft
- License: other
- Created: 2024-05-28T18:53:37.000Z
- Default Branch: main
- Last Pushed: 2024-08-08T19:07:11.000Z
- Last Synced: 2024-08-13T17:06:32.175Z
- Language: Python
- Size: 22.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: license.txt
README
© [2024] Ubisoft Entertainment. All Rights Reserved
# BinaryAlign: Word Alignment as Binary Sequence Labeling
## Introduction
**BinaryAlign** reformulates word alignment as a set of binary sequence labeling tasks. It outperforms existing approaches in both high- and low-resource language settings, providing a unified approach to word alignment. This repository contains the code and models for BinaryAlign, as described in [our paper](https://arxiv.org/pdf/2407.12881), accepted to the main conference of ACL 2024.
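To make the reformulation concrete, here is a minimal sketch of how one gold alignment can be viewed as one binary label sequence per source word: position `j` in the sequence for source word `i` is 1 when target word `j` is aligned to it. This is an illustration only, not the repository's code. The function name `binary_labels` is hypothetical, and the 1-based index pairs are an assumption taken from the example in the Datasets section.

```python
def binary_labels(num_src, num_tgt, pairs):
    """Return one 0/1 label list over target words for each source word.

    `pairs` holds 1-based (source, target) index pairs (assumed format).
    """
    labels = [[0] * num_tgt for _ in range(num_src)]
    for i, j in pairs:
        labels[i - 1][j - 1] = 1  # convert 1-based indices to 0-based
    return labels


src = "He has a sofa .".split()
tgt = "Il a un canapé .".split()
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

labels = binary_labels(len(src), len(tgt), pairs)
for word, row in zip(src, labels):
    # Each row is the binary labeling problem for one source word.
    print(word, row)
```

A source word aligned to several target words simply gets several 1s in its row, and an unaligned word gets an all-zero row.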
## Datasets
### Format
* source (.src)
```
He has a sofa .
```
* target (.tgt)
```
Il a un canapé .
```
* gold alignment (.talp)
```
1-1 2-2 3-3 4-4 5-5
```
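The three files are read line by line in parallel. A minimal sketch of a parser for one line of each is given below, assuming whitespace-tokenized sentences and 1-based `source-target` index pairs, as the example above suggests. The helper names `parse_alignment` and `aligned_words` are hypothetical and not part of this repository; only the `-` separator shown above is handled.

```python
def parse_alignment(line):
    """Parse a line such as '1-1 2-2' into (source, target) integer pairs."""
    pairs = []
    for token in line.split():
        s, t = token.split("-")
        pairs.append((int(s), int(t)))
    return pairs


def aligned_words(src_line, tgt_line, talp_line):
    """Map each alignment pair to the words it links (indices assumed 1-based)."""
    src = src_line.split()
    tgt = tgt_line.split()
    return [(src[s - 1], tgt[t - 1]) for s, t in parse_alignment(talp_line)]


links = aligned_words("He has a sofa .", "Il a un canapé .", "1-1 2-2 3-3 4-4 5-5")
for src_word, tgt_word in links:
    print(src_word, "->", tgt_word)
```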
### Data
We used the same datasets as https://github.com/sufenlp/AccAlign.
## Training
```shell
bash train.sh
```
## Evaluation
```shell
bash eval.sh
```
## Checkpoints
| Training Languages | Link |
| ------------- | ------------- |
| Align6 | models/align6 |
| deen | models/deen |
| roen | models/roen |
| fren | models/fren |
| zhen | models/zhen |
| jaen | models/jaen |
## Citation
```
@article{latouche2024binaryalign,
title={BinaryAlign: Word Alignment as Binary Sequence Labeling},
author={Latouche, Gaetan Lopez and Carbonneau, Marc-Andr{\'e} and Swanson, Ben},
journal={arXiv preprint arXiv:2407.12881},
year={2024}
}
```
## License
See the license file (`license.txt`): Creative Commons 4.0, non-commercial.