https://github.com/yzhangcs/ctc-copy
[EMNLP'23] Code for "Non-autoregressive Text Editing with Copy-aware Latent Alignments".
- Host: GitHub
- URL: https://github.com/yzhangcs/ctc-copy
- Owner: yzhangcs
- Created: 2023-10-08T06:46:04.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-17T17:46:47.000Z (over 2 years ago)
- Last Synced: 2023-10-17T18:50:51.179Z (over 2 years ago)
- Topics: ctc, non-autoregressive, text-editing
- Language: Python
- Homepage: https://arxiv.org/abs/2310.07821
- Size: 50.8 KB
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Non-autoregressive Text Editing with Copy-aware Latent Alignments
¹Soochow University, Suzhou, China
²Tencent AI Lab
[Paper (PDF)](https://yzhang.site/assets/pubs/emnlp/2023/ctc.pdf)
[arXiv](https://arxiv.org/abs/2310.07821)
[Semantic Scholar](https://www.semanticscholar.org/paper/Non-autoregressive-Text-Editing-with-Copy-aware-Zhang-Zhang/116277fd27c97d50bba2d8023d3c590c1ea8187b)


## Citation
If you are interested in our work, please cite:
```bib
@inproceedings{zhang-etal-2023-ctc,
title = {Non-autoregressive Text Editing with Copy-aware Latent Alignments},
author = {Zhang, Yu and
Zhang, Yue and
Cui, Leyang and
Fu, Guohong},
booktitle = {Proceedings of EMNLP},
year = {2023},
address = {Singapore}
}
```
## Setup
The following packages should be installed:
* [`PyTorch`](https://github.com/pytorch/pytorch): >= 2.0
* [`Transformers`](https://github.com/huggingface/transformers)
* [`Errant`](https://github.com/chrisjbryant/errant)
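As a quick sanity check of the requirements above, the following minimal sketch reports which packages are installed and whether PyTorch meets the `>= 2.0` bound. It is not part of this repo: the distribution names `torch`, `transformers`, and `errant` are assumed to be the PyPI names of the listed projects, and `parse_version` and `check` are hypothetical helpers written for illustration.

```python
import re
from importlib import metadata

# Assumed PyPI distribution names; only PyTorch has a stated minimum version.
REQUIRED = {"torch": (2, 0), "transformers": None, "errant": None}


def parse_version(text):
    """Return the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check(requirements, get_version=metadata.version):
    """Map each package name to 'missing', 'too old (...)' or 'ok (...)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < minimum:
            report[name] = f"too old ({installed})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check(REQUIRED).items():
        print(f"{name}: {status}")
```

The `get_version` parameter exists only so the check can be exercised without the packages installed.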
Clone this repo recursively:
```sh
git clone https://github.com/yzhangcs/ctc-copy.git --recursive
```
To obtain the 3-stage train/dev/test data for training an English GEC model, follow the instructions in this [repo](https://github.com/HillZhang1999/SynGEC).
The multilingual datasets are available [here](https://github.com/google-research-datasets/clang8).
Before running, preprocess each sentence pair into the format `SRC:\t[src]\nTGT:\t[tgt]\n`, where `src` and `tgt` are the source and target sentences, respectively. Sentence pairs are separated by a blank line.
See [`data/clang8.toy`](data/clang8.toy) for examples.
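The format described above can be produced from two parallel lists of sentences with a few lines of Python. This is a minimal sketch rather than a script shipped with the repo: `format_pairs` and `write_pairs` are hypothetical helpers, and the output path is a placeholder. Whether the loader tolerates extra whitespace or a missing final newline is not stated here, so compare the result against `data/clang8.toy`.

```python
def format_pairs(sources, targets):
    """Render parallel sentences as SRC:/TGT: blocks separated by blank lines."""
    if len(sources) != len(targets):
        raise ValueError("sources and targets must have the same length")
    blocks = [
        f"SRC:\t{src.strip()}\nTGT:\t{tgt.strip()}\n"
        for src, tgt in zip(sources, targets)
    ]
    # Each block already ends with a newline, so joining with "\n"
    # leaves exactly one blank line between consecutive pairs.
    return "\n".join(blocks)


def write_pairs(sources, targets, path):
    """Write the formatted pairs to `path` (a placeholder file name) as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_pairs(sources, targets))


if __name__ == "__main__":
    src = ["She go to school .", "I has a pen ."]
    tgt = ["She goes to school .", "I have a pen ."]
    print(format_pairs(src, tgt), end="")
```

The example sentences are made up for illustration and do not come from any of the datasets mentioned above.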
## Run
Run the following command to train a 3-stage English model:
```sh
bash train.sh
```
To make predictions and run the evaluation:
```sh
bash pred.sh
```
## Contact
If you have any questions, please feel free to [email](mailto:yzhang.cs@outlook.com) me.