https://github.com/ukplab/coling2016-pcrf-seq2seq
An adaptation of MarMot morphological tagger for generic sequence-to-sequence tasks
https://github.com/ukplab/coling2016-pcrf-seq2seq
Last synced: 9 months ago
JSON representation
An adaptation of MarMot morphological tagger for generic sequence-to-sequence tasks
- Host: GitHub
- URL: https://github.com/ukplab/coling2016-pcrf-seq2seq
- Owner: UKPLab
- Created: 2016-10-10T12:52:29.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-12-29T14:17:25.000Z (over 6 years ago)
- Last Synced: 2025-06-18T03:10:03.977Z (12 months ago)
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 10
- Watchers: 29
- Forks: 3
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# PCRF-Seq2Seq
An adaptation of the MarMot higher-order CRF tagger for generic sequence-to-sequence tasks from [our paper](http://aclweb.org/anthology/C16-1160).
Please use the following citation:
```
@inproceedings{Schnober:2016:Coling,
author = {Carsten Schnober and Steffen Eger and Erik-Lân Do Dinh and Iryna Gurevych},
title = {Still not there? Comparing Traditional Sequence-to-Sequence Models to
Encoder-Decoder Neural Networks on Monotone String Translation Tasks},
month = dec,
year = {2016},
booktitle = {Proceedings of the 26th International Conference on Computational
Linguistics (COLING)},
pages = {(1703--1714)},
location = {Osaka, Japan},
language = {English},
}
```
> **Abstract:** We analyze the performance of encoder-decoder neural models and compare them with well-known established methods. The latter represent different classes of traditional approaches that are applied to the monotone sequence-to-sequence tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion, and lemmatization.
Such tasks are of practical relevance for various higher-level research fields including \textit{digital humanities}, automatic text correction, and speech recognition.
We investigate how well generic deep-learning approaches adapt to these tasks, and how they perform in comparison with established and more specialized methods, including our own adaptation of pruned CRFs.
Contact persons:
* Carsten Schnober, schnober@ukp.informatik.tu-darmstadt.de
* Steffen Eger, eger@aiphes.tu-darmstadt.de
* Erik-Lân Do Dinh, dodinh@ukp.informatik.tu-darmstadt.de
http://www.ukp.tu-darmstadt.de/
http://www.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.
> This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
## Project structure
* `src` -- this folder contains the code and detailed instructions
* `src/data/` -- sample data from the Twitter typo corpus
## Requirements
See [src/README.md](src/README.md) for details!
* [Marmot](https://github.com/muelletm/cistern/) morphological tagger
* [m2m-aligner](https://github.com/letter-to-phoneme/m2m-aligner)
## Installation and Running
See [src/README.md](src/README.md) for details!