Code to evaluate the correlation of GEC metrics to human judgments.
https://github.com/nusnlp/gecmetrics
## A Reassessment of Reference-Based Grammatical Error Correction Metrics
If you use the data/code from this repository, please cite the following [paper](http://aclweb.org/anthology/C18-1231):
```
@InProceedings{chollampatt2018reassessment,
author = {Chollampatt, Shamil and Ng, Hwee Tou},
title = {A Reassessment of Reference-Based Grammatical Error Correction Metrics},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
month = {August},
year = {2018},
address = {Santa Fe, New Mexico, USA},
url = {http://aclweb.org/anthology/C18-1231}
}
```

The directory structure is as follows:
```
├── data
│   └── conll14st-test
│       ├── conll14st-test.m2
│       ├── conll14st-test.tok.src
│       └── refs
│           ├── conll14st-test.tok.trg0
│           └── conll14st-test.tok.trg1
├── README.md
├── run.sh
├── scores
│   ├── sentence_pairwiseranks_humans
│   │   ├── expanded.csv.gz
│   │   └── unexpanded.csv.gz
│   ├── sentence_scores_metrics
│   │   ├── gleu.txt.gz
│   │   ├── imeasure.txt.gz
│   │   └── m2score.txt.gz
│   ├── system_scores_humans
│   │   ├── expected_wins.txt.gz
│   │   └── trueskill.txt.gz
│   └── system_scores_metrics
│       ├── gleu.txt.gz
│       ├── imeasure.txt.gz
│       └── m2score.txt.gz
├── scripts
│   ├── sentence_correlation.py
│   └── system_correlation.py
└── tools
    └── significance-williams
```
* The `scores/system_scores_{humans,metrics}/` directories contain human and metric scores at the system level.
* The `scores/sentence_scores_metrics/` directory contains metric scores at the sentence level.
* The `scores/sentence_pairwiseranks_humans/` directory contains human pairwise rankings of system output sentences.
* Human judgments are obtained from: [https://github.com/grammatical/evaluation/](https://github.com/grammatical/evaluation/)
* Three automatic GEC metrics are used:
1. [GLEU](https://github.com/cnap/gec-ranking/commit/50b5032a4ef2444b9381fb47a55b3bac0654a6d7)
2. [I-measure](https://github.com/mfelice/imeasure/commit/fc79fdfd36d338299274b8a357c3cd09cc19d8a5)
3. [MaxMatch or M^2 score](https://github.com/nusnlp/m2scorer/tree/2122ffd0f7a17b6e969131e42fa3a4eae7cff389)
* Data used to run the metrics is from the CoNLL-2014 shared task (given in the `data/` directory)
* Scripts to find system-level and sentence-level correlations are adapted from WMT (given in `scripts/` directory)
* The Williams significance test was done using the code in the `tools/significance-williams/` directory (originally from https://github.com/ygraham/significance-williams)
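For intuition, the system-level evaluation boils down to correlating one metric score per system with one human score per system. The sketch below is not the repository's `scripts/system_correlation.py`; it is a minimal, self-contained illustration of that idea with invented scores, using a hand-rolled Pearson correlation:

```python
# Minimal sketch of a system-level correlation (illustrative only; the
# repository's scripts/system_correlation.py is the authoritative version).
from math import sqrt

def pearson(xs, ys):
    """Pearson's r between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical scores for five systems: one metric score and one human
# (e.g. TrueSkill-style) score per system. These numbers are made up.
metric = [0.41, 0.38, 0.35, 0.30, 0.25]
human = [1.2, 0.8, 0.5, -0.3, -0.9]
print(round(pearson(metric, human), 3))
```

The actual scripts additionally compare correlations of competing metrics with the Williams significance test, which accounts for the fact that both metrics are correlated with the same human scores.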
#### Running
To obtain the system-level scores (with significance tests) and the sentence-level scores, run:

`./run.sh`

The results are stored in the `results/` directory.
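At the sentence level, the human judgments are pairwise rankings, so a common way to score a metric (and one way to read the role of `scripts/sentence_correlation.py`) is a Kendall-style agreement: the fraction of human-ranked pairs the metric orders the same way, minus the fraction it orders the opposite way. The sketch below uses invented identifiers and scores and a simple tie-ignoring convention; the real input formats are the gzipped files under `scores/`:

```python
# Kendall-style agreement between human pairwise rankings and metric
# sentence scores (illustrative sketch; data and tie handling assumed).

def kendall_like(pairs, metric_score):
    """pairs: iterable of (better, worse) output ids from human judges.
    metric_score: dict mapping output id -> metric sentence score."""
    concordant = discordant = 0
    for better, worse in pairs:
        a, b = metric_score[better], metric_score[worse]
        if a > b:
            concordant += 1  # metric agrees with the human ranking
        elif a < b:
            discordant += 1  # metric reverses the human ranking
        # ties in metric scores are ignored under this convention
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical judgments over three outputs of the same source sentence.
human_pairs = [("s1", "s2"), ("s1", "s3"), ("s3", "s2")]
scores = {"s1": 0.9, "s2": 0.4, "s3": 0.6}
print(kendall_like(human_pairs, scores))
```

Here the metric orders all three judged pairs the same way as the humans, so the agreement is 1.0; a metric that reversed every pair would score -1.0.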