{"id":24064385,"url":"https://github.com/nusnlp/greco","last_synced_at":"2025-04-23T18:43:58.550Z","repository":{"id":225802292,"uuid":"709167932","full_name":"nusnlp/greco","owner":"nusnlp","description":"The official code for the \"System Combination via Quality Estimation for Grammatical Error Correction\" paper, published in EMNLP 2023.","archived":false,"fork":false,"pushed_at":"2024-03-04T10:49:29.000Z","size":8245,"stargazers_count":14,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T02:23:21.348Z","etag":null,"topics":["deep-learning","ensemble-model","gec","grammatical-error-correction","pytorch","quality-estimation","re-ranking"],"latest_commit_sha":null,"homepage":"https://aclanthology.org/2023.emnlp-main.785","language":"Macaulay2","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nusnlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-24T06:54:06.000Z","updated_at":"2025-02-22T12:03:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"a6a5ebdb-277d-49a2-aa56-1dbf061996fd","html_url":"https://github.com/nusnlp/greco","commit_stats":null,"previous_names":["nusnlp/greco"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nusnlp%2Fgreco","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nusnlp%2Fgreco/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nusnlp%2Fgreco/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/nusnlp%2Fgreco/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nusnlp","download_url":"https://codeload.github.com/nusnlp/greco/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250494162,"owners_count":21439939,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","ensemble-model","gec","grammatical-error-correction","pytorch","quality-estimation","re-ranking"],"created_at":"2025-01-09T10:25:03.799Z","updated_at":"2025-04-23T18:43:58.544Z","avatar_url":"https://github.com/nusnlp.png","language":"Macaulay2","funding_links":[],"categories":[],"sub_categories":[],"readme":"# System Combination via Quality Estimation for Grammatical Error Correction\r\nThis repository provides the code to easily score, re-rank, and combine corrections from Grammatical Error Correction (GEC) models, as reported in this paper:\r\n\u003e System Combination via Quality Estimation for Grammatical Error Correction \u003cbr\u003e\r\n\u003e [Muhammad Reza Qorib](https://mrqorib.github.io/) and [Hwee Tou Ng](https://www.comp.nus.edu.sg/~nght/) \u003cbr\u003e\r\n\u003e The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) ([PDF](https://aclanthology.org/2023.emnlp-main.785.pdf))\r\n\r\n## Installation\r\nPlease install the necessary libraries by running the following commands:\r\n```\r\npip install -r requirements.txt\r\nwget -P models https://sterling8.d2.comp.nus.edu.sg/~reza/GRECO/checkpoint.bin\r\nwget 
https://www.comp.nus.edu.sg/~nlp/sw/m2scorer.tar.gz\r\ntar -xf m2scorer.tar.gz\r\n```\r\nPlease check that the installed PyTorch build matches the CUDA version of your hardware.\r\n\r\nTo also run other quality estimation models, please run the following commands:\r\n```\r\ngit clone https://github.com/nusnlp/neuqe\r\ngit clone https://github.com/thunlp/VERNet\r\ngit clone https://github.com/kokeman/SOME\r\n```\r\nThen download the model checkpoints from\r\n- https://github.com/nusnlp/neuqe to the `checkpoints/neuqe` folder.\r\n- https://github.com/thunlp/VERNet/ to the `checkpoints/vernet` folder.\r\n- https://github.com/kokeman/SOME to the `checkpoints/some` folder.\r\n\r\n## Quality Estimation\r\n### Scoring hypotheses in your code\r\nYou can import the GRECO class from `models.py`, instantiate it, and pass the source(s) and hypotheses (as Python lists of strings) to the `.score()` method.\r\n```\r\nimport torch\r\nfrom models import GRECO\r\n\r\n# Use a GPU when one is available, otherwise fall back to the CPU\r\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\r\nmodel = GRECO('microsoft/deberta-v3-large').to(device)\r\nmodel.load_state_dict(torch.load('models/checkpoint.bin', map_location=device))\r\n# source and hypotheses are Python lists of strings\r\nmodel.score(source, hypotheses)\r\n```\r\n\r\n### Correlation coefficient\r\nGet the scores on all text by running this command. 
In this example, we will also score the text with SOME.\r\n```\r\npython score_all.py --auto --data_dir data/conll-official/texts --output_path outputs/greco_scores.json --model greco --lm_model microsoft/deberta-v3-large --checkpoint models/checkpoint.bin --source_file data/conll-source.txt --batch_size 16\r\npython score_all.py --auto --data_dir data/conll-official/texts --output_path outputs/some_scores.json --model some --source_file data/conll-source.txt --batch_size 16\r\n```\r\nGet the gold F0.5 score for each sentence by running this command.\r\n```\r\npython m2_for_corr.py --data_dir data/conll-official/reports --scorer m2scorer --output_path outputs/target.json\r\n```\r\n\r\nCalculate the correlation by running this command.\r\n```\r\npython correlation.py --system_A outputs/greco_scores.json --system_B outputs/some_scores.json --target outputs/target.json --metric spearman\r\n```\r\n\r\n## Re-ranking\r\n### Reproducing re-ranking F0.5 score\r\nRun the following to re-rank the corrections.\r\n```\r\npython rerank.py --data_dir data/conll-official/texts --source_file data/conll-source.txt --auto --output_path outputs/greco_rerank.out --model greco --lm_model microsoft/deberta-v3-large --checkpoint models/checkpoint.bin --batch_size 16\r\n```\r\nRun the following to get the F0.5 score.\r\n```\r\npython2 m2scorer/scripts/m2scorer.py outputs/greco_rerank.out data/conll-2014.m2\r\n```\r\n\r\n### Re-ranking your top-_k_ model outputs\r\nYou can run the same command as above with a different path in the `--data_dir` argument. For each _k_, write the _k_-th best correction of every source sentence to its own file, put all of these files in one folder, and pass that folder path to the `--data_dir` argument. The code will read all files inside that folder. 
See `data/conll-official/texts` for an example.\r\n\r\n## System Combination\r\n### Reproducing system combination F0.5 score\r\nRun the following command to reproduce the BEA-2019 test result.\r\n```\r\npython run_combination.py --model greco --lm_model microsoft/deberta-v3-large --output_path outputs/bea-test.out --beam_size 16 --batch_size 16 --checkpoint models/checkpoint.bin --data data/test-m2/Riken-Tohoku.m2 data/test-m2/Kakao-Brain.m2 data/test-m2/UEDIN-MS.m2 data/test-m2/T5-Large.m2 data/test-m2/GECToR-XLNet.m2 data/test-m2/GECToR-Roberta.m2 --vote_coef 0.4 --edit_scores edit_scores/bea-test_score.json --score_ratio 0.7\r\n```\r\nThen compress `outputs/bea-test.out` into a zip file and upload it to https://codalab.lisn.upsaclay.fr/competitions/4057#participate\r\n\r\nRun the following command to reproduce the CoNLL-2014 test result.\r\n```\r\npython run_combination.py --model greco --lm_model microsoft/deberta-v3-large --output_path outputs/conll-2014.out --beam_size 16 --batch_size 16 --checkpoint models/checkpoint.bin --data data/conll-m2/Riken-Tohoku.m2 data/conll-m2/UEDIN-MS.m2 data/conll-m2/T5-Large.m2 data/conll-m2/GECToR-XLNet.m2 data/conll-m2/GECToR-Roberta.m2 --vote_coef 0.4\r\n```\r\n\r\nRun the following to get the F0.5 score.\r\n```\r\npython2 m2scorer/scripts/m2scorer.py outputs/conll-2014.out data/conll-2014.m2\r\n```\r\n\r\n## Retraining the model\r\nRun the following command to train a new model.\r\n```\r\npython train.py --do_train --model_name_or_path microsoft/deberta-v3-large --output_dir models/new_model --learning_rate 2e-5 --word_dropout 0.25 --save_strategy epoch --per_device_train_batch_size 32 --gradient_accumulation_steps 4 --num_train_epochs 15 --alpha 1 --data data/train.json --data_mode hierarchical --edit_weight 2.0 --rank_multiplier 5\r\n```\r\n\r\n## License\r\nThe source code and models in this repository are licensed under the GNU General Public License Version 3 (see [License](./LICENSE.txt)). 
For commercial use of this code and models, separate commercial licensing is also available. Please contact Hwee Tou Ng (nght@comp.nus.edu.sg).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnusnlp%2Fgreco","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnusnlp%2Fgreco","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnusnlp%2Fgreco/lists"}