{"id":13595239,"url":"https://github.com/cooelf/AwesomeMRC","last_synced_at":"2025-04-09T10:33:14.869Z","repository":{"id":61724696,"uuid":"257520135","full_name":"cooelf/AwesomeMRC","owner":"cooelf","description":"IJCAI 2021 Tutorial \u0026 code for Retrospective Reader for Machine Reading Comprehension (AAAI 2021)","archived":false,"fork":false,"pushed_at":"2023-09-06T17:33:17.000Z","size":2183,"stargazers_count":360,"open_issues_count":3,"forks_count":69,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-05-23T00:01:14.298Z","etag":null,"topics":["question-answering","reading-comprehension","transformers"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2005.06249","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cooelf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-04-21T07:50:09.000Z","updated_at":"2024-04-09T09:21:57.000Z","dependencies_parsed_at":"2024-01-16T22:19:35.110Z","dependency_job_id":"989d0b6c-9d99-4f06-b42e-4af890b2401c","html_url":"https://github.com/cooelf/AwesomeMRC","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cooelf%2FAwesomeMRC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cooelf%2FAwesomeMRC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cooelf%2FAwesomeMRC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cooelf%2FAwesomeMRC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/cooelf","download_url":"https://codeload.github.com/cooelf/AwesomeMRC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["question-answering","reading-comprehension","transformers"],"created_at":"2024-08-01T16:01:46.189Z","updated_at":"2025-04-09T10:33:09.860Z","avatar_url":"https://github.com/cooelf.png","language":"Python","funding_links":[],"categories":["Python","机器阅读理解"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# AwesomeMRC\n\n**Update**\n\n* [model] Our SOTA SQuAD2.0 models are available at [CodaLab](https://worksheets.codalab.org/worksheets/0xac07322a21164c6fa3d740c455571768) for reproduction.\n\n* [ensemble] The code for model ensemble (grid search): [run_ensemble_grid.py](transformer-mrc/run_ensemble_grid.py)\n\n## Requirements\nThe code is based on [Transformers](https://github.com/huggingface/transformers) v2.3.0. The dependencies are the same.\nYou can install the dependencies with `pip install transformers==2.3.0`\n\nor download the requirements file directly from https://github.com/huggingface/transformers/blob/v2.3.0/requirements.txt and run `pip install -r requirements.txt`.\n\n## Summary\n\nLooking for a comprehensive and comparative review of MRC? 
Check out our new survey paper: **[Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond](https://arxiv.org/abs/2005.06249)** (preprint, 2020).\n\nIn this work, an MRC model is regarded as a two-stage Encoder-Decoder architecture. Our empirical analysis is shared in this repo. \n\n![](figures/overview.png)\n\n## Encoder:\n\n1) Language Units\n\n    [Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)](https://www.aclweb.org/anthology/C18-1153/)\n    \n    [Effective Subword Segmentation for Text Comprehension (TASLP)](https://arxiv.org/abs/1811.02364)\n\n2) Linguistic Knowledge\n\n    [Semantics-aware BERT for language understanding (AAAI 2020)](https://arxiv.org/abs/1909.02209)\n    \n    [SG-Net: Syntax-Guided Machine Reading Comprehension (AAAI 2020)](https://arxiv.org/abs/1908.05147)\n    \n    [LIMIT-BERT: Linguistic Informed Multi-Task BERT (preprint)](https://arxiv.org/pdf/1910.14296.pdf)\n\n3) Commonsense Injection\n\n    [Multi-choice Dialogue-Based Reading Comprehension with Knowledge and Key Turns (preprint)](https://arxiv.org/abs/2004.13988)\n\n4) Contextualized language models (CLMs) for MRC:\n\n![](figures/clm_examples.png)\n\n### Decoder:\n\nThe implementation is based on [Transformers](https://github.com/huggingface/transformers) v2.3.0. 
\n\nAs part of the techniques in our Retro-Reader paper:\n\n[Retrospective Reader for Machine Reading Comprehension (AAAI 2021)](https://arxiv.org/abs/2001.09694)\n\n### Answer Verification\n\n**1) Multitask-style verification**\n\n   We evaluate different loss functions \n    \n   *cross-entropy* (`run_squad_av.py`)\n   \n   *binary cross-entropy* (`run_squad_av_bce.py`)\n    \n   *mse regression*  (`run_squad_avreg.py`)\n\n**2) External verification**\n\n   Train an external verifier (`run_cls.py`)\n\n### Matching Network\n\n   *Cross Attention* (`run_squad_seq_trm.py`)\n    \n   *Matching Attention* (`run_squad_seq_sc.py`)\n\n\u003cu\u003eRelated Work\u003c/u\u003e:\n\n  [Modeling Multi-turn Conversation with Deep Utterance Aggregation (COLING 2018)](https://www.aclweb.org/anthology/C18-1317/)\n\n  [DCMN+: Dual Co-Matching Network for Multi-choice Reading Comprehension (AAAI 2020)](https://arxiv.org/pdf/1908.11511.pdf)\n\n### Answer Dependency\n\n   Model answer dependency (start + seq -\u003e end) (`run_squad_dep.py`)\n\n### Example: Retrospective Reader\n\n   1) train a sketchy reader (`sh_albert_cls.sh`)\n    \n   2) train an intensive reader (`sh_albert_av.sh`)\n    \n   3) rear verification: merge the prediction for final answer (`run_verifier.py`)\n    \n    SQuAD 2.0 Dev Results:\t\n    \n      ```\n      {\n      \"exact\": 87.75372694348522, \n      \"f1\": 90.91630165754992, \n      \"total\": 11873, \n      \"HasAns_exact\": 83.1140350877193, \n      \"HasAns_f1\": 89.4482539777485, \n      \"HasAns_total\": 5928, \n      \"NoAns_exact\": 92.38015138772077, \n      \"NoAns_f1\": 92.38015138772077, \n      \"NoAns_total\": 5945\n      }\n      ```\n\n### Question Classification\n   [One-shot Learning for Question-Answering in Gaokao History Challenge (COLING 2018)](https://www.aclweb.org/anthology/C18-1038/)\n\n### Citation\n\n```\n@article{zhang2020survey,\n  title={Machine Reading Comprehension: The Role of Contextualized Language Models and 
Beyond},\n  author={Zhang, Zhuosheng and Zhao, Hai and Wang, Rui},\n  journal={arXiv preprint arXiv:2005.06249},\n  year={2020}\n}\n@inproceedings{zhang2021retrospective,\n  title={Retrospective reader for machine reading comprehension},\n  author={Zhang, Zhuosheng and Yang, Junjie and Zhao, Hai},\n  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n  volume={35},\n  number={16},\n  pages={14506--14514},\n  year={2021}\n}\n```\n## Related Records (best)\n\n[CMRC 2017](https://hfl-rc.github.io/cmrc2017/leaderboard/): The **best** single model (2017).\n\n[SQuAD 2.0](https://rajpurkar.github.io/SQuAD-explorer/): \nThe **best** among all submissions (both single and ensemble settings);\nThe **first** to surpass human benchmark on both EM and F1 scores with a single model (2019).\n\n[SNLI](https://nlp.stanford.edu/projects/snli/): The **best** among all submissions (2019-2020).\n\n[RACE](http://www.qizhexie.com/data/RACE_leaderboard.html): The **best** among all submissions (2019).\n\n[GLUE](https://gluebenchmark.com/): The **3rd best** among all submissions (early 2019).\n\n## Contact\n\nFeel free to email zhangzs [at] sjtu.edu.cn if you have any questions.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcooelf%2FAwesomeMRC","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcooelf%2FAwesomeMRC","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcooelf%2FAwesomeMRC/lists"}