{"id":29552182,"url":"https://github.com/abachaa/MEDEC","last_synced_at":"2025-07-18T05:03:07.201Z","repository":{"id":266351564,"uuid":"898095320","full_name":"abachaa/MEDEC","owner":"abachaa","description":null,"archived":false,"fork":false,"pushed_at":"2025-05-31T01:14:44.000Z","size":1982,"stargazers_count":36,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-31T12:05:27.933Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abachaa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-03T19:26:44.000Z","updated_at":"2025-05-31T01:14:47.000Z","dependencies_parsed_at":"2025-05-31T02:48:27.079Z","dependency_job_id":"eed735c1-2ad2-4707-b743-c6464f71bb48","html_url":"https://github.com/abachaa/MEDEC","commit_stats":null,"previous_names":["abachaa/medec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abachaa/MEDEC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abachaa%2FMEDEC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abachaa%2FMEDEC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abachaa%2FMEDEC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abachaa%2FMEDEC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abachaa","download_url":"https://codeload.github.com/abachaa/MEDEC/tar.gz/refs/heads/main","s
bom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abachaa%2FMEDEC/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265703083,"owners_count":23813925,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-18T05:01:30.523Z","updated_at":"2025-07-18T05:03:07.169Z","avatar_url":"https://github.com/abachaa.png","language":null,"funding_links":[],"categories":["5. 数据集","🌐 Resources \u0026 Tools"],"sub_categories":["5.1 评测基准","Benchmarks \u0026 Datasets"],"readme":"# MEDEC Dataset\n\nMEDEC is the first dataset for medical error detection and correction in clinical notes. \n\n\u003ca href=\"url\"\u003e\u003cimg src=\"https://github.com/abachaa/MEDEC/blob/main/medec-exps\" align=\"right\" height=\"450\" width=\"550\" \u003e\u003c/a\u003e\n\nIt includes 3,848 clinical texts from the MS and UW collections covering five types\nof errors (Diagnosis, Management, Treatment, Pharmacotherapy, and Causal Organism).  \n\n  - The Training Set contains 2,189 MS texts. \n  - The MS Validation Set contains 574 clinical texts.\n  - The UW Validation Set contains 160 clinical texts.\n  - The MS Test includes 597 MS texts\n  - The UW Test set includes 328 UW texts.\n\nEach clinical text is either correct or contains one error. 
The task consists in: \n  - (A) predicting the error flag (1: the text contains an error, 0: the text has no errors)\n  - For flagged texts (with error):\n    - (B) extracting the sentence that contains the error, and\n    - (C) generating a corrected sentence.\n\n**DATA:**\n- **MEDEC-MS Collection:** https://github.com/abachaa/MEDEC/tree/main/MEDEC-MS (MEDEC-MS Training, Validation, and Test Sets)\n- **MEDEC-UW Collection:** Please send an email to medec-uw@googlegroups.com to receive the UW Data Usage Agreement (DUA) required to have access to the MEDEC-UW Validation \u0026 Test sets.\n\n\nMEDEC Paper\n=================\n\n- **PDF**: https://arxiv.org/pdf/2412.19260\n- **Abstract**: Several studies showed that Large Language Models (LLMs) can answer medical questions correctly,\neven outperforming the average human score in some medical exams. However, to our knowledge,\nno study has been conducted to assess the ability of language models to validate existing or generated\nmedical text for correctness and consistency. In this paper, we introduce MEDEC, the first publicly\navailable benchmark for medical error detection and correction in clinical notes, covering five types\nof errors (Diagnosis, Management, Treatment, Pharmacotherapy, and Causal Organism). MEDEC\nconsists of 3,848 clinical texts, including 488 clinical notes from three US hospital systems that were\nnot previously seen by any LLM. The dataset has been used for the MEDIQA-CORR shared task\nto evaluate seventeen participating systems [Ben Abacha et al., 2024]. In this paper, we describe\nthe data creation methods and we evaluate recent LLMs (e.g., o1-preview, GPT-4, Claude 3.5\nSonnet, and Gemini 2.0 Flash) for the tasks of detecting and correcting medical errors requiring\nboth medical knowledge and reasoning capabilities. We also conducted a comparative study where\ntwo medical doctors performed the same task on the MEDEC test set. 
The results showed that\nMEDEC is a sufficiently challenging benchmark to assess the ability of models to validate existing\nor generated notes and to correct medical errors. We also found that although recent LLMs have a\ngood performance in error detection and correction, they are still outperformed by medical doctors in\nthese tasks. We discuss the potential factors behind this gap, the insights from our experiments, the\nlimitations of current evaluation metrics, and share potential pointers for future research.\n\n     \n\nMEDIQA-CORR Shared Task  \n=================\n\nThe dataset has been introduced for the first shared task on medical error detection and correction: MEDIQA-CORR @ NAACL-ClinicalNLP 2024  \n\n* **Website:** https://sites.google.com/view/mediqa2024/mediqa-corr\n* **Shared Task Paper:** https://aclanthology.org/2024.clinicalnlp-1.57.pdf\n* **GitHub:** https://github.com/abachaa/MEDIQA-CORR-2024\n\nEvaluation\n=================\n\nEvaluation metrics and scripts: [https://github.com/abachaa/MEDIQA-CORR-2024](https://github.com/abachaa/MEDIQA-CORR-2024/tree/main/evaluation) \n\nLicense\n=================\n\n- This work is published under a Creative Commons Attribution 4.0 International License ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)). 
Please cite our paper: \n    \n        @article{medec,\n          author     = {Asma {Ben Abacha} and Wen-wai Yim and Yujuan Fu and Zhaoyi Sun and Meliha Yetisgen and Fei Xia and Thomas Lin},\n          title      = {MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes},\n          journal    = {CoRR}, \n          eprinttype = {arXiv},\n          url        = {https://arxiv.org/pdf/2412.19260}, \n          year       = {2024}\n          }\n\n\nContact\n=================\n\n    -  Asma Ben Abacha (abenabacha at microsoft dot com)\n    -  Wen-wai Yim (yimwenwai at microsoft dot com)\n----\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabachaa%2FMEDEC","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabachaa%2FMEDEC","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabachaa%2FMEDEC/lists"}