{"id":13605115,"url":"https://github.com/neemakot/Health-Fact-Checking","last_synced_at":"2025-04-12T02:32:46.125Z","repository":{"id":227547419,"uuid":"305856371","full_name":"neemakot/Health-Fact-Checking","owner":"neemakot","description":"Dataset and code for \"Explainable Automated Fact-Checking for Public Health Claims\" from EMNLP 2020.","archived":false,"fork":false,"pushed_at":"2021-04-27T08:08:40.000Z","size":24862,"stargazers_count":53,"open_issues_count":2,"forks_count":9,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-07T09:44:19.409Z","etag":null,"topics":["emnlp2020","explainable-ai","explainable-ml","fact-checking","fake-news","fake-news-detection","public-health"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2010.09926","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neemakot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-20T23:24:36.000Z","updated_at":"2024-10-28T07:55:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"880f867f-cafd-4d8e-bfd9-3d223bfb0a0c","html_url":"https://github.com/neemakot/Health-Fact-Checking","commit_stats":null,"previous_names":["neemakot/health-fact-checking"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neemakot%2FHealth-Fact-Checking","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neemakot%2FHealth-Fact-Checking/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neemak
ot%2FHealth-Fact-Checking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neemakot%2FHealth-Fact-Checking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neemakot","download_url":"https://codeload.github.com/neemakot/Health-Fact-Checking/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248506928,"owners_count":21115509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["emnlp2020","explainable-ai","explainable-ml","fact-checking","fake-news","fake-news-detection","public-health"],"created_at":"2024-08-01T19:00:54.826Z","updated_at":"2025-04-12T02:32:46.118Z","avatar_url":"https://github.com/neemakot.png","language":"Python","funding_links":[],"categories":["Dataset"],"sub_categories":["Only Text"],"readme":"# Explainable Fact-Checking for Public Health Claims\n\nThis repository contains data and code for the paper [Explainable Fact-Checking for Public Health Claims (Kotonya and Toni, 2020)](https://arxiv.org/abs/2010.09926). This research will be presented at The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).\n\n\n\n\n## Introduction\n\nFact-checking is the task of verifying claims (i.e., distinguishing between false stories and facts) by assessing the  assertions made by claims against credible evidence. The vast majority of fact-checking studies focus exclusively on political claims. Very little research explores fact-checking for other topics, specifically subject matters for which _expertise_ is required. 
We present the first study in [explainable fact-checking](https://neemakot.github.io/project/survey/) for claims which require specific expertise. \n\nFor our case study we choose the setting of public health. To support this, we construct a new dataset __PUBHEALTH__ of 11.8K claims accompanied by journalist-crafted, gold standard explanations (i.e., judgments) to support the fact-check labels for claims. We explore two tasks: veracity prediction and explanation generation. We also define and evaluate, with humans and computationally, three coherence properties of explanation quality. Our results indicate that, by training on in-domain data, gains can be made in explainable, automated fact-checking for claims which require specific expertise.\n\n\n## Data\n\n### PUBHEALTH fact-checking dataset\n\nWe present __PUBHEALTH__, a comprehensive dataset for explainable automated fact-checking of public health claims. Each instance in the __PUBHEALTH__ dataset has an associated veracity label (true, false, unproven, mixture). Furthermore, each instance in the dataset has an _explanation_ text field. The explanation is a justification for why the claim has been assigned a particular veracity label. \n\nThe dataset can be [downloaded here](https://drive.google.com/file/d/1eTtRs5cUlBP5dXsx-FTAlmXuB6JQi2qj/view). \n\nOR\n\nThe dataset can be acquired using the following commands:\n\n```\n cd src\n ./download_data.sh\n```\n\nThe following is an example instance of the __PUBHEALTH__ dataset:\n\n|  Field              |  Example                                                     |\n| -----------------   | -------------------------------------------------------------|\n| __claim__  \t      | Expired boxes of cake and pancake mix are dangerously toxic. |\n| __explanation__     | What's True:  Pancake and cake mixes that contain mold can cause life-threatening allergic reactions. 
What's False: Pancake and cake mixes that have passed their expiration dates are not inherently dangerous to ordinarily healthy people, and the yeast in packaged baking products does not \"over time develops spores.\" |\n| __label__           |  mixture                                                     |\n| __claim URL__       | https://www.snopes.com/fact-check/expired-cake-mix/          |\n| __author(s)__       | David Mikkelson                                              | \n| __date published__  | April 19, 2006                                               |\n| __tags__            | food, allergies, baking, cake                                |\n| __main_text__        |   In April 2006, the experience of a 14-year-old who had eaten pancakes made from a mix that had gone moldy was described in the popular newspaper column Dear Abby. The account has since been circulated widely on the Internet as scores of concerned homemakers ponder the safety of the pancake and other baking mixes lurking in their larders [...]       |\n| __evidence sources__    | [1] Bennett, Allan and Kim Collins.  “An Unusual Case of Anaphylaxis: Mold in Pancake Mix.” American Journal of Forensic Medicine \u0026 Pathology.   September 2001   (pp. 292-295). [2] Phillips, Jeanne.   “Dear Abby.” 14 April 2006   [syndicated column]. |\n\nMore information about the __PUBHEALTH__ dataset can be found in [DATASHEET.md](data/DATASHEET.md) and [README.md](data/README.md) provided under ``data/``, including test/train/dev splits, and data collection and processing information.\n\n\n### PUBHEALTH evidence documents\n\nWe are also collecting the original evidence documents cited in the fact-checking articles. 
We are currently updating this collection; the current version can be downloaded using the following commands:\n\n```\n cd src\n ./download_evidence_docs.sh\n```\n\nAlternatively, you can download the evidence documents [here](https://drive.google.com/file/d/1qDjbniulHhSI73JoZHs3eWdVPQBMH2Gt/view?usp=sharing).\n\nThe evidence documents are all text files with names formatted as ```doc_\u003cCLAIM_ID\u003e_\u003cEVIDENCE_NUMBER\u003e.txt```.\n\n\n## Requirements\n\nThis project is built using Python 3.6 and TensorFlow. To install the dependencies, use the following command:\n\n```\npip install -r requirements.txt\n```\n\nThe full list of requirements, including versions, is as follows:\n\n* [Python 3.6](https://www.python.org/downloads/release/python-360/)\n\n_Machine Learning, NLP, evaluation and visualization packages_:\n* [bert](https://pypi.org/project/bert/)==2.2.0\n* [bleach](https://pypi.org/project/bleach/)==3.0.2\n* [beautifulsoup4](https://pypi.org/project/beautifulsoup4/)==4.8.2\n* [Keras](https://pypi.org/project/Keras/)==2.3.1\n* [matplotlib](https://pypi.org/project/matplotlib/)==3.0.1\n* [numpy](https://pypi.org/project/numpy/)==1.18.1\n* [py-rouge](https://pypi.org/project/py-rouge)==1.1\n* [PyYAML](https://pypi.org/project/PyYAML/)==5.3.1\n* [PyPDF2](https://pypi.org/project/PyPDF2/)==1.26.0\n* [requests](https://pypi.org/project/requests/)==2.13.0\n* [scikit-learn](https://pypi.org/project/scikitlearn/)==0.1.1\n* [sentence-transformers](https://pypi.org/project/sentence-transformers/)==0.3.8\n* [tensorflow](https://pypi.org/project/tensorflow/)==1.15.0\n* [tokenizers](https://pypi.org/project/tokenizers/)==0.7.0\n* [tqdm](https://pypi.org/project/tqdm/)==4.43.0\n\n\n## Reference\n\nIf you use the dataset, please cite the paper as formatted below.\n\n```\n@inproceedings{kotonya-toni-2020-explainable,\n    title = \"Explainable Automated Fact-Checking for Public Health Claims\",\n    author = \"Kotonya, Neema  and\n      Toni, Francesca\",\n    booktitle = 
\"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    month = nov,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/2020.emnlp-main.623\",\n    pages = \"7740--7754\",\n}\n```\n\n## Contact\n\nPlease feel free to contact [Neema Kotonya](mailto:nk2418@ic.ac.uk) if you have any queries.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneemakot%2FHealth-Fact-Checking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneemakot%2FHealth-Fact-Checking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneemakot%2FHealth-Fact-Checking/lists"}