{"id":18011584,"url":"https://github.com/titipata/detecting-scientific-claim","last_synced_at":"2025-06-13T19:36:54.367Z","repository":{"id":84610957,"uuid":"136517946","full_name":"titipata/detecting-scientific-claim","owner":"titipata","description":"Extracting scientific claims from biomedical abstracts (powered by AllenNLP)","archived":false,"fork":false,"pushed_at":"2021-06-02T13:37:07.000Z","size":1743,"stargazers_count":142,"open_issues_count":8,"forks_count":18,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-07T20:06:22.589Z","etag":null,"topics":["allennlp","deep-learning","natural-language-processing","sentence-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/titipata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-07T18:45:50.000Z","updated_at":"2025-02-26T04:06:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"bde026ab-a434-480d-8cde-eb76bf48b8a8","html_url":"https://github.com/titipata/detecting-scientific-claim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/titipata/detecting-scientific-claim","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titipata%2Fdetecting-scientific-claim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titipata%2Fdetecting-scientific-claim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titipata%2Fdetecting-scientific-claim/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titipata%2Fdetecting-scientific-claim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/titipata","download_url":"https://codeload.github.com/titipata/detecting-scientific-claim/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titipata%2Fdetecting-scientific-claim/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259708088,"owners_count":22899539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allennlp","deep-learning","natural-language-processing","sentence-classification"],"created_at":"2024-10-30T03:11:48.141Z","updated_at":"2025-06-13T19:36:54.333Z","avatar_url":"https://github.com/titipata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Claim Extraction for Scientific Publications\n\nDetecting claim from scientific publication using [discourse model](https://github.com/Franck-Dernoncourt/pubmed-rct) and transfer learning. 
\nModels are trained using the [AllenNLP](https://github.com/allenai/allennlp) library.\n\n## Installing as a package\n\nYou can install the package using pip, which lets you import the `discourse` classes in your own modules:\n\n```bash\npip install git+https://github.com/titipata/detecting-scientific-claim.git\n```\n\nYou can then use them as follows:\n\n```python\nimport discourse\npredictor = discourse.DiscourseCRFClassifierPredictor()\n```\n\n## Training discourse model\n\nRun AllenNLP to train a discourse model on the [PubMed RCT dataset](https://github.com/Franck-Dernoncourt/pubmed-rct) as follows:\n\n```bash\nallennlp train experiments/pubmed_rct.json -s output --include-package discourse\n```\n\nThe data location in `pubmed_rct.json` points directly to Amazon S3,\nso you do not need to download the data locally. Change `cuda_device` to `-1` in `pubmed_rct.json`\nif you want to run on CPU. More experiments are available in the `experiments` folder.\n\n**Note** that you have to remove the `output` folder before running.\n\n\n## Predicting discourse\n\nWe trained the bidirectional LSTM model on structured abstracts from PubMed to predict\nthe discourse probability (`RESULTS`, `METHODS`, `CONCLUSIONS`, `BACKGROUND`, `OBJECTIVE`)\nof a given sentence. 
You can download the trained model from Amazon S3\n\n```bash\nwget https://s3-us-west-2.amazonaws.com/pubmed-rct/model.tar.gz # or model_crf.tar.gz for pretrained model with CRF layer\n```\n\nand run the web service for the discourse prediction task as follows\n\n```bash\nbash web_service.sh\n```\n\nTo test the trained model with the provided examples [`fixtures.json`](pubmed-rct/PubMed_200k_RCT/fixtures.json),\nsimply run the following to predict labels.\n\n```bash\nallennlp predict model.tar.gz \\\n    pubmed-rct/PubMed_200k_RCT/fixtures.json \\\n    --include-package discourse \\\n    --predictor discourse_predictor\n```\n\nor run the following for the model with a CRF layer\n\n```bash\nallennlp predict model_crf.tar.gz \\\n    pubmed-rct/PubMed_200k_RCT/fixtures_crf.json \\\n    --include-package discourse \\\n    --predictor discourse_crf_predictor\n```\n\nTo evaluate the discourse model, you can run the following command\n\n```bash\nallennlp evaluate model.tar.gz \\\n  https://s3-us-west-2.amazonaws.com/pubmed-rct/test.json \\\n  --include-package discourse\n```\n\n\n## Predicting claim (web service)\n\nWe use transfer learning with fine-tuning to train the claim extraction model\nfrom the pre-trained discourse model. 
A schematic of the training procedure is shown below.\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"static/transfer_learning.png\" width=\"400\" /\u003e\n\u003c/p\u003e\n\nYou can run the demo web application to detect claims as follows\n\n```bash\nexport FLASK_APP=main.py\nflask run --host=0.0.0.0 # this will serve at port 5000\n```\n\nThe interface will look something like this\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"static/interface.png\" width=\"600\" /\u003e\n\u003c/p\u003e\n\nThe output will look something like the following (highlighting marks a claim;\n  the tag after each sentence is its discourse prediction)\n\n\u003cp float=\"left\"\u003e\n  \u003cimg src=\"static/output.png\" width=\"600\" /\u003e\n\u003c/p\u003e\n\n\n**Expertly annotated dataset** We release a dataset of 1,500 annotated abstracts containing 11,702 sentences (2,276 annotated as claim sentences) \nsampled from 110 biomedical journals. The final labels are the majority vote of three experts. The annotations are hosted on Amazon S3 and \ncan be found at these [URLs](https://github.com/titipata/detecting-scientific-claim/blob/master/scripts/transfer_learning_crf.py#L48-L50).\n\n\n## Requirements\n\n- [Python 3.6](https://www.python.org/downloads/release/python-360/)\n- [AllenNLP](https://github.com/allenai/allennlp) \u003e= 0.6.1\n- [spacy](https://github.com/explosion/spaCy)\n- [fastText](https://github.com/facebookresearch/fastText)\n- [Pubmed RCT](https://github.com/Franck-Dernoncourt/pubmed-rct) - dataset\n\n\n## Citing the repository\n\nYou can cite our paper available on arXiv as\n\nAchakulvisut, Titipat, Chandra Bhagavatula, Daniel Acuna, and Konrad Kording. 
_\"Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning.\"_ arXiv preprint arXiv:1907.00962 (2019).\n\nor using BibTeX\n\n```\n@article{achakulvisut2019claim,\n  title={Claim Extraction in Biomedical Publications using Deep Discourse Model and Transfer Learning},\n  author={Achakulvisut, Titipat and Bhagavatula, Chandra and Acuna, Daniel and Kording, Konrad},\n  journal={arXiv preprint arXiv:1907.00962},\n  year={2019}\n}\n```\n\n## Acknowledgement\n\nThis project is done at the [Allen Institute for Artificial Intelligence](https://allenai.org/)\nand [Konrad Kording lab, University of Pennsylvania](http://kordinglab.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftitipata%2Fdetecting-scientific-claim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftitipata%2Fdetecting-scientific-claim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftitipata%2Fdetecting-scientific-claim/lists"}