{"id":21015679,"url":"https://github.com/gersteinlab/factual_clinicalsumm","last_synced_at":"2026-04-22T12:38:40.161Z","repository":{"id":181093278,"uuid":"666208709","full_name":"gersteinlab/Factual_ClinicalSumm","owner":"gersteinlab","description":null,"archived":false,"fork":false,"pushed_at":"2023-07-15T03:05:37.000Z","size":32,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-01-20T12:07:53.802Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gersteinlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-14T01:16:03.000Z","updated_at":"2024-04-01T15:22:34.000Z","dependencies_parsed_at":"2024-11-19T10:10:55.050Z","dependency_job_id":null,"html_url":"https://github.com/gersteinlab/Factual_ClinicalSumm","commit_stats":null,"previous_names":["gersteinlab/factual_clinicalsumm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2FFactual_ClinicalSumm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2FFactual_ClinicalSumm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2FFactual_ClinicalSumm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gersteinlab%2FFactual_ClinicalSumm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gersteinlab",
"download_url":"https://codeload.github.com/gersteinlab/Factual_ClinicalSumm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243446928,"owners_count":20292446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T10:10:51.129Z","updated_at":"2025-12-28T12:35:26.343Z","avatar_url":"https://github.com/gersteinlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning\n\n\n# Metrics\n\n## Rouge, METEOR, BLEU\nGithub repository: [nlg-eval](https://github.com/Maluuba/nlg-eval)\nUsage:\n\n1. Set up as described in the repository readme.\n\n2. This repository provides a unified way to compute a series of natural language generation evaluation metrics. Here is an example of how to use it:\n\n```python\nfrom nlgeval import compute_metrics\nmetrics_dict = compute_metrics(hypothesis='examples/hyp.txt',\n                               references=['examples/ref1.txt', 'examples/ref2.txt'])\n```\nThe `hypothesis` parameter is the file path to the generated text. If there are multiple references, they can be provided as a list. 
If there is only one reference, it can be directly provided as a file path.\n\nThe metrics will be printed together.\n\n## QAFactEval\nGithub repository: [QAFactEval](https://github.com/salesforce/QAFactEval)\nUsage: First adjust `kwargs` and the model paths, then pass the contents of `input.txt` as a list `[input]` and the contents of `output.txt` as a nested list `[[output]]` (the input and output in the example are taken here to be the source document and the summary, respectively). Example:\n\n```python\nfrom qafacteval import QAFactEval\nkwargs = {\"cuda_device\": 0, \"use_lerc_quip\": True,\n          \"verbose\": True, \"generation_batch_size\": 32,\n          \"answering_batch_size\": 32, \"lerc_batch_size\": 8}\n\nmodel_folder = \"\" # path to models downloaded with download_models.sh\nmetric = QAFactEval(\n    lerc_quip_path=f\"{model_folder}/quip-512-mocha\",\n    generation_model_path=f\"{model_folder}/generation/model.tar.gz\",\n    answering_model_dir=f\"{model_folder}/answering\",\n    lerc_model_path=f\"{model_folder}/lerc/model.tar.gz\",\n    lerc_pretrained_model_path=f\"{model_folder}/lerc/pretraining.tar.gz\",\n    **kwargs\n)\n\nresults = metric.score_batch_qafacteval([\"This is a source document\"], [[\"This is a summary.\"]], return_qa_pairs=True)\nscore = results[0][0]['qa-eval']['lerc_quip']\n```\n\n## SUPERT\nGithub repository: [SUPERT](https://github.com/danieldeutsch/SUPERT)\nUsage: Use `CorpusReader` to read the target folder `data/topic_1`, which contains three subfolders, `input_docs`, `references`, and `summaries`, holding the source documents, the reference summaries, and the generated summaries, respectively. 
Example:\n\n```python\nfrom ref_free_metrics.supert import Supert\nfrom utils.data_reader import CorpusReader\n\n# read docs and summaries\nreader = CorpusReader('data/topic_1')\nsource_docs = reader()\nsummaries = reader.readSummaries()\n\n# compute the Supert scores\nsupert = Supert(source_docs)\nscores = supert(summaries)\n```\n\n## BLANC\nGithub repository: [BLANC](https://github.com/PrimerAI/blanc)\nUsage: To score several document-summary pairs, store the texts as lists in `documents` and `summaries` and call `eval_pairs()`. To score a single pair, pass the document and the summary as strings to `eval_once()`. Example:\n\n```python\nfrom blanc import BlancHelp, BlancTune\nblanc_help = BlancHelp()\nblanc_tune = BlancTune(finetune_mask_evenly=False, show_progress_bar=False)\n# Single document\ndocument = \"Jack drove his minivan to the bazaar to purchase milk and honey for his large family.\"\nsummary = \"Jack bought milk and honey.\"\nblanc_help.eval_once(document, summary)\nblanc_tune.eval_once(document, summary)\n# Multiple documents\ndocuments = [\"Jack drove his minivan to the bazaar to purchase milk and honey for his large family.\", \"As Jill started taking a walk in the park, she certainly noticed that the trees were extra green this year.\"]\nsummaries = [\"Jack bought milk and honey.\", \"Jill saw green trees in the park.\"]\nblanc_help.eval_pairs(documents, summaries)\n```\nThe above code runs on CPU. To use GPU acceleration, create the models as follows:\n```python\nblanc_help = BlancHelp(device='cuda', inference_batch_size=128)\nblanc_tune = BlancTune(device='cuda', inference_batch_size=24, finetune_mask_evenly=False, finetune_batch_size=24)\n```\n\n## QAEval\nGithub repository: [QAEval](https://github.com/danieldeutsch/sacrerouge/blob/master/doc/metrics/qaeval.md)\nThis metric seems to require installing the library rather than cloning a repository, so its readme is reproduced here. Usage:\n\n1. 
Install SacreROUGE and then run `pip install qaeval`.\n\n2. Run `sacrerouge setup-metric qa-eval`.\n\n3. Store the summaries as a list, e.g. `[summary1, summary2]`, and the references as a nested list with one inner list per summary, e.g. `[[reference1], [reference2]]`. Then call `qaeval.score_all()`. Example:\n\n```python\nimport json\nfrom sacrerouge.metrics import QAEval\n\nsummary1 = 'Dan walked to the bakery this morning.'\nreference1 = 'Dan went to buy scones earlier this morning.'\n\n# This line will load the generation and answer models into memory, so it may take some time to complete.\nqaeval = QAEval()\n\n# To score an individual summary with a list of reference summaries. This example\n# only uses 1 reference, so it is wrapped in a list.\nscores = qaeval.score(summary1, [reference1])\nprint(scores)\n# {'qa-eval': {'em': 0.5, 'f1': 0.5}}\n\n# To run batch scoring, use the score_all function and pass a list of summaries and\n# a list of lists of references. Again, each instance here only has 1 reference, so it is wrapped\n# in a list\nsummary2 = 'Roger Federer beat Rafael Nadal yesterday.'\nreference2 = 'Yesterday, Nadal lost to Federer'\n# scores_list is a list of size 2. scores_list[0] is the scores for summary1, and scores_list[1] for summary2\nscores_list = qaeval.score_all([summary1, summary2], [[reference1], [reference2]])\n\n# If you want the QA pairs used to score the summaries returned, add the return_qa_pairs=True argument\n# to any of the scoring methods. A tuple of size 2 will be returned. The first item is the scores\n# like above. The second item is the QA pairs.\nscores, qas = qaeval.score(summary2, [reference2], return_qa_pairs=True)\n\n# qas[i][j] is the j-th QA pair for the i-th reference summary. The \"probability\" is the QA model's\n# probability for the prediction. 
\"null_probability\" is its probability there is no answer.\nprint(json.dumps(qas[0][0], indent=2))\n```\n\n## QuestEval\nGithub repository: [QuestEval](https://github.com/ThomasScialom/QuestEval)\nUsage: This metric also requires installing the library; see the readme at the link above. `hypothesis` should be a list of all the outputs, `sources` a list of all the inputs, and `list_references` a list with one list of references per output. Finally, call `score = questeval.corpus_questeval(hypothesis, sources, list_references)`. Example:\n\n```python\nfrom questeval.questeval_metric import QuestEval\nquesteval = QuestEval(no_cuda=True)\n\nsource_1 = \"Since 2000, the recipient of the Kate Greenaway medal has also been presented with the Colin Mears award to the value of 35000.\"\nprediction_1 = \"Since 2000, the winner of the Kate Greenaway medal has also been given to the Colin Mears award of the Kate Greenaway medal.\"\nreferences_1 = [\n    \"Since 2000, the recipient of the Kate Greenaway Medal will also receive the Colin Mears Awad which worth 5000 pounds\",\n    \"Since 2000, the recipient of the Kate Greenaway Medal has also been given the Colin Mears Award.\"\n]\n\nsource_2 = \"He is also a member of another Jungiery boyband 183 Club.\"\nprediction_2 = \"He also has another Jungiery Boyband 183 club.\"\nreferences_2 = [\n    \"He's also a member of another Jungiery boyband, 183 Club.\",\n    \"He belonged to the Jungiery boyband 183 Club.\"\n]\n\nscore = questeval.corpus_questeval(\n    hypothesis=[prediction_1, prediction_2],\n    sources=[source_1, source_2],\n    list_references=[references_1, references_2]\n)\n\nprint(score)\n```\n\n## FactCC, DAE\nSee FactCC. Github repository: [FactCC](https://github.com/nargesam/factCC/tree/master)\n\n## SummaC\nGithub repository: [SummaC](https://github.com/tingofurro/summac/)\n\nUsage:\n1. Install SummaC by running `pip install summac`.\n\n2. 
Import the necessary modules and create instances of the SummaC models:\n\n```python\nfrom summac.model_summac import SummaCZS, SummaCConv\n\nmodel_zs = SummaCZS(granularity=\"sentence\", model_name=\"vitc\", device=\"cpu\") # If you have a GPU: switch to: device=\"cuda\"\nmodel_conv = SummaCConv(models=[\"vitc\"], bins='percentile', granularity=\"sentence\", nli_labels=\"e\", device=\"cpu\", start_file=\"default\", agg=\"mean\")\n```\n\n3. Prepare your document and summary as strings and pass them to the `score` method of the respective model:\n\n```python\ndocument = \"\"\"Scientists are studying Mars to learn about the Red Planet and find landing sites for future missions.\nOne possible site, known as Arcadia Planitia, is covered in strange sinuous features.\nThe shapes could be signs that the area is actually made of glaciers, which are large masses of slow-moving ice.\nArcadia Planitia is in Mars' northern lowlands.\"\"\"\n\nsummary1 = \"There are strange shape patterns on Arcadia Planitia. The shapes could indicate the area might be made of glaciers. This makes Arcadia Planitia ideal for future missions.\"\n\nscore_zs1 = model_zs.score([document], [summary1])\nscore_conv1 = model_conv.score([document], [summary1])\n\nprint(\"[Summary 1] SummaCZS Score: %.3f; SummacConv score: %.3f\" % (score_zs1[\"scores\"][0], score_conv1[\"scores\"][0]))\n```\n\nThis will output the scores for the provided summary using the SummaCZS and SummaCConv models respectively.\n\n\n\nI hope this helps! Let me know if you have any further questions. 
@xiangru.tang@yale.edu\n\n# Citation\n\n```\n@inproceedings{tang-etal-2023-aligning,\n    title = \"Aligning Factual Consistency for Clinical Studies Summarization through Reinforcement Learning\",\n    author = \"Tang, Xiangru  and\n      Cohan, Arman  and\n      Gerstein, Mark\",\n    booktitle = \"Proceedings of the 5th Clinical Natural Language Processing Workshop\",\n    month = jul,\n    year = \"2023\",\n    address = \"Toronto, Canada\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2023.clinicalnlp-1.7\",\n    pages = \"48--58\",\n    abstract = \"In the rapidly evolving landscape of medical research, accurate and concise summarization of clinical studies is crucial to support evidence-based practice. This paper presents a novel approach to clinical studies summarization, leveraging reinforcement learning to enhance factual consistency and align with human annotator preferences. Our work focuses on two tasks: Conclusion Generation and Review Generation. We train a CONFIT summarization model that outperforms GPT-3 and previous state-of-the-art models on the same datasets and collects expert and crowd-worker annotations to evaluate the quality and factual consistency of the generated summaries. These annotations enable us to measure the correlation of various automatic metrics, including modern factual evaluation metrics like QAFactEval, with human-assessed factual consistency. 
By employing top-correlated metrics as objectives for a reinforcement learning model, we demonstrate improved factuality in generated summaries that are preferred by human annotators.\",\n}\n```\n\nCreative Commons License\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgersteinlab%2Ffactual_clinicalsumm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgersteinlab%2Ffactual_clinicalsumm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgersteinlab%2Ffactual_clinicalsumm/lists"}