{"id":17614927,"url":"https://github.com/gsarti/covid-papers-browser","last_synced_at":"2025-08-21T05:32:11.515Z","repository":{"id":40967389,"uuid":"248975168","full_name":"gsarti/covid-papers-browser","owner":"gsarti","description":"Browse Covid-19 \u0026 SARS-CoV-2 Scientific Papers with Transformers  🦠 📖","archived":false,"fork":false,"pushed_at":"2022-06-22T01:30:00.000Z","size":11223,"stargazers_count":182,"open_issues_count":8,"forks_count":27,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-04-08T04:51:15.707Z","etag":null,"topics":["biobert","bionlp","covid-19","natural-language-processing","scibert","search-engine"],"latest_commit_sha":null,"homepage":"http://covidbrowser.areasciencepark.it","language":"CSS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gsarti.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-21T12:51:08.000Z","updated_at":"2024-08-12T19:59:05.000Z","dependencies_parsed_at":"2022-08-31T22:10:48.737Z","dependency_job_id":null,"html_url":"https://github.com/gsarti/covid-papers-browser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gsarti/covid-papers-browser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsarti%2Fcovid-papers-browser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsarti%2Fcovid-papers-browser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsarti%2Fcovid-papers-browser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsarti%2Fcovid-papers-browser/manifes
ts","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gsarti","download_url":"https://codeload.github.com/gsarti/covid-papers-browser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gsarti%2Fcovid-papers-browser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271430770,"owners_count":24758368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biobert","bionlp","covid-19","natural-language-processing","scibert","search-engine"],"created_at":"2024-10-22T18:54:53.251Z","updated_at":"2025-08-21T05:32:08.646Z","avatar_url":"https://github.com/gsarti.png","language":"CSS","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Covid-19 Semantic Browser: Browse Covid-19 \u0026 SARS-CoV-2 Scientific Papers with Transformers 🦠 📖\n\n**Covid-19 Semantic Browser** is an interactive experimental tool leveraging a state-of-the-art language model to search relevant content inside the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coronavirus-research) recently published by the White House and its research partners. 
The dataset contains over 44,000 scholarly articles about COVID-19, SARS-CoV-2 and related coronaviruses.\n\nVarious models already fine-tuned on Natural Language Inference are available to perform the search:\n\n- **[`scibert-nli`](https://huggingface.co/gsarti/scibert-nli)**, a fine-tuned version of AllenAI's [SciBERT](https://github.com/allenai/scibert) [1].\n\n- **[`biobert-nli`](https://huggingface.co/gsarti/biobert-nli)**, a fine-tuned version of [BioBERT](https://github.com/dmis-lab/biobert) by J. Lee et al. [2].\n\n- **[`covidbert-nli`](https://huggingface.co/gsarti/covidbert-nli)**, a fine-tuned version of Deepset's [CovidBERT](https://huggingface.co/deepset/covid_bert_base).\n\n- **[`clinicalcovidbert-nli`](https://huggingface.co/manueltonneau/clinicalcovid-bert-nli)**, a fine-tuned version of [@manueltonneau](https://github.com/manueltonneau)'s [ClinicalCovidBERT](https://github.com/manueltonneau/covid-berts).\n\nAll models are trained on [SNLI](https://nlp.stanford.edu/projects/snli/) [3] and [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) [4] using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) [5] to produce universal sentence embeddings [6]. Embeddings are subsequently used to perform semantic search on CORD-19.\n\nCurrently supported operations are:\n\n- Browse paper abstracts with interactive queries.\n\n- Reproduce SciBERT-NLI, BioBERT-NLI and CovidBERT-NLI training results.\n\n## Setup\n\nPython 3.6 or higher is required to run the code. 
First, install the required libraries with `pip`, then download the `en_core_web_sm` language pack for spaCy and data for NLTK:\n\n```shell\npip install -r requirements.txt\npython -m spacy download en_core_web_sm\npython -m nltk.downloader punkt\n```\n\n## Using the Browser\n\nFirst, download a model fine-tuned on NLI from HuggingFace's cloud repository.\n\n```shell\npython scripts/download_model.py --model scibert-nli\n```\n\nSecond, download the data from the [Kaggle challenge page](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge) and place it in the `data` folder.\n\nFinally, run:\n\n```shell\npython scripts/interactive_search.py\n```\n\nto enter the interactive demo. A GPU is recommended, since creating the embeddings for the entire corpus can otherwise be time-consuming. Both the corpus and the embeddings are cached on disk after the first execution of the script, so later runs skip the embedding step and start much faster.\n\nUse the interactive demo as follows:\n\n![Demo GIF](img/demo.gif)\n\n## Reproducing Training Results for Transformers\n\nFirst, download a pretrained model from HuggingFace's cloud repository.\n\n```shell\npython scripts/download_model.py --model scibert\n```\n\nSecond, download the NLI datasets used for training and the STS dataset used for testing.\n\n```shell\npython scripts/get_finetuning_data.py\n```\n\nFinally, run the fine-tuning script, adjusting its parameters for the model you intend to train (default is `scibert-nli`).\n\n```shell\npython scripts/finetune_nli.py\n```\n\nThe model will be evaluated against the test portion of the **Semantic Textual Similarity (STS)** benchmark dataset at the end of training. Please refer to my [model cards](https://huggingface.co/gsarti) for additional references on parameter values.\n\n## References\n\n[1] Beltagy et al. 
2019, [\"SciBERT: Pretrained Language Model for Scientific Text\"](https://www.aclweb.org/anthology/D19-1371/)\n\n[2] Lee et al. 2020, [\"BioBERT: a pre-trained biomedical language representation model for biomedical text mining\"](http://doi.org/10.1093/bioinformatics/btz682)\n\n[3] Bowman et al. 2015, [\"A large annotated corpus for learning natural language inference\"](https://www.aclweb.org/anthology/D15-1075/)\n\n[4] Williams et al. 2018, [\"A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference\"](http://aclweb.org/anthology/N18-1101)\n\n[5] Reimers et al. 2019, [\"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks\"](https://www.aclweb.org/anthology/D19-1410/)\n\n[6] As shown in Conneau et al. 2017, [\"Supervised Learning of Universal Sentence Representations from Natural Language Inference Data\"](https://www.aclweb.org/anthology/D17-1070/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsarti%2Fcovid-papers-browser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgsarti%2Fcovid-papers-browser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgsarti%2Fcovid-papers-browser/lists"}