{"id":13535042,"url":"https://github.com/EmilyAlsentzer/clinicalBERT","last_synced_at":"2025-04-02T00:32:01.884Z","repository":{"id":46271733,"uuid":"179753075","full_name":"EmilyAlsentzer/clinicalBERT","owner":"EmilyAlsentzer","description":"repository for Publicly Available Clinical BERT Embeddings ","archived":false,"fork":false,"pushed_at":"2020-08-25T14:11:44.000Z","size":98,"stargazers_count":630,"open_issues_count":5,"forks_count":126,"subscribers_count":25,"default_branch":"master","last_synced_at":"2024-04-28T05:28:10.705Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EmilyAlsentzer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-05T20:50:16.000Z","updated_at":"2024-04-25T02:15:23.000Z","dependencies_parsed_at":"2022-09-13T01:00:44.053Z","dependency_job_id":null,"html_url":"https://github.com/EmilyAlsentzer/clinicalBERT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmilyAlsentzer%2FclinicalBERT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmilyAlsentzer%2FclinicalBERT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmilyAlsentzer%2FclinicalBERT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmilyAlsentzer%2FclinicalBERT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EmilyAlsentzer","download_url":"https://codeload.github.com/EmilyAlsentzer/clinicalBERT/tar.gz/refs/h
eads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246735208,"owners_count":20825217,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T08:00:48.989Z","updated_at":"2025-04-02T00:32:01.519Z","avatar_url":"https://github.com/EmilyAlsentzer.png","language":"Python","funding_links":[],"categories":["domain specific BERT:","Uncategorized","Techniques and Models"],"sub_categories":["Uncategorized","BERT models"],"readme":"# clinicalBERT\nRepository for [Publicly Available Clinical BERT Embeddings](https://www.aclweb.org/anthology/W19-1909/) (NAACL Clinical NLP Workshop 2019)\n\n## Using Clinical BERT\n\nUPDATE: You can now use ClinicalBERT directly through the [transformers](https://github.com/huggingface/transformers)  library. Check out the [Bio+Clinical BERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) and [Bio+Discharge Summary BERT](https://huggingface.co/emilyalsentzer/Bio_Discharge_Summary_BERT) model pages for instructions on how to use the models within the Transformers library. \n\n## Download Clinical BERT\n\nThe Clinical BERT models can also be downloaded [here](https://www.dropbox.com/s/8armk04fu16algz/pretrained_bert_tf.tar.gz?dl=0), or via\n\n```\nwget -O pretrained_bert_tf.tar.gz https://www.dropbox.com/s/8armk04fu16algz/pretrained_bert_tf.tar.gz?dl=1\n```\n\n`biobert_pretrain_output_all_notes_150000` corresponds to Bio+Clinical BERT, and `biobert_pretrain_output_disch_100000` corresponds to Bio+Discharge Summary BERT. 
Both models are finetuned from [BioBERT](https://arxiv.org/abs/1901.08746). We specifically use the [BioBERT-Base v1.0 (+ PubMed 200K + PMC 270K)](https://github.com/naver/biobert-pretrained) version of BioBERT.\n\n`bert_pretrain_output_all_notes_150000` corresponds to Clinical BERT, and `bert_pretrain_output_disch_100000` corresponds to Discharge Summary BERT. Both models are finetuned from the cased version of BERT, specifically cased_L-12_H-768_A-12. \n\n## Reproduce Clinical BERT\n#### Pretraining\nTo finetune BERT or BioBERT on MIMIC data, follow these steps:\n1. Run `format_mimic_for_BERT.py`. Note that you'll need to change the file paths at the top of the file.\n2. Run `create_pretrain_data.sh`\n3. Run `finetune_lm_tf.sh`\n\nNote: See issue [#4](https://github.com/EmilyAlsentzer/clinicalBERT/issues/4) for ways to improve the section-splitting code. \n\n#### Downstream Tasks\nTo see an example of how to use Clinical BERT for the MedNLI task, go to the `run_classifier.sh` script in the `downstream_tasks` folder. To see an example for NER tasks, go to the `run_i2b2.sh` script.\n\n## Contact\nPlease post a GitHub issue or contact emilya@mit.edu if you have any questions.\n\n## Citation\nPlease acknowledge the following work in papers or derivative software:\n\nEmily Alsentzer, John Murphy, William Boag, Wei-Hung Weng, Di Jin, Tristan Naumann, and Matthew McDermott. 2019. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 72-78, Minneapolis, Minnesota, USA. Association for Computational Linguistics. 
\n\n```\n@inproceedings{alsentzer-etal-2019-publicly,\n    title = \"Publicly Available Clinical {BERT} Embeddings\",\n    author = \"Alsentzer, Emily  and\n      Murphy, John  and\n      Boag, William  and\n      Weng, Wei-Hung  and\n      Jin, Di  and\n      Naumann, Tristan  and\n      McDermott, Matthew\",\n    booktitle = \"Proceedings of the 2nd Clinical Natural Language Processing Workshop\",\n    month = jun,\n    year = \"2019\",\n    address = \"Minneapolis, Minnesota, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/W19-1909\",\n    doi = \"10.18653/v1/W19-1909\",\n    pages = \"72--78\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEmilyAlsentzer%2FclinicalBERT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FEmilyAlsentzer%2FclinicalBERT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FEmilyAlsentzer%2FclinicalBERT/lists"}