Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
PyTorch Implementation of BioBERT
https://github.com/dmis-lab/biobert-pytorch
- Host: GitHub
- URL: https://github.com/dmis-lab/biobert-pytorch
- Owner: dmis-lab
- License: other
- Created: 2020-10-14T07:51:15.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-06-24T05:15:27.000Z (over 1 year ago)
- Last Synced: 2024-12-30T16:49:49.552Z (9 days ago)
- Language: Java
- Homepage: http://doi.org/10.1093/bioinformatics/btz682
- Size: 1.87 MB
- Stars: 315
- Watchers: 10
- Forks: 107
- Open Issues: 29
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BioBERT-PyTorch
This repository provides the PyTorch implementation of [BioBERT](https://academic.oup.com/bioinformatics/article/36/4/1234/5566506).
You can easily use BioBERT with [transformers](https://github.com/huggingface/transformers).
This project is supported by the members of [DMIS-Lab](https://dmis.korea.ac.kr/) @ Korea University, including Jinhyuk Lee, Wonjin Yoon, Minbyul Jeong, Mujeen Sung, and Gangwoo Kim.

## Installation
```bash
# Install huggingface transformers
pip install transformers==3.0.0

# Download all datasets including NER/RE/QA
./download.sh
```
Note that you should also install `torch` (see the [download instructions](https://pytorch.org/)) to use `transformers`.
If the download script does not work, you can manually download the datasets [here](https://drive.google.com/file/d/1cGqvAm9IZ_86C4Mj7Zf-w9CFilYVDl8j/view?usp=sharing) and unzip the archive in the current directory (`tar -xzvf datasets.tar.gz`).

## Models
We provide the following versions of BioBERT in PyTorch (click [here](https://huggingface.co/dmis-lab) to see all).
You can use BioBERT in `transformers` by setting `--model_name_or_path` as one of them (see example below).
* `dmis-lab/biobert-base-cased-v1.2`: BioBERT-Base v1.2 (+ PubMed 1M + LM head); trained in the same way as BioBERT-Base v1.1 but includes the LM head, which can be useful for probing
* `dmis-lab/biobert-base-cased-v1.1`: BioBERT-Base v1.1 (+ PubMed 1M)
* `dmis-lab/biobert-large-cased-v1.1`: BioBERT-Large v1.1 (+ PubMed 1M)
* `dmis-lab/biobert-base-cased-v1.1-mnli`: BioBERT-Base v1.1 fine-tuned on MNLI
* `dmis-lab/biobert-base-cased-v1.1-squad`: BioBERT-Base v1.1 fine-tuned on SQuAD
For other versions of BioBERT or for TensorFlow, please see the [README](https://github.com/dmis-lab/biobert) in the original BioBERT repository.
You can convert any version of BioBERT into PyTorch with [this script](https://github.com/huggingface/transformers/blob/v3.5.1/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py).
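As a quick check that a model loads correctly, here is a minimal sketch (not part of this repository) that encodes a sentence with BioBERT and inspects the output shape; the example sentence is arbitrary:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load BioBERT-Base v1.1 from the Hugging Face model hub
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

# Encode a biomedical sentence and extract contextual token embeddings
inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
with torch.no_grad():
    last_hidden_state = model(**inputs)[0]  # tuple indexing works across transformers versions
print(last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])
```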
## Example

For instance, to train BioBERT on the NER dataset (NCBI-disease), run the following:

```bash
# Pre-process NER datasets
cd named-entity-recognition
./preprocess.sh

# Choose dataset and run
export DATA_DIR=../datasets/NER
export ENTITY=NCBI-disease
python run_ner.py \
--data_dir ${DATA_DIR}/${ENTITY} \
--labels ${DATA_DIR}/${ENTITY}/labels.txt \
--model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
--output_dir output/${ENTITY} \
--max_seq_length 128 \
--num_train_epochs 3 \
--per_device_train_batch_size 32 \
--save_steps 1000 \
--seed 1 \
--do_train \
--do_eval \
--do_predict \
--overwrite_output_dir
```
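After training, the checkpoint written to `--output_dir` can be loaded back for prediction. Below is a minimal sketch (not part of this repository), assuming the run above saved its model and tokenizer to `output/NCBI-disease`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned NER checkpoint (path taken from --output_dir above)
model_dir = "output/NCBI-disease"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)
model.eval()

# Tag a sentence token by token
inputs = tokenizer("The patient was diagnosed with cystic fibrosis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs)[0]  # shape: (1, num_tokens, num_labels)

predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    print(f"{token}\t{model.config.id2label[label_id]}")
```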
Please see each directory for different examples. Currently, we provide:

* [embedding/](https://github.com/dmis-lab/biobert-pytorch/tree/master/embedding): BioBERT embeddings.
* [named-entity-recognition/](https://github.com/dmis-lab/biobert-pytorch/tree/master/named-entity-recognition): NER using BioBERT.
* [question-answering/](https://github.com/dmis-lab/biobert-pytorch/tree/master/question-answering): QA using BioBERT.
* [relation-extraction/](https://github.com/dmis-lab/biobert-pytorch/tree/master/relation-extraction): RE using BioBERT.

Most examples are modified from the [examples](https://github.com/huggingface/transformers/tree/master/examples) in Hugging Face `transformers`.
## Citation
```bibtex
@article{lee2020biobert,
title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
journal={Bioinformatics},
volume={36},
number={4},
pages={1234--1240},
year={2020},
publisher={Oxford University Press}
}
```

## License and Disclaimer
Please see the LICENSE file for details. Downloading data indicates your acceptance of our disclaimer.

## Contact
For help or issues using BioBERT-PyTorch, please create an issue.