https://github.com/kormilitzin/med7
- Host: GitHub
- URL: https://github.com/kormilitzin/med7
- Owner: kormilitzin
- License: apache-2.0
- Created: 2020-02-26T23:03:32.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-12-11T16:38:09.000Z (almost 3 years ago)
- Last Synced: 2024-08-03T17:14:28.228Z (3 months ago)
- Size: 430 KB
- Stars: 200
- Watchers: 14
- Forks: 25
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Med7
This repository is dedicated to the first release of [Med7: a transferable clinical natural language processing model for electronic health records](https://www.sciencedirect.com/science/article/pii/S0933365721000798), compatible with [spaCy](https://spacy.io) v3+, for clinical named-entity recognition (NER) tasks. The `en_core_med7_lg` model is trained on MIMIC-III free-text electronic health records and recognises 7 categories: `DRUG`, `STRENGTH`, `FORM`, `ROUTE`, `DOSAGE`, `FREQUENCY` and `DURATION`.
Both vector and transformer models are now hosted on [Huggingface](https://huggingface.co/kormilitzin).
![Image description](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-26%20at%2018.18.54.png)
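The seven categories form a fixed label set, so downstream code can rely on it. As a minimal sketch (the helper `group_by_label` is hypothetical and not part of Med7), extracted `(text, label)` pairs could be grouped by category like this, with unknown labels rejected:

```python
# The seven Med7 entity labels, as they appear in the example output further below.
MED7_LABELS = ("DRUG", "STRENGTH", "FORM", "ROUTE", "DOSAGE", "FREQUENCY", "DURATION")


def group_by_label(entities):
    """Group (text, label) pairs by Med7 label (hypothetical helper).

    `entities` is assumed to be a list of (text, label) tuples, e.g. the
    result of `[(ent.text, ent.label_) for ent in doc.ents]`.
    Every category is present in the result, possibly with an empty list.
    """
    grouped = {label: [] for label in MED7_LABELS}
    for text, label in entities:
        if label not in grouped:
            raise ValueError(f"unexpected label: {label!r}")
        grouped[label].append(text)
    return grouped


example = [('Magnesium hydroxide', 'DRUG'), ('400mg/5ml', 'STRENGTH'), ('bid', 'FREQUENCY')]
print(group_by_label(example)["DRUG"])  # ['Magnesium hydroxide']
```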
The trained model comprises three components in its pipeline:
* tagger
* parser
* clinical NER with seven categories.

## Installation
We recommend creating a dedicated virtual environment and installing recent versions of the required packages there. The trained model was tested with spaCy >= 3.1 and Python >= 3.7. For example, if the [anaconda distribution of Python](https://www.anaconda.com/distribution/#download-section) is already installed:
Create a new virtual environment:
`(base) conda create -n med7 python=3.9`
Activate it and install spaCy:
```
(base) conda activate med7
(med7) pip install -U spacy
```
Once everything has installed successfully, install the Med7 model from the Huggingface Models repository:
Vectors model:
`pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl`
Transformer-based model:
`pip install https://huggingface.co/kormilitzin/en_core_med7_trf/resolve/main/en_core_med7_trf-any-py3-none-any.whl`
The transformer-based model is a RoBERTa-base implementation. Future work will improve its performance and introduce new features. Some entities **may not** be identified correctly.
***Notice:*** You can download `en_core_med7_lg` for spaCy v2 here: https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1
and then install it with `pip install /path/to/downloaded/spacy2_model`.
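The installation notes distinguish spaCy v3 (Huggingface wheels, tested with spaCy >= 3.1) from spaCy v2 (the Dropbox archive). As a minimal sketch, a helper could check the installed spaCy version before choosing a package. The function names are hypothetical, the version parsing is deliberately simple, and `importlib.metadata` requires Python 3.8+ (the `python=3.9` environment above satisfies this).

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def parse_version(text):
    """Parse leading numeric components: '3.1.4' -> (3, 1, 4), '3.2.0rc1' -> (3, 2, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits:
            parts.append(int(digits))
        # stop at the first component that is not purely numeric
        if not digits or len(digits) < len(piece):
            break
    return tuple(parts)


def med7_package_for(spacy_version):
    """Suggest which Med7 build fits a spaCy version string (hypothetical helper)."""
    v = parse_version(spacy_version)
    if v >= (3, 1):
        return "v3"           # Huggingface wheels, tested range
    if v >= (3,):
        return "v3-untested"  # spaCy 3.0: v3 wheels, outside the tested range
    if v >= (2,):
        return "v2"           # spaCy v2 archive from Dropbox
    return "unsupported"


def installed_spacy_version():
    """Return the installed spaCy version string, or None if spaCy is missing."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


installed = installed_spacy_version()
print(installed, "->", med7_package_for(installed) if installed else "install spaCy first")
```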
## Usage
```python
import spacy
from spacy import displacy

med7 = spacy.load("en_core_med7_lg")

# create distinct colours for labels
col_dict = {}
seven_colours = ['#e6194B', '#3cb44b', '#ffe119', '#ffd8b1', '#f58231', '#f032e6', '#42d4f4']
for label, colour in zip(med7.pipe_labels['ner'], seven_colours):
    col_dict[label] = colour
options = {'ents': med7.pipe_labels['ner'], 'colors': col_dict}

text = 'A patient was prescribed Magnesium hydroxide 400mg/5ml suspension PO of total 30ml bid for the next 5 days.'
doc = med7(text)
displacy.render(doc, style='ent', jupyter=True, options=options)

# (entity text, label) pairs; shown as the cell output in a notebook
[(ent.text, ent.label_) for ent in doc.ents]
```
The Med7 model correctly identifies all seven entities in the following example and highlights them in different colours for better visualisation:
![](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-27%20at%2013.42.04.png)
and the resulting output:
```
[('Magnesium hydroxide', 'DRUG'),
('400mg/5ml', 'STRENGTH'),
('suspension', 'FORM'),
('PO', 'ROUTE'),
('30ml', 'DOSAGE'),
('bid', 'FREQUENCY'),
('for the next 5 days', 'DURATION')]
```
Because Med7 includes both `parser` and `tagger` components in its pipeline, relations between the entities can be extracted with an approach similar to [this example](https://github.com/explosion/spaCy/blob/master/examples/information_extraction/entity_relations.py).
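The linked spaCy example relies on the dependency parse. A much simpler heuristic, sketched here and *not* Med7's own method, attaches each attribute entity to the nearest preceding `DRUG` mention. It is adequate for short single-medication sentences like the example above but ignores syntax entirely; the name `link_to_drugs` is hypothetical.

```python
def link_to_drugs(entities):
    """Attach each non-DRUG entity to the most recent preceding DRUG (heuristic sketch).

    `entities` is a list of (text, label) tuples in document order.
    Attributes that appear before any DRUG go into a record whose DRUG is None.
    """
    records = []
    current = None
    for text, label in entities:
        if label == "DRUG":
            current = {"DRUG": text}
            records.append(current)
        else:
            if current is None:
                current = {"DRUG": None}
                records.append(current)
            current.setdefault(label, []).append(text)
    return records


output = [('Magnesium hydroxide', 'DRUG'), ('400mg/5ml', 'STRENGTH'), ('suspension', 'FORM'),
          ('PO', 'ROUTE'), ('30ml', 'DOSAGE'), ('bid', 'FREQUENCY'), ('for the next 5 days', 'DURATION')]
records = link_to_drugs(output)
print(records[0]["DRUG"], records[0]["FREQUENCY"])  # Magnesium hydroxide ['bid']
```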
The usage example above can also be run in [Colab](https://colab.research.google.com/drive/1mY36G-vzBc_x4DGAYfyeb0OLIUcRMgff).
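The `(text, label)` pairs in the output carry no character offsets. When a live `doc` is available, spaCy spans already expose `ent.start_char` and `ent.end_char`; if only the pairs were stored, offsets can be recovered by searching the text left to right. The sketch below (hypothetical helper `to_displacy_manual`) builds the dict format that displaCy accepts in manual mode (`text`, `ents` with `start`/`end`/`label`, optional `title`):

```python
def to_displacy_manual(text, entities):
    """Build a displaCy manual-mode dict from (entity_text, label) pairs.

    Each search starts where the previous match ended, so repeated strings
    map to successive occurrences. Raises ValueError if a string is missing.
    """
    ents = []
    cursor = 0
    for ent_text, label in entities:
        start = text.find(ent_text, cursor)
        if start == -1:
            raise ValueError(f"{ent_text!r} not found after position {cursor}")
        end = start + len(ent_text)
        ents.append({"start": start, "end": end, "label": label})
        cursor = end
    return {"text": text, "ents": ents, "title": None}
```

The result could then be passed to `displacy.render(..., style='ent', manual=True)` together with the `options` dict from the usage example.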
## Citing
This model is the very first step in our programme on clinical NLP for electronic health records (cNLPEHR). We are committed to developing FAIR (Findable, Accessible, Interoperable and Reusable) tools that will benefit the wider community.
If you find this model useful, please cite it as:
```
@article{kormilitzin2020med7,
title={Med7: a transferable clinical natural language processing model for electronic health records},
author={Kormilitzin, Andrey and Vaci, Nemanja and Liu, Qiang and Nevado-Holgado, Alejo},
journal={arXiv preprint arXiv:2003.01271},
year={2020}
}
```