https://github.com/kormilitzin/med7



# Med7

This repository is dedicated to the first release of [Med7: a transferable clinical natural language processing model for electronic health records](https://www.sciencedirect.com/science/article/pii/S0933365721000798), compatible with [spaCy](https://spacy.io) v3+, for clinical named-entity recognition (NER) tasks. The `en_core_med7_lg` model is trained on MIMIC-III free-text electronic health records and recognises seven categories:

* `DRUG` (drug name)
* `STRENGTH`
* `FORM`
* `ROUTE` (route of administration)
* `DOSAGE`
* `FREQUENCY`
* `DURATION`

Both vector and transformer models are now hosted on [Huggingface](https://huggingface.co/kormilitzin).

![Image description](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-26%20at%2018.18.54.png)

The trained model comprises three components in its pipeline:
* tagger
* parser
* clinical NER with seven categories.

## Installation

It is recommended to create a dedicated virtual environment and install the latest versions of all required packages there. The trained model was tested with spaCy >=3.1 and Python >=3.7. For example, if the [Anaconda distribution of Python](https://www.anaconda.com/distribution/#download-section) is already installed:

Create a new virtual environment:

`(base) conda create -n med7 python=3.9`

Activate it and install spaCy:

```
(base) conda activate med7

(med7) pip install -U spacy
```

Once this has completed successfully, install the Med7 model from the Huggingface model repository:

Vectors model:

`pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl`

Transformer-based model:

`pip install https://huggingface.co/kormilitzin/en_core_med7_trf/resolve/main/en_core_med7_trf-any-py3-none-any.whl`

The transformer-based model is a RoBERTa-base implementation. Future work will improve its performance and introduce new features. Some entities **may not** be identified correctly.

***Notice:*** A version of `en_core_med7_lg` for spaCy v2 can be downloaded from https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1 and then installed with:

`pip install /path/to/downloaded/spacy2_model`
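
Which package to install depends on the spaCy version in the environment. The helper below is a minimal sketch, not part of Med7: `med7_source` and `parse_major_minor` are hypothetical names, and the code only encodes the download locations and tested versions stated above (Python >=3.7; spaCy >=3.1 for the Huggingface wheels; the Dropbox archive of the vectors model for spaCy v2).

```python
import re
import sys

# Download locations copied from the installation instructions above.
HF_WHEELS = {
    "en_core_med7_lg": "https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl",
    "en_core_med7_trf": "https://huggingface.co/kormilitzin/en_core_med7_trf/resolve/main/en_core_med7_trf-any-py3-none-any.whl",
}
SPACY2_ARCHIVE = "https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1"


def parse_major_minor(version: str) -> tuple:
    """Hypothetical helper: extract (major, minor) from a string such as '3.7.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return int(match.group(1)), int(match.group(2))


def med7_source(spacy_version: str, model: str = "en_core_med7_lg") -> str:
    """Hypothetical helper: return the download location of a Med7 model."""
    if sys.version_info < (3, 7):
        raise RuntimeError("Med7 was tested with Python >= 3.7")
    major, minor = parse_major_minor(spacy_version)
    if (major, minor) >= (3, 1):
        if model not in HF_WHEELS:
            raise ValueError(f"unknown model {model!r}")
        return HF_WHEELS[model]
    if major == 2:
        # Only the vectors model is offered for spaCy v2 (download, then pip install the file).
        if model != "en_core_med7_lg":
            raise ValueError("only en_core_med7_lg is available for spaCy v2")
        return SPACY2_ARCHIVE
    raise ValueError(f"spaCy {spacy_version} is outside the tested versions")
```

For example, `med7_source(spacy.__version__, "en_core_med7_trf")` after `import spacy` returns the transformer wheel URL to pass to `pip install` when spaCy 3.1 or later is present.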

## Usage

```python
import spacy
from spacy import displacy

med7 = spacy.load("en_core_med7_lg")

# create a distinct colour for each of the seven NER labels
col_dict = {}
seven_colours = ['#e6194B', '#3cb44b', '#ffe119', '#ffd8b1', '#f58231', '#f032e6', '#42d4f4']
for label, colour in zip(med7.pipe_labels['ner'], seven_colours):
    col_dict[label] = colour

options = {'ents': med7.pipe_labels['ner'], 'colors': col_dict}

text = 'A patient was prescribed Magnesium hydroxide 400mg/5ml suspension PO of total 30ml bid for the next 5 days.'
doc = med7(text)

# highlight the entities inline (jupyter=True renders inside a Jupyter notebook)
displacy.render(doc, style='ent', jupyter=True, options=options)

# list the recognised entities together with their labels
print([(ent.text, ent.label_) for ent in doc.ents])
```

In this example, the Med7 model correctly identifies all seven entities and highlights them in different colours for easier visualisation:

![](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-27%20at%2013.42.04.png)

and the resulting output:

```
[('Magnesium hydroxide', 'DRUG'),
('400mg/5ml', 'STRENGTH'),
('suspension', 'FORM'),
('PO', 'ROUTE'),
('30ml', 'DOSAGE'),
('bid', 'FREQUENCY'),
('for the next 5 days', 'DURATION')]
```
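
For downstream processing it can help to collect the entities by label. The snippet below is a minimal sketch, assuming the list of `(text, label)` pairs printed by the usage example; `group_entities` and `MED7_LABELS` are illustrative names, not part of the Med7 package.

```python
# The seven Med7 labels, as they appear in the output above.
MED7_LABELS = ("DRUG", "STRENGTH", "FORM", "ROUTE", "DOSAGE", "FREQUENCY", "DURATION")


def group_entities(pairs):
    """Group (text, label) pairs by label; labels with no match map to an empty list."""
    grouped = {label: [] for label in MED7_LABELS}
    for text, label in pairs:
        # setdefault keeps any unexpected label instead of raising KeyError
        grouped.setdefault(label, []).append(text)
    return grouped


ents = [('Magnesium hydroxide', 'DRUG'), ('400mg/5ml', 'STRENGTH'),
        ('suspension', 'FORM'), ('PO', 'ROUTE'), ('30ml', 'DOSAGE'),
        ('bid', 'FREQUENCY'), ('for the next 5 days', 'DURATION')]
record = group_entities(ents)
print(record["DRUG"], record["DURATION"])  # → ['Magnesium hydroxide'] ['for the next 5 days']
```

Lists are used rather than single strings because one note may mention the same label more than once.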

Because Med7 includes both `parser` and `tagger` components, relations between the entities can be extracted from the dependency parse, similar to [this example](https://github.com/explosion/spaCy/blob/master/examples/information_extraction/entity_relations.py).

The code above can also be run in [Colab](https://colab.research.google.com/drive/1mY36G-vzBc_x4DGAYfyeb0OLIUcRMgff).
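
The linked spaCy example relies on the dependency parse. As a simpler alternative that needs no parse at all, the sketch below attaches each non-`DRUG` entity to the closest preceding `DRUG` entity by character offset. This proximity heuristic is our assumption, not the method described above: it will mis-assign attributes that appear before the drug name. `link_to_drugs` is a hypothetical name, and the two-drug sentence in the example is invented for illustration.

```python
def link_to_drugs(ents):
    """Attach each non-DRUG entity to the closest preceding DRUG entity.

    `ents` is an iterable of (text, label, start_char) tuples, e.g. built from
    spaCy with [(e.text, e.label_, e.start_char) for e in doc.ents].
    Entities that appear before the first DRUG are ignored.
    """
    records = []
    # process entities in the order they occur in the text
    for text, label, _start in sorted(ents, key=lambda e: e[2]):
        if label == "DRUG":
            records.append({"drug": text, "attributes": {}})
        elif records:
            records[-1]["attributes"].setdefault(label, []).append(text)
    return records


# Offsets refer to the invented sentence "Aspirin 75mg PO od and paracetamol 1g PO qds".
ents = [("paracetamol", "DRUG", 23), ("Aspirin", "DRUG", 0), ("75mg", "STRENGTH", 8),
        ("od", "FREQUENCY", 16), ("1g", "STRENGTH", 35), ("qds", "FREQUENCY", 41)]
for rec in link_to_drugs(ents):
    print(rec["drug"], rec["attributes"])
```

Sorting by `start_char` makes the result independent of the order in which entities are supplied.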

## Citing

This model is the first step in our programme on clinical NLP for electronic health records (cNLPEHR). We are committed to developing FAIR (Findable, Accessible, Interoperable and Reusable) tools that will benefit the wider community.

If you find this model useful, please cite it as:

```
@article{kormilitzin2020med7,
title={Med7: a transferable clinical natural language processing model for electronic health records},
author={Kormilitzin, Andrey and Vaci, Nemanja and Liu, Qiang and Nevado-Holgado, Alejo},
journal={arXiv preprint arXiv:2003.01271},
year={2020}
}
```