https://github.com/kormilitzin/med7


Host: GitHub
URL: https://github.com/kormilitzin/med7
Owner: kormilitzin
License: apache-2.0
Created: 2020-02-26T23:03:32.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2021-12-11T16:38:09.000Z (almost 3 years ago)
Last Synced: 2024-08-03T17:14:28.228Z (3 months ago)
Size: 430 KB
Stars: 200
Watchers: 14
Forks: 25
Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE


# Med7

This repository is dedicated to the first release of [Med7: a transferable clinical natural language processing model for electronic health records](https://www.sciencedirect.com/science/article/pii/S0933365721000798), compatible with [spaCy](https://spacy.io) v3+, for clinical named-entity recognition (NER) tasks. The `en_core_med7_lg` model is trained on MIMIC-III free-text electronic health records and recognises seven categories: `DRUG`, `STRENGTH`, `FORM`, `ROUTE`, `DOSAGE`, `FREQUENCY` and `DURATION`.

Both vector and transformer models are now hosted on [Huggingface](https://huggingface.co/kormilitzin).

![Image description](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-26%20at%2018.18.54.png)

The trained model comprises three components in its pipeline:

* tagger

* parser

* clinical NER with seven categories.

## Installation

It is recommended to create a dedicated virtual environment and install the required packages there. The trained model was tested with spaCy >= 3.1 and Python >= 3.7. For example, if the [anaconda distribution of Python](https://www.anaconda.com/distribution/#download-section) is already installed:

create a new virtual environment:

`(base) conda create -n med7 python=3.9`

activate and install spaCy:

```
(base) conda activate med7
(med7) pip install -U spacy
```
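Before installing the model, it may help to confirm that the environment meets the versions the model was tested with. The following is a minimal sketch, not part of Med7: `version_tuple` and `check_environment` are hypothetical helper names, and the sketch assumes Python 3.8+ because it uses `importlib.metadata`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def version_tuple(text):
    """Turn a version string such as '3.1.4' into (3, 1, 4).

    Only the leading digits of each of the first three components are kept,
    so a pre-release suffix like '0rc1' is read as 0.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_environment(min_python=(3, 7), min_spacy=(3, 1)):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {sys.version.split()[0]} is older than {min_python}")
    try:
        spacy_version = version("spacy")
    except PackageNotFoundError:
        problems.append("spaCy is not installed")
    else:
        if version_tuple(spacy_version) < min_spacy:
            problems.append(f"spaCy {spacy_version} is older than {min_spacy}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Run it inside the activated `med7` environment; no output means both tested minimums are met. A plain tuple comparison is used instead of a dedicated version-parsing library to keep the sketch dependency-free.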

Once this has completed, install the Med7 model from the Huggingface model repository:

Vectors model:

`pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl`

Transformer-based model:

`pip install https://huggingface.co/kormilitzin/en_core_med7_trf/resolve/main/en_core_med7_trf-any-py3-none-any.whl`

The transformer model is a RoBERTa-base implementation. Future work will improve its performance and introduce new features. Some entities **may not** be identified correctly.

***Notice:*** `en_core_med7_lg` for spaCy v2 can be downloaded from https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1 and then installed with:

`pip install /path/to/downloaded/spacy2_model`

## Usage

```python
import spacy
from spacy import displacy

med7 = spacy.load("en_core_med7_lg")

# create distinct colours for labels
col_dict = {}
seven_colours = ['#e6194B', '#3cb44b', '#ffe119', '#ffd8b1', '#f58231', '#f032e6', '#42d4f4']
for label, colour in zip(med7.pipe_labels['ner'], seven_colours):
    col_dict[label] = colour

options = {'ents': med7.pipe_labels['ner'], 'colors': col_dict}

text = 'A patient was prescribed Magnesium hydroxide 400mg/5ml suspension PO of total 30ml bid for the next 5 days.'
doc = med7(text)

# highlight the entities inline in a Jupyter notebook
displacy.render(doc, style='ent', jupyter=True, options=options)

# list (entity text, label) pairs; in a notebook the last expression is displayed
[(ent.text, ent.label_) for ent in doc.ents]
```

Med7 correctly identifies all seven entities in this example and highlights them in different colours for easier visualisation:

![](https://github.com/kormilitzin/med7/blob/master/images/Screenshot%202020-02-27%20at%2013.42.04.png)

and the resulting output:

```
[('Magnesium hydroxide', 'DRUG'),
 ('400mg/5ml', 'STRENGTH'),
 ('suspension', 'FORM'),
 ('PO', 'ROUTE'),
 ('30ml', 'DOSAGE'),
 ('bid', 'FREQUENCY'),
 ('for the next 5 days', 'DURATION')]
```
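For downstream use, the (text, label) pairs are often easier to handle when grouped by label. The sketch below is illustrative only and not part of Med7: `group_by_label` is a hypothetical helper, and the input list is the example output shown above, copied by hand rather than produced by running the model.

```python
from collections import defaultdict

# Example output from above, copied by hand (not produced by running Med7 here)
entities = [
    ("Magnesium hydroxide", "DRUG"),
    ("400mg/5ml", "STRENGTH"),
    ("suspension", "FORM"),
    ("PO", "ROUTE"),
    ("30ml", "DOSAGE"),
    ("bid", "FREQUENCY"),
    ("for the next 5 days", "DURATION"),
]


def group_by_label(pairs):
    """Collect entity texts under their label, keeping the order they appear in."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


record = group_by_label(entities)
print(record["DRUG"], record["DURATION"])
```

With a loaded model, the same helper could be applied to `[(ent.text, ent.label_) for ent in doc.ents]`. Lists are used as values because a single note may mention the same category more than once.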

Because Med7 includes both `parser` and `tagger` components, relations between the entities can be extracted from the dependency parse, similar to [this example](https://github.com/explosion/spaCy/blob/master/examples/information_extraction/entity_relations.py).
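As a much simpler alternative to the dependency-based approach, each non-drug entity can be attached to the nearest preceding `DRUG` mention by character position. This proximity heuristic is a swapped-in technique, not the method used in the linked spaCy example, and it will mislink entities when an attribute appears before its drug name. The helpers `spans_from_text` and `link_to_drugs` are hypothetical, and the labels in the example sentence are assigned by hand for illustration, not produced by Med7.

```python
def spans_from_text(text, annotated):
    """Locate each (surface string, label) pair in order; return (start, end, label, surface)."""
    spans = []
    search_from = 0
    for surface, label in annotated:
        start = text.index(surface, search_from)  # raises ValueError if not found
        end = start + len(surface)
        spans.append((start, end, label, surface))
        search_from = end
    return spans


def link_to_drugs(spans):
    """Attach every non-DRUG span to the most recent DRUG span that starts before it.

    Entities that appear before any DRUG mention are dropped.
    """
    relations = []
    current_drug = None
    for start, end, label, surface in sorted(spans):
        if label == "DRUG":
            current_drug = surface
        elif current_drug is not None:
            relations.append((current_drug, label, surface))
    return relations


# Hand-labelled example sentence (labels assigned manually, not by Med7)
text = "Started amoxicillin 500mg PO tds and paracetamol 1g qds."
annotated = [
    ("amoxicillin", "DRUG"),
    ("500mg", "STRENGTH"),
    ("PO", "ROUTE"),
    ("tds", "FREQUENCY"),
    ("paracetamol", "DRUG"),
    ("1g", "STRENGTH"),
    ("qds", "FREQUENCY"),
]

for relation in link_to_drugs(spans_from_text(text, annotated)):
    print(relation)
```

With a parsed spaCy document, the spans could instead be built as `[(ent.start_char, ent.end_char, ent.label_, ent.text) for ent in doc.ents]`, skipping `spans_from_text` entirely.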

The usage example above can also be run in [Colab](https://colab.research.google.com/drive/1mY36G-vzBc_x4DGAYfyeb0OLIUcRMgff).

## Citing

This model is the first step in our programme on clinical NLP for electronic health records (cNLPEHR). We are committed to developing FAIR (Findable, Accessible, Interoperable and Reusable) tools that will benefit the wider community.

If you find this model useful, please cite it as:

```
@article{kormilitzin2020med7,
  title={Med7: a transferable clinical natural language processing model for electronic health records},
  author={Kormilitzin, Andrey and Vaci, Nemanja and Liu, Qiang and Nevado-Holgado, Alejo},
  journal={arXiv preprint arXiv:2003.01271},
  year={2020}
}
```