https://github.com/plandes/medacy_bertcrf_model_clinical_notes
Clinical Notes Model for medaCy (BERT)
- Host: GitHub
- URL: https://github.com/plandes/medacy_bertcrf_model_clinical_notes
- Owner: plandes
- License: gpl-3.0
- Created: 2021-03-27T00:37:29.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2021-03-30T22:36:41.000Z (almost 4 years ago)
- Last Synced: 2024-11-08T21:27:28.672Z (2 months ago)
- Topics: bert-model, machine-learning, medical, natural-language-processing
- Language: Python
- Homepage:
- Size: 20.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
[![spaCy](https://img.shields.io/badge/built%20with-spaCy-09a3d5.svg)](https://spacy.io)
# medaCy
:hospital: Clinical Notes Model for medaCy (BERT+CRF) :hospital:

This repository contains a versioned, medaCy-compatible model for information extraction from clinical notes.
![alt text](https://nlp.cs.vcu.edu/images/Edit_NanomedicineDatabase.png "Nanoinformatics")
# Description
This is the lightweight version (no MetaMap) of medaCy's model for extracting 9 unique entities from clinical notes: `Drug, Strength, Duration, Route, Form, ADE, Dosage, Reason, Frequency`
This model was trained using the
[ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
pre-trained embeddings with CRF.

# Results
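The scores in this section are reported under two span-matching criteria, *Strict* and *Lenient*. As a rough illustration only, the sketch below scores predicted spans against gold spans both ways. The `Span` tuple layout, the helper names, and the use of character overlap as the lenient criterion are assumptions made for this example; they are not taken from medaCy's evaluation code.

```python
from typing import Callable, List, Tuple

# Hypothetical span layout: (start offset, end offset (exclusive), entity label).
Span = Tuple[int, int, str]


def strict_match(gold: Span, pred: Span) -> bool:
    """Strict: label and both character offsets must agree exactly."""
    return gold == pred


def lenient_match(gold: Span, pred: Span) -> bool:
    """Lenient (assumed here to mean overlap): same label, ranges intersect."""
    return gold[2] == pred[2] and gold[0] < pred[1] and pred[0] < gold[1]


def score(
    gold: List[Span],
    pred: List[Span],
    match: Callable[[Span, Span], bool],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under the given matching criterion."""
    matched_pred = sum(any(match(g, p) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the second prediction misses the gold span by one character.
gold = [(0, 7, "Drug"), (8, 13, "Strength"), (20, 25, "Route")]
pred = [(0, 7, "Drug"), (8, 12, "Strength"), (30, 35, "Form")]

strict_scores = score(gold, pred, strict_match)
lenient_scores = score(gold, pred, lenient_match)
print("strict :", strict_scores)
print("lenient:", lenient_scores)
```

On this toy input the off-by-one `Strength` prediction counts as a miss under strict matching and as a hit under lenient matching, which is why lenient scores are never lower than strict ones for the same predictions.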
Model generalization ability is evaluated over 202 patient clinical note files not seen during training. *Strict* indicates exact matches of spans; *Lenient* indicates fuzzy matching of spans (model predictions are off by single characters).

| Entity (Count) | Precision | Recall | F1 | F1_Min | F1_Max |
|-------------------|-------------|----------|-------|----------|----------|
| ADE (1584) | 0.49 | 0.327 | 0.387 | 0.342 | 0.466 |
| Dosage (6902) | 0.941 | 0.951 | 0.946 | 0.936 | 0.961 |
| Drug (26800) | 0.904 | 0.891 | 0.898 | 0.883 | 0.905 |
| Duration (970) | 0.836 | 0.8 | 0.816 | 0.768 | 0.861 |
| Form (11010) | 0.937 | 0.939 | 0.938 | 0.931 | 0.954 |
| Frequency (10293) | 0.878 | 0.952 | 0.914 | 0.9 | 0.926 |
| Reason (6400) | 0.653 | 0.513 | 0.571 | 0.554 | 0.598 |
| Route (8989) | 0.929 | 0.932 | 0.93 | 0.925 | 0.937 |
| Strength (10921) | 0.956 | 0.955 | 0.955 | 0.95 | 0.961 |
| system (83869) | 0.894 | 0.896 | 0.893 | 0.889 | 0.902 |

# Training Data
N2C2 2018 Shared Task
The data used to induce this model is protected by HIPAA privacy regulations and thus cannot be published.

Authors
=======
Andriy Mulyar and Bridget McInnes

Acknowledgments
===============
- [VCU Natural Language Processing Lab](https://nlp.cs.vcu.edu/) ![alt text](https://nlp.cs.vcu.edu/images/vcu_head_logo "VCU")
- [Nanoinformatics Vertically Integrated Projects](https://rampages.us/nanoinformatics/)