https://github.com/plandes/mimic

MIMIC III Corpus Parsing

Host: GitHub
URL: https://github.com/plandes/mimic
Owner: plandes
License: mit
Created: 2022-05-04T22:11:11.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2024-03-08T12:58:18.000Z (about 1 year ago)
Last Synced: 2024-03-08T13:49:59.357Z (about 1 year ago)
Topics: mimic-iii, natural-language-processing, parsing-library, spacy
Language: Python
Homepage: https://plandes.github.io/mimic/
Size: 521 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md

# MIMIC III Corpus Parsing

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

A utility library for parsing the [MIMIC-III] corpus.  This uses [spaCy] and
extends [zensols.mednlp] to parse the [MIMIC-III] medical note dataset.

Features include:

* Creates both natural language and medical features from medical notes.  The
  latter is generated using linked entity concepts parsed with [MedCAT] via
  [zensols.mednlp].
* Modifies the [spaCy] tokenizer to chunk masked tokens.  For example, `[`,
  `**`, `First`, `Name`, `**`, `]` becomes `[**First Name**]`.
* Provides a clean, Pythonic, object-oriented representation of MIMIC-III
  admissions and medical notes.
* Interfaces MIMIC-III data as a relational database (either PostgreSQL or
  SQLite).
* Chunks paragraphs using the most common syntax/physician templates provided
  in the MIMIC-III dataset.
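
The masked-token chunking described in the list above can be illustrated
without spaCy.  The following is a minimal sketch, not the library's
tokenizer: `MASK_PATTERN` and `chunk_masked_tokens` are hypothetical names, and
simple whitespace splitting stands in for real tokenization.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not taken from zensols.mimic):
#   1. a whole MIMIC-III de-identification mask such as "[**First Name**]"
#   2. otherwise a run of characters that are neither whitespace nor "["
#   3. otherwise a lone "[" that does not start a mask
MASK_PATTERN = re.compile(r'\[\*\*.*?\*\*\]|[^\s\[]+|\[')


def chunk_masked_tokens(text: str) -> List[str]:
    """Split text into tokens, keeping each ``[**...**]`` mask as one token."""
    return MASK_PATTERN.findall(text)


if __name__ == '__main__':
    print(chunk_masked_tokens('Seen by Dr. [**First Name**] on admission'))
```

Because the mask alternative is tried first at every position, the brackets,
asterisks and inner words of a mask are never emitted as separate tokens.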

## Documentation

See the [full documentation](https://plandes.github.io/mimic/index.html).

The [API reference](https://plandes.github.io/mimic/api.html) is also
available.

## Obtaining

The easiest way to install the command line program is via the `pip` installer:

```bash
pip3 install zensols.mimic
```

Binaries are also available on [pypi].

## Installation

1. Install the package: `pip3 install zensols.mimic`

2. Install the database (either PostgreSQL or SQLite).

## Configuration

After a database is installed, it must be configured in a new file,
`~/.mimicrc`, that you create.  This INI-formatted file also specifies where to
cache data:

```ini
[default]
# the directory where cached data is stored
data_dir = ~/directory/to/cached/data
```

If `~/.mimicrc` doesn't exist, a configuration file must be specified with the
`--config` option.
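
To sanity check a configuration file before running the program, the
`data_dir` entry can be read with Python's standard `configparser` module.
This is a minimal sketch for illustration only: the library loads its
configuration through its own machinery, and `read_data_dir` is a hypothetical
helper, not part of the API.

```python
from configparser import ConfigParser
from pathlib import Path


def read_data_dir(ini_text: str) -> Path:
    """Return the ``data_dir`` path from the ``[default]`` section.

    A leading ``~`` is expanded to the user's home directory.
    """
    parser = ConfigParser()
    # section names are case sensitive, so "[default]" is an ordinary
    # section rather than configparser's special "DEFAULT" section
    parser.read_string(ini_text)
    return Path(parser['default']['data_dir']).expanduser()


if __name__ == '__main__':
    rc_file = Path('~/.mimicrc').expanduser()
    if rc_file.is_file():
        print(read_data_dir(rc_file.read_text()))
```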

### SQLite

SQLite is the default database used for MIMIC-III access, but it is slower and
not as well tested as the [PostgreSQL](#postgresql) driver.  See the
[SQLite database file] project and the [SQLite instructions] to create the
SQLite file from MIMIC-III if you need database access.

Once you create the file, configure the API to use it by adding the following
to the file specified with `--config` (or to `~/.mimicrc`):

```ini
[mimic_sqlite_conn_manager]
db_file = path: /mimic3.sqlite3
```
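
Before pointing the API at the SQLite file, it can help to confirm that the
file actually contains tables.  The sketch below uses only the standard
`sqlite3` module; `list_tables` is a hypothetical helper and is not part of
this package.

```python
import sqlite3
from typing import List


def list_tables(db_file: str) -> List[str]:
    """Return the sorted table names found in a SQLite database file."""
    # note: sqlite3.connect creates an empty database if the file is missing,
    # in which case this function returns an empty list
    conn = sqlite3.connect(db_file)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

An empty list suggests the path is wrong or the database was never loaded.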

### PostgreSQL

PostgreSQL is the preferred way to access MIMIC-III for this API.  The MIMIC-III
database can be loaded by following the [PostgreSQL instructions], or consider
the [PostgreSQL Docker image].  Then configure the database by adding the
following to `~/.mimicrc`:

```ini

[mimic_default]

resources_dir = resource(zensols.mimic): resources

sql_resources = ${resources_dir}/postgres

conn_manager = mimic_postgres_conn_manager

[mimic_db]

database = 

host = 

port = 

user = 

password = 

```

The Python PostgreSQL client package is also needed (it is not needed for
[SQLite](#sqlite) installs) and can be installed with:

```bash
pip3 install zensols.dbpg
```

## Usage

The [Corpus] class is the data access object used to read and parse the corpus:

```python
# get the MIMIC-III corpus data access object
>>> from zensols.mimic import ApplicationFactory
>>> corpus = ApplicationFactory.get_corpus()
# get an admission by hadm_id
>>> adm = corpus.hospital_adm_stash['165315']
# get the first discharge note (some admissions have addendums)
>>> from zensols.mimic.regexnote import DischargeSummaryNote
>>> ds = adm.notes_by_category[DischargeSummaryNote.CATEGORY][0]
# dump the note in a human-readable, section-by-section format
>>> ds.write()
row_id: 12144
category: Discharge summary
description: Report
annotator: regular_expression
----------------------0:chief-complaint (CHIEF COMPLAINT)-----------------------
Unresponsiveness
-----------1:history-of-present-illness (HISTORY OF PRESENT ILLNESS)------------
The patient is a ...
# get features of the note useful in ML models as a Pandas dataframe
>>> df = ds.feature_dataframe
# get only medical features (CUI, entity, NER and POS tag) for the HPI section
>>> df[(df['section'] == 'history-of-present-illness') & (df['cui_'] != '--')]['norm cui_ detected_name_ ent_ tag_'.split()]
             norm      cui_           detected_name_     ent_ tag_
15        history  C0455527  history~of~hypertension  concept   NN
```

See the [application example], which shows a fine-grained way of configuring
the API.

## Medical Note Segmentation

This package uses regular expressions to segment notes.  However, the
[zensols.mimicsid] package uses annotations and a model trained by clinical
informatics physicians.  Using that package provides this enhanced
segmentation without any API changes.
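
To give a feel for regular-expression segmentation, here is a minimal sketch.
It is not the package's implementation: the `HEADER` pattern assumes that a
section header is an upper-case phrase at the start of a line followed by a
colon, and `segment` is a hypothetical function name.

```python
import re
from typing import List, Tuple

# Assumption: a header looks like "CHIEF COMPLAINT:" at the start of a line
HEADER = re.compile(r'^([A-Z][A-Z ]+):[ \t]*', re.MULTILINE)


def segment(note: str) -> List[Tuple[str, str]]:
    """Split a note into ``(section_id, body)`` pairs using header lines.

    Text that appears before the first header is not returned.
    """
    matches = list(HEADER.finditer(note))
    sections = []
    for i, m in enumerate(matches):
        # a section body runs until the next header or the end of the note
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        # "CHIEF COMPLAINT" -> "chief-complaint"
        sec_id = m.group(1).strip().lower().replace(' ', '-')
        sections.append((sec_id, note[m.end():end].strip()))
    return sections


if __name__ == '__main__':
    text = ('CHIEF COMPLAINT: Unresponsiveness\n\n'
            'HISTORY OF PRESENT ILLNESS:\nThe patient is a ...\n')
    for sec_id, body in segment(text):
        print(f'{sec_id}: {body}')
```

Real notes vary far more than this pattern allows (mixed-case headers, missing
colons, nested lists), which is the kind of variation a trained model is
better placed to handle.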

## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```

## Changelog

An extensive changelog is available [here](CHANGELOG.md).

## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any input are welcome.

## License

[MIT License](LICENSE.md)

Copyright (c) 2022 - 2025 Paul Landes

[pypi]: https://pypi.org/project/zensols.mimic/

[pypi-link]: https://pypi.python.org/pypi/zensols.mimic

[pypi-badge]: https://img.shields.io/pypi/v/zensols.mimic.svg

[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg

[python311-link]: https://www.python.org/downloads/release/python-3110

[build-badge]: https://github.com/plandes/mimic/workflows/CI/badge.svg

[build-link]: https://github.com/plandes/mimic/actions

[MIMIC-III]: https://physionet.org/content/mimiciii-demo/1.4/

[MedCAT]: https://github.com/CogStack/MedCAT

[spaCy]: https://spacy.io

[zensols.mednlp]: https://github.com/plandes/mednlp

[SQLite instructions]: https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/sqlite

[PostgreSQL instructions]: https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iii/buildmimic/postgres/README.md

[PostgreSQL Docker image]: https://github.com/plandes/mimicdb

[SQLite database file]: https://github.com/plandes/mimicdbsqlite

[Corpus]: https://plandes.github.io/mimic/api/zensols.mimic.html#zensols.mimic.corpus.Corpus

[application example]: https://github.com/plandes/mimic/blob/master/example/shownote.py

[zensols.mimicsid]: https://github.com/plandes/mimicsid
