https://github.com/corazonlabs/lemonpie
A deep learning library for Electronic Health Record (EHR) data
https://github.com/corazonlabs/lemonpie
deep-learning ehr fastai fhir health-data healthcare nbdev pytorch synthea
Last synced: about 2 months ago
JSON representation
A deep learning library for Electronic Health Record (EHR) data
- Host: GitHub
- URL: https://github.com/corazonlabs/lemonpie
- Owner: corazonlabs
- License: apache-2.0
- Created: 2020-11-26T00:27:38.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-04-12T05:18:48.000Z (about 3 years ago)
- Last Synced: 2025-12-16T18:53:22.544Z (5 months ago)
- Topics: deep-learning, ehr, fastai, fhir, health-data, healthcare, nbdev, pytorch, synthea
- Language: Jupyter Notebook
- Homepage: https://corazonlabs.github.io/lemonpie/
- Size: 18.6 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# lemonpie
> An open source deep learning library for Electronic Health Record (EHR) data.
In this initial release of the library ..
- it implements 2 deep learning models (an LSTM and a CNN) based on popular papers
- that can be trained on synthetic EHR data, created using the open source [Synthea Patient Generator](https://github.com/synthetichealth/synthea/wiki)
- to predict conditions that are on the [CDC's list of top chronic diseases](https://www.cdc.gov/chronicdisease/about/costs/index.htm) that contribute most to healthcare costs
- and is easily configurable to train on and predict any conditions in the dataset
The end goal is to
- keep adding more model implementations
- keep adding different publicly available datasets
- and have a leaderboard to track which models and configurations work best on these datasets
## Install
With conda
- `conda install -c corazonlabs -c fastai -c conda-forge lemonpie`
With pip
- `pip install lemonpie`
Or ..
1. Git clone the repo
- `https://github.com/corazonlabs/lemonpie.git`
2. Create a new conda env using the `environment.yml` file
- `cd lemonpie`
- `conda env create --name lemonpie --file environment.yml`
## How to use
1. Read through and then run the following **Quick Start** guides to get a general idea.
- if using the cloned repo, run these noteboooks
- `99_quick_walkthru.ipynb`
- `99_running_exps.ipynb`
- if using installed lib, just open a jupyter notebook and copy, paste & run cell-by-cell from these guides
- [Quick Walkthrough](https://corazonlabs.github.io/lemonpie/quick_walkthru.html)
- [Running Experiments](https://corazonlabs.github.io/lemonpie/running_exps.html)
2. Setup Synthea
- Refer to [condensed instructions](https://corazonlabs.github.io/lemonpie/setup.html#Setup-Synthea)
- Generate different datasets you like - e.g. 1K, 5K, 10K
3. Run experiments
- Refer to **Detailed Docs** for customizations
## Roadmap
- **A leader-board to track which models and configurations work best on different publicly available datasets**.
- Callbacks, Mixed Precision, etc
- Either upgrade the library to use fastai v2.
- Or as a minimum, build functionality for fastai-style callbacks & [PyTorch AMP](https://pytorch.org/docs/stable/amp.html).
- More models
- Pick some of the best EHR models out there and implement them.
- **Ideas are welcome** - please use [Github discussions](https://github.com/corazonlabs/lemonpie/discussions) for suggestions.
- More datasets
- eICU and MIMIC3 possibly.
- **Ideas are welcome** - [Github discussions](https://github.com/corazonlabs/lemonpie/discussions)
- NLP on clinical notes
- Synthea does not have clinical notes, so this can only be done with other datasets.
- Predicting different conditions
- Again different datasets will allow this - e.g. hospitalization data (length of stay, in-patient mortality), ER data, etc.
- Integraion with Experiment management tools like W&B, Comet, etc,.
## Known Issues & Limitations
1. `num_workers > 0` not working yet, under investigation
2. Test coverage
- Need to write more tests for more comprehensive coverage
Look at [Issues](https://github.com/corazonlabs/lemonpie/issues) for details about these and others.
## References
This library is created using the awesome [nbdev](https://nbdev.fast.ai/)
Synthea [Synthetic Patient Population Simulator](https://github.com/synthetichealth/synthea)
> Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea:An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079
LSTM Model *based on* this paper - [Scalable and accurate deep learning for electronic health records](http://arxiv.org/abs/1801.07860)
> Rajkomar, A., Oren, E., Chen, K. et al. Scalable and accurate deep learning with electronic health records. npj Digital Med 1, 18 (2018). https://doi.org/10.1038/s41746-018-0029-1
CNN Model *based* on this paper - [Deepr: A Convolutional Net for Medical Records](http://arxiv.org/abs/1607.07519)
> Nguyen, P., Tran, T., Wickramasinghe, N., & Venkatesh, S. (2017). $\mathtt {Deepr}$:A Convolutional Net for Medical Records. IEEE Journal of Biomedical and Health Informatics, 21, 22-30.