https://github.com/mp2893/medgan
Generative adversarial network for generating electronic health records.
- Host: GitHub
- URL: https://github.com/mp2893/medgan
- Owner: mp2893
- License: bsd-3-clause
- Created: 2017-03-19T17:26:17.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-08-19T17:50:15.000Z (over 5 years ago)
- Last Synced: 2024-08-03T17:14:30.644Z (6 months ago)
- Language: Python
- Size: 29.3 KB
- Stars: 270
- Watchers: 15
- Forks: 91
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-data-synthesis - MedGAN - medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (i.e. medical codes such as diagnosis codes, medication codes or procedure codes) - [Paper](https://arxiv.org/abs/1703.06490) (Data-driven methods / Tabular)
README
medGAN
=========================================
medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (i.e. medical codes such as diagnosis codes, medication codes or procedure codes).
#### Relevant Publications
medGAN implements the algorithm introduced in the following [paper](https://arxiv.org/abs/1703.06490):
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Machine Learning for Healthcare (MLHC) 2017
#### Code Description
This code trains a generative adversarial network to generate patient records. It currently handles patient records that are aggregated over time and are therefore represented as a matrix, where each row corresponds to a patient and each column to a specific medical code (e.g. diagnosis code, medication code, or procedure code). Each matrix entry is either binary (i.e. whether a specific medical code occurred in the longitudinal patient record) or a count (i.e. how many times a specific medical code occurred in the longitudinal patient record).
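The patient-by-code matrix described above can be illustrated with a small sketch. This is not code from the medGAN repository: the `build_matrix` helper, the dictionary input format, and the toy patient IDs and codes are all assumptions made for illustration.

```python
from collections import Counter


def build_matrix(records, data_type="binary"):
    """Aggregate per-patient code lists into a patient-by-code matrix.

    records: dict mapping a patient ID to the list of medical codes seen
    across that patient's whole record (a hypothetical input format).
    """
    if data_type not in ("binary", "count"):
        raise ValueError("data_type must be 'binary' or 'count'")

    # One column per distinct code, in sorted order for reproducibility.
    codes = sorted({c for code_list in records.values() for c in code_list})
    column = {code: j for j, code in enumerate(codes)}

    patients = sorted(records)
    matrix = []
    for pid in patients:
        row = [0] * len(codes)
        for code, n in Counter(records[pid]).items():
            # Count matrix keeps the number of occurrences,
            # binary matrix only records presence.
            row[column[code]] = n if data_type == "count" else 1
        matrix.append(row)
    return patients, codes, matrix


# Toy data: three patients with made-up 3-digit diagnosis codes.
toy_records = {
    "p1": ["401", "250", "401"],
    "p2": ["250"],
    "p3": ["428", "401", "428", "428"],
}

patients, codes, binary = build_matrix(toy_records, "binary")
_, _, count = build_matrix(toy_records, "count")

print(codes)   # column order
print(binary)  # presence/absence per patient
print(count)   # occurrence counts per patient
```

Each row is one patient and each column one code, so the binary and count variants differ only in what is stored in a cell.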
#### Running medGAN
**STEP 1: Installation**
1. medGAN was implemented to run on [TensorFlow](https://www.tensorflow.org/) 1.2. TensorFlow can be easily installed on Ubuntu as suggested [here](https://www.tensorflow.org/install/install_linux).
2. Download/clone the medGAN code
**STEP 2: Fast way to test medGAN with MIMIC-III**
This step describes how to train medGAN on MIMIC-III with a minimum number of steps.
0. You will first need to request access to [MIMIC-III](https://mimic.physionet.org/gettingstarted/access/), a publicly available dataset of electronic health records collected from ICU patients over 11 years.
1. You can use "process_mimic.py" to process the MIMIC-III dataset and generate a suitable training dataset for medGAN.
Place the script in the same location as the MIMIC-III CSV files, and run the script.
The execution command is `python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv <"binary"|"count">`.
Note that the last argument decides whether you construct a binary matrix or a count matrix.
The above command will extract ICD9 diagnosis codes from MIMIC-III.
Mind that this script will use only 3 digits of the ICD9 diagnosis code. If you want to use all 5 digits, please see the source code of "process_mimic.py".
2. Run medGAN using the ".matrix" file generated by process_mimic.py. The command is:
`python medgan.py --data_type=["binary", "count"]`.
3. After the training, if you want to generate synthetic records, use this command:
`python medgan.py --model_file= --generate_data=True --data_type=["binary", "count"]`.
Note that `` is not actually used for generating synthetic records, so it is just a dummy input.
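Step 1 above notes that only 3 digits of each ICD9 diagnosis code are used. The sketch below shows one way such a truncation could work. It is an illustration only, not the code in "process_mimic.py": the `to_3digit_icd9` helper is hypothetical, and it assumes codes are stored without a decimal point and that E codes keep four characters (the letter plus three digits) as their category.

```python
def to_3digit_icd9(code):
    """Truncate a dot-free ICD-9 diagnosis code to its category.

    Hypothetical helper: numeric and V codes keep 3 characters,
    E codes keep 4 (the leading 'E' plus three digits).
    """
    code = code.strip().upper()
    width = 4 if code.startswith("E") else 3
    return code[:width]


# Toy codes written without the decimal point (an assumption about the input).
examples = ["4019", "25000", "25001", "V4581", "E8859", "486"]
truncated = [to_3digit_icd9(c) for c in examples]

print(truncated)
# Truncation merges related codes, so the matrix has fewer columns.
print(len(set(examples)), "->", len(set(truncated)))
```

Truncating to the category level merges closely related codes (here `25000` and `25001`), which reduces the number of columns in the training matrix at the cost of detail.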