https://github.com/marcolussetti/extended-medgan

Synthetic patient data using generative adversarial networks.

medGAN
=========================================
medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (e.g. medical codes such as diagnosis codes, medication codes, or procedure codes).

#### Relevant Publications

medGAN implements the algorithm introduced in the following [paper](https://arxiv.org/abs/1703.06490):

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Machine Learning for Healthcare (MLHC) 2017

#### Code Description

This code trains a generative adversarial network to generate patient records. It currently handles patient records that are aggregated over time, so the data is represented as a matrix in which each row corresponds to a patient and each column to a specific medical code (e.g. diagnosis code, medication code, or procedure code). Each matrix entry is either binary (i.e. whether a specific medical code occurred in the longitudinal patient record) or a count (i.e. how many times a specific medical code occurred in the longitudinal patient record).
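
The two matrix layouts described above can be illustrated with a minimal sketch. This is not code from the repository: the function name `build_matrix`, the dictionary input format, and the toy records are all assumptions made for illustration, and plain Python lists stand in for whatever array type the real scripts use.

```python
from collections import Counter


def build_matrix(patient_codes, mode="binary"):
    """Aggregate each patient's codes into one row of a patient-by-code matrix.

    patient_codes: dict mapping a patient ID to the list of medical codes seen
        across that patient's whole record (hypothetical input format).
    mode: "binary" for presence/absence, "count" for occurrence counts.
    """
    if mode not in ("binary", "count"):
        raise ValueError("mode must be 'binary' or 'count'")
    # Fix a column order so every row uses the same code-to-column mapping.
    vocabulary = sorted({c for codes in patient_codes.values() for c in codes})
    matrix = []
    for patient_id in sorted(patient_codes):
        counts = Counter(patient_codes[patient_id])
        if mode == "binary":
            row = [1 if counts[code] > 0 else 0 for code in vocabulary]
        else:
            row = [counts[code] for code in vocabulary]
        matrix.append(row)
    return vocabulary, matrix


# Toy records (made up): patient_a has code "401" recorded twice.
records = {
    "patient_a": ["250", "401", "401"],
    "patient_b": ["428"],
}
vocabulary, binary = build_matrix(records, "binary")
_, count = build_matrix(records, "count")
print(vocabulary)  # ['250', '401', '428']
print(binary)      # [[1, 1, 0], [0, 0, 1]]
print(count)       # [[1, 2, 0], [0, 0, 1]]
```

The only difference between the two modes is whether a repeated code is collapsed to 1 or kept as its number of occurrences; the row and column layout is identical.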

#### Running medGAN

**STEP 1: Installation**

1. medGAN was implemented to run on [TensorFlow](https://www.tensorflow.org/) 1.2. TensorFlow can be installed on Ubuntu by following the instructions [here](https://www.tensorflow.org/install/install_linux).

2. Download/clone the medGAN code

**STEP 2: Fast way to test medGAN with MIMIC-III**

This step describes how to train medGAN on MIMIC-III in a minimal number of steps.

0. You will first need to request access to [MIMIC-III](https://mimic.physionet.org/gettingstarted/access/), a publicly available collection of electronic health records gathered from ICU patients over 11 years.

1. You can use "process_mimic.py" to process the MIMIC-III dataset and generate a suitable training dataset for medGAN.
Place the script in the same directory as the MIMIC-III CSV files, and run it.
The execution command is `python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv <"binary"|"count">`.
The last argument determines whether the script constructs a binary matrix or a count matrix.
The above command extracts ICD9 diagnosis codes from MIMIC-III.
Note that this script uses only the first 3 digits of each ICD9 diagnosis code. If you want to use all 5 digits, please see the source code of "process_mimic.py".

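2. Run medGAN using the ".matrix" file generated by process_mimic.py. The command is:
`python medgan.py <path to the .matrix file> <path to the output file> --data_type=["binary", "count"]`.
Set `--data_type` to either `binary` or `count`, matching the matrix you built in the previous step.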

3. After training, if you want to generate synthetic records, use this command:
`python medgan.py <path to the .matrix file> <path to the output file> --model_file=<path to model file> --generate_data=True`.
Note that `<path to the .matrix file>` is not actually used for generating synthetic records, so it is just a dummy input.
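
The 3-digit truncation mentioned in step 1 can be sketched as follows. This is an illustrative approximation, not the code in "process_mimic.py": the function name `to_three_digit_icd9` is hypothetical, and the sketch assumes codes are stored without a decimal point (e.g. `4019` for 401.9), as in the MIMIC-III `DIAGNOSES_ICD.csv` file. Check the script's source for the exact behavior.

```python
def to_three_digit_icd9(code):
    """Truncate an undotted ICD-9 diagnosis code to its category.

    Assumption: codes carry no decimal point, e.g. "4019" for 401.9.
    E-codes have a four-character category ("E" plus three digits);
    all other codes, including V-codes, have a three-character category.
    """
    code = code.strip().upper()
    width = 4 if code.startswith("E") else 3
    return code[:width]


# Made-up sample codes covering numeric, V-code, and E-code cases.
samples = ["4019", "25000", "V3000", "E8859", "038"]
truncated = [to_three_digit_icd9(c) for c in samples]
print(truncated)  # ['401', '250', 'V30', 'E885', '038']
```

Truncating to the category level shrinks the number of matrix columns considerably, at the cost of merging clinically distinct subcodes into one column.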