https://github.com/marcolussetti/extended-medgan
Synthetic patient data using generative adversarial networks.
- Host: GitHub
- URL: https://github.com/marcolussetti/extended-medgan
- Owner: marcolussetti
- License: bsd-3-clause
- Created: 2018-11-06T23:57:00.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-02-04T11:05:15.000Z (almost 6 years ago)
- Last Synced: 2024-08-03T17:14:57.638Z (6 months ago)
- Language: Python
- Homepage:
- Size: 43.9 KB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-data-synthesis - extended-MedGan - Synthetic patient data using generative adversarial networks. (Data-driven methods / Tabular)
README
medGAN
=========================================
medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (i.e. medical codes such as diagnosis codes, medication codes or procedure codes).

#### Relevant Publications
medGAN implements the algorithm introduced in the following [paper](https://arxiv.org/abs/1703.06490):
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
Machine Learning for Healthcare (MLHC) 2017

#### Code Description
This code trains a generative adversarial network to generate patient records. It currently handles patient records that are aggregated over time, represented as a matrix where each row corresponds to a patient and each column to a specific medical code (e.g. diagnosis code, medication code, or procedure code). Each matrix entry can be either binary (i.e. whether a specific medical code occurred in the longitudinal patient record) or a count (i.e. how many times a specific medical code occurred in the longitudinal patient record).
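To make the matrix layout concrete, here is a minimal sketch in plain Python. It is not taken from the medGAN code: the function name `build_matrix` and the toy records are hypothetical, and the real preprocessing may order rows and columns differently.

```python
def build_matrix(records, mode="binary"):
    """Aggregate (patient_id, code) pairs into a patient-by-code matrix.

    Hypothetical helper for illustration only; not part of medGAN.
    mode="binary": entry is 1 if the code ever occurred for the patient.
    mode="count":  entry is the number of times the code occurred.
    """
    if mode not in ("binary", "count"):
        raise ValueError("mode must be 'binary' or 'count'")

    records = list(records)
    # Fix a stable ordering of rows (patients) and columns (codes).
    patients = sorted({patient for patient, _ in records})
    codes = sorted({code for _, code in records})
    row = {patient: i for i, patient in enumerate(patients)}
    col = {code: j for j, code in enumerate(codes)}

    matrix = [[0] * len(codes) for _ in patients]
    for patient, code in records:
        if mode == "binary":
            matrix[row[patient]][col[code]] = 1
        else:
            matrix[row[patient]][col[code]] += 1
    return patients, codes, matrix


# Toy data: patient "p1" has code "401" recorded twice.
toy_records = [("p1", "401"), ("p1", "401"), ("p1", "250"), ("p2", "250")]
patients, codes, binary_matrix = build_matrix(toy_records, "binary")
_, _, count_matrix = build_matrix(toy_records, "count")
```

The two matrices differ only where a code repeats for the same patient: the binary version records presence, while the count version keeps the number of occurrences.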
#### Running medGAN
**STEP 1: Installation**
1. medGAN was implemented to run on [TensorFlow](https://www.tensorflow.org/) 1.2. TensorFlow can be installed on Ubuntu as described [here](https://www.tensorflow.org/install/install_linux).
2. Download/clone the medGAN code
**STEP 2: Fast way to test medGAN with MIMIC-III**
This step describes how to train medGAN in a minimum number of steps using MIMIC-III.
0. You will first need to request access to [MIMIC-III](https://mimic.physionet.org/gettingstarted/access/), a publicly available dataset of electronic health records collected from ICU patients over 11 years.
1. You can use "process_mimic.py" to process the MIMIC-III dataset and generate a suitable training dataset for medGAN.
Place the script in the same directory as the MIMIC-III CSV files, and run the script.
The execution command is `python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv <"binary"|"count">`.
Note that the last argument decides whether you construct a binary matrix or a count matrix.
The above command will extract ICD9 diagnosis codes from MIMIC-III.
Note that this script uses only the first 3 digits of each ICD9 diagnosis code. If you want to use all 5 digits, please see the source code of "process_mimic.py".
2. Run medGAN using the ".matrix" file generated by process_mimic.py. The command is:
`python medgan.py --data_type=["binary", "count"]`.
3. After training, if you want to generate synthetic records, use this command:
`python medgan.py --model_file= --generate_data=True`.
Note that `` is not actually used for generating synthetic records, so it is just a dummy input.
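
The 3-digit truncation mentioned in step 1 can be illustrated with a short sketch. This is an assumption about how such truncation is commonly done, not a copy of "process_mimic.py": the function name `truncate_icd9` is hypothetical, and it assumes codes are stored without a decimal point (as in MIMIC-III's DIAGNOSES_ICD.csv) and that E codes keep four characters because their category is "E" followed by three digits.

```python
def truncate_icd9(code):
    """Reduce an undotted ICD9 diagnosis code to its category.

    Hypothetical helper for illustration only.
    Numeric and V codes keep their first 3 characters (e.g. "4019" -> "401").
    E codes keep their first 4 characters (e.g. "E8497" -> "E849").
    """
    code = code.strip()
    if code.startswith("E"):
        return code[:4]
    return code[:3]


# Example codes in the undotted form assumed above.
examples = ["4019", "25000", "V4581", "E8497"]
truncated = [truncate_icd9(c) for c in examples]
```

Truncating to the category level shrinks the number of matrix columns considerably, at the cost of merging clinically distinct subcodes into one column.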