Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/soda-inria/causal_ehr_mimic
Code repository for the paper : Causal thinking for decision making on Electronic Health Records: why and how
https://github.com/soda-inria/causal_ehr_mimic
Last synced: about 2 months ago
JSON representation
Code repository for the paper : Causal thinking for decision making on Electronic Health Records: why and how
- Host: GitHub
- URL: https://github.com/soda-inria/causal_ehr_mimic
- Owner: soda-inria
- License: other
- Created: 2023-07-21T12:00:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-07T16:07:32.000Z (over 1 year ago)
- Last Synced: 2023-09-07T16:26:58.900Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 11.1 MB
- Stars: 7
- Watchers: 7
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Step-by-step Causal Analysis Of Ehrs To Ground Decision Making
## Overview
This code was used to study the effect of a combination of Albumin+crystalloids compared to crystalloids only for sepsis patients on 28 day mortality in the Mimic-iv database.
The corresponding paper is:
[Causal thinking for decision making on Electronic Health Records: why and how, Doutreligne M. and Struja T. and Abecassis J. and Morgand C. and Celi L. and Varoquaux G., arXiv preprint arXiv:2308.01605, 2023](https://arxiv.org/abs/2308.01605)
## Step 0: data acquisition
We used a combination of duckdb and postgresql to build mimic for a laptop into parquet files easy to process during the analysis.
Steps to reproduce data acquisition on a laptop
0. We first built the data using postgresql following [mimic-code instructions](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/postgres) *=30min-1h*.
1. We then used built the concepts from the [postgresql concepts scripts](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/concepts_postgres) *=1h-2h*.
2. We built the original database with duckdb *=16min* following [mimic-code instructions](https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/duckdb). This is done to copy the large files to parquet without memory overflow.
3. We convert all tables from the original database to parquet files using duckdb: `cli.duckdb2parquet` *=5min*.
4. We convert all derived tables from postgresql to parquet using polars: `cli.mimiciv_derived2parquet` *=2min*.NB: Step 2 and 3 could be done from polars or duckdb directly from the csv, I think. But I had some bugs with the `pl.sink_parquet` when trying to convert the csv.
**Computing setup:** The analysis was performed on a laptop running Ubuntu 22.04.2 LTS with the following hardware: CPU 12th Gen
Intel(R) Core(TM) i7-1270P with 16 threads and 15 GB of RAM.
## Step 1: study design – Frame the question to avoid biases
- Code for main paper: [`caumim.framing.albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/caumim/framing/albumin_for_sepsis.py)
- Step-by-step notebook: [`notebooks/_1_framing_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/notebooks/_1_framing_albumin_for_sepsis.py)## Step 2: identification – List necessary information to answer the causal question
We built the causal graph with a clinician, using the online graphical tool [Daggity](https://dagitty.net/).
The practical implementation of the selection of these covariates in MIMIC-IV is done with the [`get_event_covariates_albumin_zhou`](https://github.com/soda-inria/causal_ehr_mimic/blob/06a65e94c221bd02d1613477937b281543b05577/caumim/variables/selection.py#L107) function in `caumim.variables.selection.py`
## Step 3: Estimation – Compute the causal effect of interest
- Code for main paper: [`caumim.experiments.sensitivity_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/605a706f883b67459ec711de8eec387e8dd8528f/caumim/experiments/sensitivity_albumin_for_sepsis.py#L216)
- Step-by-step notebook: [`notebooks/_3_estimation__albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/605a706f883b67459ec711de8eec387e8dd8528f/notebooks/_3_estimation__albumin_for_sepsis.py#L141)## Step 4: Vibration analysis – Assess the robustness of the hypotheses
- Code for main paper:
- Vibration analysis on immortal time bias: [`caumim.experiments.immortal_time_bias_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/caumim/experiments/immortal_time_bias_albumin_for_sepis.py)
- Vibration analysis on feature aggregations: [`caumim.experiments.sensitivity_feature_aggregation_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/caumim/experiments/sensitivity_feature_aggregation_albumin_for_sepsis.py)
- Vibration analysis on causal and statistical estimators: [`caumim.experiments.sensitivity_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/caumim/experiments/sensitivity_albumin_for_sepsis.py)- Step-by-step notebook: [`notebooks/_3_estimation__albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/605a706f883b67459ec711de8eec387e8dd8528f/notebooks/_3_estimation__albumin_for_sepsis.py#L161)
## Step 5: Treatment heterogeneity – Compute treatment effects on subpopulations
- Code for the main paper: [`caumim.experiments.cate_exploration_albumin_for_sepsis.py`](https://github.com/soda-inria/causal_ehr_mimic/blob/main/caumim/experiments/cate_exploration_albumin_for_sepsis.py)