https://github.com/soda-inria/predictive-ehr-benchmark

Exploring a complexity gradient in representation and predictive algorithms for EHRs

Host: GitHub
URL: https://github.com/soda-inria/predictive-ehr-benchmark
Owner: soda-inria
License: other
Created: 2023-09-13T15:12:28.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-14T11:59:16.000Z (almost 2 years ago)
Last Synced: 2025-02-17T11:13:04.853Z (5 months ago)
Language: Python
Homepage: https://soda-inria.github.io/predictive-ehr-benchmark/
Size: 76 MB
Stars: 0
Watchers: 7
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Predictive algorithms from Electronic Health Records

This repository hosts the code for the working paper *Exploring a complexity gradient in representation and predictive algorithms for EHRs*.

[**Documentation**](https://soda-inria.github.io/predictive-ehr-benchmark/)

[**Source Code**](https://github.com/soda-inria/predictive-ehr-benchmark)

[**Working Paper repository**](https://github.com/strayMat/predictive_ehr_paper)

### Abstract

Electronic Health Records (EHRs) contain time-varying features with high
cardinality. Current state-of-the-art predictive models build on increasingly
elaborate pipelines, based on transformers, to handle the complexity of these
data. Acknowledging how difficult it is to deploy, transfer and adapt these
models to local care environments, we explore a complexity-benefit tradeoff by
comparing them to a simple aggregation of events. We use three clinical tasks
involving time-varying structured EHRs and increasingly clinically relevant
problems. We show that these benchmarking tasks display heterogeneous predictive
difficulties. We introduce a simple aggregation of static embeddings,
transferred from national claims data and publicly available, and show that it
outperforms transformer-based models on simple tasks with medium sample sizes.
We highlight the sample and computing resource efficiency of these models.
Finally, clinically relevant problems generally present a strong class
imbalance, which complicates model development and undermines model
performance. Further work is needed to understand whether transformer-based
models perform well in these scenarios, where the limited number of cases
requires good sample efficiency.
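
To make the idea of "a simple aggregation of static embeddings" concrete, here is a minimal sketch. It is not taken from this repository: the codes, the embedding values, the `aggregate_patient` helper and the choice of mean pooling are all assumptions made for illustration, and the paper may aggregate differently.

```python
import numpy as np

# Hypothetical static embeddings: one fixed vector per medical code.
# The real embeddings are transferred from national claims data;
# these toy values are for illustration only.
embeddings = {
    "I10": np.array([1.0, 0.0]),
    "E11": np.array([0.0, 1.0]),
    "N18": np.array([1.0, 1.0]),
}


def aggregate_patient(codes, embeddings, dim=2):
    """Average the static embeddings of a patient's event codes.

    Codes without a known embedding are skipped; a patient with no
    known code gets a zero vector. Mean pooling is an assumption.
    """
    vectors = [embeddings[c] for c in codes if c in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Hypothetical patients, each described by a sequence of event codes.
patients = {
    "p1": ["I10", "E11"],
    "p2": ["N18", "N18", "UNKNOWN"],
    "p3": [],
}

# One fixed-size feature vector per patient, whatever the sequence length.
features = {
    pid: aggregate_patient(codes, embeddings)
    for pid, codes in patients.items()
}
```

Each patient ends up with a fixed-size vector regardless of how many events they have, so the result can be fed to any standard tabular classifier.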

# Usage

See the [usage page in the documentation](https://soda-inria.github.io/predictive-ehr-benchmark//usage.html).
