Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/soda-inria/predictive-ehr-benchmark
Exploring a complexity gradient in representation and predictive algorithms for EHRs
- Host: GitHub
- URL: https://github.com/soda-inria/predictive-ehr-benchmark
- Owner: soda-inria
- License: other
- Created: 2023-09-13T15:12:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-14T11:59:16.000Z (over 1 year ago)
- Last Synced: 2024-11-06T21:19:16.271Z (about 2 months ago)
- Language: Python
- Homepage: https://soda-inria.github.io/predictive-ehr-benchmark/
- Size: 76 MB
- Stars: 0
- Watchers: 7
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Predictive algorithms from Electronic Health Records
This repository hosts code for the working paper: *Exploring a complexity gradient in representation and predictive algorithms for EHRs*
[**Documentation**](https://soda-inria.github.io/predictive-ehr-benchmark/)
[**Source Code**](https://github.com/soda-inria/predictive-ehr-benchmark)
[**Working Paper repository**](https://github.com/strayMat/predictive_ehr_paper)
### Abstract
Electronic Health Records contain time-varying features with high cardinality.
Current state-of-the-art predictive models build on increasingly elaborate
transformer-based pipelines to handle the complexity of these data.
Acknowledging how difficult these models are to deploy, transfer and adapt to
local care environments, we explore a complexity-benefit tradeoff by comparing
them to a simple aggregation of events. We use three clinical tasks involving time-varying
structured Electronic Health Records (EHRs) and increasingly clinically relevant
problems. We show that these benchmarking tasks display heterogeneous predictive
difficulties. We introduce a simple aggregation of static embeddings,
transferred from national claims and publicly available, and show that it
outperforms transformer-based models on simple tasks with medium sample sizes.
We highlight the sample and computing resource efficiency of these models.
Finally, clinically relevant problems generally present a strong class
imbalance, which complicates model development and undermines
performance. Further work is needed to understand whether transformer-based models
perform well in these scenarios where the number of cases requires good sample
efficiency.
# Usage
See the [usage page in the documentation](https://soda-inria.github.io/predictive-ehr-benchmark/usage.html).
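
To make the "simple aggregation of static embeddings" mentioned in the abstract more concrete, here is a minimal sketch of one way such an aggregation could look: each patient's event codes are mapped to fixed vectors and averaged into a single feature vector. This is an illustration only, not the repository's implementation. The function name `aggregate_patient_embeddings`, the toy codes, the two-dimensional vectors, and the choice of a plain mean are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_patient_embeddings(events, code_embeddings, dim):
    """Average the static embeddings of each patient's event codes.

    Hypothetical helper for illustration; not part of the repository.
    events: iterable of (patient_id, code) pairs.
    code_embeddings: dict mapping a code to a list of `dim` floats.
    Returns a dict mapping patient_id to a list of `dim` floats.
    """
    per_patient = defaultdict(list)
    for patient_id, code in events:
        vectors = per_patient[patient_id]  # registers the patient even if the code is unknown
        if code in code_embeddings:
            vectors.append(code_embeddings[code])

    features = {}
    for patient_id, vectors in per_patient.items():
        if vectors:
            # Column-wise mean over all embedded events of this patient.
            features[patient_id] = [sum(col) / len(vectors) for col in zip(*vectors)]
        else:
            # Assumption: patients with no embedded code get a zero vector.
            features[patient_id] = [0.0] * dim
    return features


# Toy embeddings and events (made-up values, for illustration only).
code_embeddings = {
    "I10": [1.0, 0.0],
    "E11": [0.0, 1.0],
}
events = [("p1", "I10"), ("p1", "E11"), ("p2", "I10"), ("p3", "Z99")]

features = aggregate_patient_embeddings(events, code_embeddings, dim=2)
print(features)
```

The resulting fixed-size vectors could then be passed to any standard tabular classifier. The mean discards event order and timing, which is what keeps this kind of representation simple compared with sequence models.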