https://github.com/corybrunson/aatd-score

build risk scores for genetic abnormalities associated with alpha-1 antitrypsin deficiency
https://github.com/corybrunson/aatd-score

Last synced: 2 months ago
JSON representation

build risk scores for genetic abnormalities associated with alpha-1 antitrypsin deficiency

Host: GitHub
URL: https://github.com/corybrunson/aatd-score
Owner: corybrunson
Created: 2022-06-11T12:27:11.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-11-20T16:18:07.000Z (over 1 year ago)
Last Synced: 2025-02-05T00:41:43.229Z (4 months ago)
Language: R
Size: 125 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Risk scores for severe AATD

This repo contains code used in [the study by Riley, Brunson, Eydgahi, Brantly, and Lascano](https://openres.ersjournals.com/content/9/5/00302-2023).[^1]

## Objective

The aim of the study was to design an interpretable clinical decision support tool to inform decision to screen for abnormal AAT genotype.
The modeling goal was to predict abnormal genotype from history of hepatic disease, respiratory disease, and (optionally) smoking.

## Data source

This project relied on data collected and provided by the National Detection Program based at the University of Florida and GeneAidyx.

## Software dependencies

The project used [Tidyverse](https://www.tidyverse.org/) for general data science and [Tidymodels](https://www.tidymodels.org/) for machine learning.
Additional risk models were built using [FasterRisk](https://github.com/jiachangliu/FasterRisk).

Most scripts source project-wide settings from 'code/settings.r'.
Due to FasterRisk being implemented in Python and not accessible through Tidymodels, separate scripts were prepared to build and analyze these models.

## Model selection

The first round of analysis (the scripts '1-*.r') compared the performance of several families of predictive models on a subset of the data, using a simple train--test partition without hyperparameter optimization.
The goal was to inform decisions about a cross-validated modeling effort using the complete data set.
Specifically, we used this preliminary analysis to
(1) select predictors to consider,
(2) decide what interpretable model families to use, and
(3) how much predictive performance might be weakened by using interpretable rather than non-interpretable models.

## Model optimization

The second round of analysis (the scripts '2-*.r') conducted a stratified 6-fold cross-validated comparison of models selected based on the results of the first round:
(1) as predictors, either lung and liver history alone or together with either gender or smoking history; and
(2) as model families, logistic regression, random forest, nearest neighbor, and FasterRisk, together with a stand-in for the current guidance to screen patients with COPD.
This round used a majority training set of the data to determine optimal hyperparameter settings, with the remaining testing set held out for evaluation in the third round.

## Model evaluation

The third and final round of analysis (the scripts '3-*.r') fitted the optimized model families and specifications to the full training set from the second round and evaluated them on the testing set.

## Contact

Please email [Cory Brunson](https://github.com/corybrunson/) with any question about the analysis and code.

[^1]: [Riley EL, Brunson JC, Eydgahi S, Brantly ML, Lascano JE. Development of a risk score to increase detection of severe alpha-1 antitrypsin deficiency. _ERJ Open Res_. 2023 Sep 18;9(5):00302-2023. doi: 10.1183/23120541.00302-2023. PMID: 37727673; PMCID: PMC10505949.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10505949/)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/corybrunson/aatd-score

Awesome Lists containing this project

README