https://github.com/mcol/hsstan

An R package for biomarker discovery using Bayesian models implemented in Stan

Host: GitHub
URL: https://github.com/mcol/hsstan
Owner: mcol
License: gpl-3.0
Created: 2019-08-13T14:26:11.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2024-09-13T07:54:23.000Z (4 months ago)
Last Synced: 2024-10-13T23:46:10.230Z (3 months ago)
Topics: bayesian, feature-selection, mcmc, r-package
Language: R
Homepage:
Size: 488 KB
Stars: 6
Watchers: 3
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE

README

# Hierarchical Shrinkage Stan Models for Biomarker Selection

[![CRAN\_Status\_Badge](https://www.r-pkg.org/badges/version/hsstan)](https://cran.r-project.org/package=hsstan)

[![CRAN\_Downloads\_Badge](https://cranlogs.r-pkg.org/badges/hsstan)](https://cran.r-project.org/package=hsstan)

The **hsstan** package provides linear and logistic regression models penalized
with hierarchical shrinkage priors for selection of biomarkers. Models are
fitted with [Stan](https://mc-stan.org), which allows full Bayesian
inference ([Carpenter et al. (2017)](https://doi.org/10.18637/jss.v076.i01)).

It implements the horseshoe and regularized horseshoe priors
([Piironen and Vehtari (2017)](https://doi.org/10.1214/17-EJS1337SI)), and the
projection predictive selection approach to recover a sparse set of predictive
biomarkers
([Piironen, Paasiniemi and Vehtari (2020)](https://doi.org/10.1214/20-EJS1711)).

The approach is particularly suited to selection from high-dimensional panels
of biomarkers, such as those that can be measured by MSMS or similar technologies.
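To give some intuition for the priors mentioned above, the following base R sketch draws local scales from a half-Cauchy distribution and shows how the regularized horseshoe caps the prior scale of large coefficients. It simulates from the priors as described by Piironen and Vehtari (2017) and is not code from hsstan; the values of `tau` and `slab_scale` are arbitrary assumptions chosen for illustration, not package defaults.

```r
set.seed(1)
n_draws <- 10000

## assumed values, for illustration only (not hsstan defaults)
tau <- 0.1         # global scale
slab_scale <- 2    # slab scale c of the regularized horseshoe

## local scales: lambda_j ~ half-Cauchy(0, 1)
lambda <- abs(rcauchy(n_draws))

## horseshoe: the prior sd of each coefficient is tau * lambda_j
sd_hs <- tau * lambda

## regularized horseshoe:
## lambda_tilde_j^2 = c^2 lambda_j^2 / (c^2 + tau^2 lambda_j^2)
lambda_tilde <- sqrt(slab_scale^2 * lambda^2 /
                     (slab_scale^2 + tau^2 * lambda^2))
sd_rhs <- tau * lambda_tilde

## the horseshoe scale is heavy-tailed, while the regularized one
## never exceeds the slab scale
range(sd_hs)
range(sd_rhs)
```

Small coefficients are shrunk in almost the same way under both priors, since `lambda_tilde` is close to `lambda` when `tau * lambda` is much smaller than the slab scale.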

### Example

```r
library(hsstan)
data(diabetes)

## if possible, allow using as many cores as cross-validation folds
options(mc.cores=10)

## baseline model with only clinical covariates
hs.base <- hsstan(diabetes, Y ~ age + sex)

## model with additional predictors
hs.biom <- hsstan(diabetes, Y ~ age + sex, penalized=colnames(diabetes)[3:10])
print(hs.biom)
#              mean   sd  2.5% 97.5% n_eff Rhat
# (Intercept)  0.00 0.03 -0.07  0.07  4483    1
# age          0.00 0.04 -0.07  0.08  4706    1
# sex         -0.15 0.04 -0.22 -0.08  5148    1
# bmi          0.33 0.04  0.25  0.41  4228    1
# map          0.20 0.04  0.12  0.28  3571    1
# tc          -0.45 0.25 -0.94  0.04  3713    1
# ldl          0.28 0.20 -0.12  0.68  3674    1
# hdl          0.01 0.12 -0.23  0.25  3761    1
# tch          0.07 0.08 -0.06  0.25  4358    1
# ltg          0.43 0.11  0.22  0.64  3690    1
# glu          0.02 0.03 -0.03  0.10  3034    1

## behaviour of the sampler
sampler.stats(hs.base)
#         accept.stat stepsize divergences treedepth gradients warmup sample
# chain:1      0.9497   0.5723           0         3      6320   0.09   0.08
# chain:2      0.9357   0.6480           0         3      5938   0.09   0.08
# chain:3      0.9455   0.6014           0         3      6112   0.09   0.08
# chain:4      0.9488   0.5932           0         3      6238   0.09   0.08
# all          0.9449   0.6037           0         3     24608   0.36   0.32

sampler.stats(hs.biom)
#         accept.stat stepsize divergences treedepth gradients warmup sample
# chain:1      0.9821   0.0191           0         8    233656   5.04   4.28
# chain:2      0.9891   0.0158           1         8    255994   5.88   4.72
# chain:3      0.9908   0.0143           0         9    274328   5.77   5.14
# chain:4      0.9933   0.0121           0         9    344984   5.98   6.70
# all          0.9888   0.0153           1         9   1108962  22.67  20.84

## approximate leave-one-out cross-validation with Pareto smoothed
## importance sampling
loo(hs.base)
# Computed from 4000 by 442 log-likelihood matrix
#          Estimate   SE
# elpd_loo   -622.4 11.4
# p_loo         3.4  0.2
# looic      1244.9 22.7
# ------
# Monte Carlo SE of elpd_loo is 0.0.
#
# All Pareto k estimates are good (k < 0.5).

loo(hs.biom)
# Computed from 4000 by 442 log-likelihood matrix
#          Estimate   SE
# elpd_loo   -476.5 13.7
# p_loo         9.8  0.7
# looic       953.0 27.5
# ------
# Monte Carlo SE of elpd_loo is 0.1.
#
# All Pareto k estimates are good (k < 0.5).

## run 10-fold cross-validation
set.seed(1)
folds <- caret::createFolds(diabetes$Y, k=10, list=FALSE)
cv.base <- kfold(hs.base, folds=folds)
cv.biom <- kfold(hs.biom, folds=folds)

## cross-validated performance
round(posterior_performance(cv.base), 2)
#        mean   sd    2.5%   97.5%
# r2     0.02 0.00    0.01    0.03
# llk -623.14 1.67 -626.61 -620.13
# attr(,"type")
# [1] "cross-validated"

round(posterior_performance(cv.biom), 2)
#        mean   sd    2.5%   97.5%
# r2     0.48 0.01    0.47    0.50
# llk -482.86 3.76 -490.45 -476.56
# attr(,"type")
# [1] "cross-validated"

## projection predictive selection
sel.biom <- projsel(hs.biom)
print(sel.biom, digits=4)
#                 var       kl rel.kl.null rel.kl   elpd delta.elpd
# 1    Intercept only 0.352283     0.00000     NA -627.3 -155.84260
# 2  Initial submodel 0.333156     0.05429 0.0000 -619.8 -148.39729
# 3               bmi 0.138629     0.60648 0.5839 -533.1  -61.69199
# 4               ltg 0.058441     0.83411 0.8246 -492.5  -21.09681
# 5               map 0.035970     0.89789 0.8920 -482.7  -11.25515
# 6               hdl 0.010304     0.97075 0.9691 -473.9   -2.41192
# 7                tc 0.005292     0.98498 0.9841 -472.2   -0.72490
# 8               ldl 0.002444     0.99306 0.9927 -471.8   -0.38292
# 9               tch 0.001105     0.99686 0.9967 -471.5   -0.07819
# 10              glu 0.000000     1.00000 1.0000 -471.4    0.00000
```
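The relative columns of the projection selection table can be related to the `kl` column with a few lines of base R. The formulas below are inferred from the printed numbers rather than taken from the package documentation, so treat them as an assumption: `rel.kl.null` appears to be the proportion of the intercept-only divergence that has been explained, and `rel.kl` the same quantity relative to the initial submodel.

```r
## kl values copied from the output of print(sel.biom, digits=4)
kl <- c("Intercept only"=0.352283, "Initial submodel"=0.333156,
        bmi=0.138629, ltg=0.058441, map=0.035970, hdl=0.010304,
        tc=0.005292, ldl=0.002444, tch=0.001105, glu=0.000000)

## assumed definitions, inferred from the printed table
rel.kl.null <- 1 - kl / kl["Intercept only"]
rel.kl <- 1 - kl / kl["Initial submodel"]
rel.kl["Intercept only"] <- NA    # reported as NA for the null model

round(cbind(kl, rel.kl.null, rel.kl), 4)
```

Under these assumed definitions, the values for `bmi`, `ltg` and `map` agree with the printed table up to rounding of the copied `kl` values.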

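The cross-validation step in the example relies on `caret::createFolds()` to assign observations to folds. Where caret is not available, a plain random assignment can be sketched in base R as below. This assumes that `kfold()` only needs an integer vector of fold indices with one entry per observation, which is what `createFolds(..., list=FALSE)` returns in the example; unlike caret, this sketch does not stratify on the outcome, and `make_folds` is a hypothetical helper name.

```r
## hypothetical helper: randomly assign n observations to k balanced folds
make_folds <- function(n, k) {
    stopifnot(n >= k, k >= 2)
    ## repeat 1..k until n indices are available, then shuffle them
    sample(rep(seq_len(k), length.out=n))
}

set.seed(1)
folds <- make_folds(442, 10)    # 442 observations, as in the loo() output
table(folds)                    # fold sizes differ by at most one
```

The resulting vector could then be passed as the `folds` argument in place of the one created by caret.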
### References

* [M. Colombo][mcol], A. Asadi Shehni, I. Thoma et al.,
  Quantitative levels of serum N-glycans in type 1 diabetes and their
  association with kidney disease,
  [_Glycobiology_ (2021) 31 (5): 613-623](https://doi.org/10.1093/glycob/cwaa106).

* [M. Colombo][mcol], S.J. McGurnaghan, L.A.K. Blackbourn et al.,
  Comparison of serum and urinary biomarker panels with albumin/creatinine
  ratio in the prediction of renal function decline in type 1 diabetes,
  [_Diabetologia_ (2020) 63 (4): 788-798](https://doi.org/10.1007/s00125-019-05081-8).

* [M. Colombo][mcol], E. Valo, S.J. McGurnaghan et al.,
  Biomarkers associated with progression of renal disease in type 1 diabetes,
  [_Diabetologia_ (2019) 62 (9): 1616-1627](https://doi.org/10.1007/s00125-019-4915-0).

[mcol]: https://github.com/mcol