{"readme":"\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\",\n  dpi=300,fig.width=7,\n  fig.keep=\"all\"\n)\n```\n\n# SelectBoost \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"200\"/\u003e\n\n# A General Algorithm to Enhance the Performance of Variable Selection Methods in Correlated Datasets\n## Frédéric Bertrand and Myriam Maumy-Bertrand\n\n\u003chttps://doi.org/10.32614/CRAN.package.SelectBoost\u003e\n\n\u003c!-- badges: start --\u003e\n[![DOI](https://img.shields.io/badge/doi-10.32614/CRAN.package.SelectBoost-blue.svg)](https://doi.org/10.32614/CRAN.package.SelectBoost)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![R-CMD-check](https://github.com/fbertran/SelectBoost/workflows/R-CMD-check/badge.svg)](https://github.com/fbertran/SelectBoost/actions)\n[![Codecov test coverage](https://codecov.io/gh/fbertran/SelectBoost/branch/master/graph/badge.svg)](https://app.codecov.io/gh/fbertran/SelectBoost?branch=master)\n[![CRAN status](https://www.r-pkg.org/badges/version/SelectBoost)](https://cran.r-project.org/package=SelectBoost)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/SelectBoost)](https://cran.r-project.org/package=SelectBoost)\n[![GitHub Repo stars](https://img.shields.io/github/stars/fbertran/SelectBoost?style=social)](https://github.com/fbertran/SelectBoost)\n\u003c!-- badges: end --\u003e\n\n\nThe SelectBoost package implements SelectBoost: a general algorithm to enhance the performance of variable selection methods \u003chttps://doi.org/10.1093/bioinformatics/btaa855\u003e, F. Bertrand, I. 
Aouadi, N. Jung, R. Carapito, L. Vallat, S. Bahram, M. Maumy-Bertrand (2015).\n\n\nWith the growth of big data, variable selection has become one of the major challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall and precision is limited when the number of variables far exceeds the number of observations or when the variables are highly correlated.\n\n\nThis package implements a new general algorithm that improves the precision of any existing variable selection method. The algorithm relies on intensive simulations and takes the correlation structure of the data into account. It can either produce a confidence index for variable selection or be used for experimental design planning.\n\n\nThis website and these examples were created by F. Bertrand and M. Maumy-Bertrand.\n\n## Installation\n\nYou can install the released version of SelectBoost from [CRAN](https://CRAN.R-project.org) with:\n\n```{r, eval = FALSE}\ninstall.packages(\"SelectBoost\")\n```\n\nYou can install the development version of SelectBoost from [GitHub](https://github.com) with:\n\n```{r, eval = FALSE}\ndevtools::install_github(\"fbertran/SelectBoost\")\n```\n\nIf you are a Linux/Unix or macOS user, you can install a version of SelectBoost with support for `doMC` from [GitHub](https://github.com) with:\n\n```{r, eval = FALSE}\ndevtools::install_github(\"fbertran/SelectBoost\", ref = \"doMC\")\n```\n\n\n## Examples\n\n### First example: Simulated dataset\n\n#### Simulating data\nCreate a correlation matrix for two groups of variables, with the intragroup correlation values given by $cor\\_group$.\n```{r datasetsimulation1}\nlibrary(SelectBoost)\ngroup\u003c-c(rep(1:2,5))\ncor_group\u003c-c(.8,.4)\nC\u003c-simulation_cor(group,cor_group)\nC\n```\n\nSimulate a predictor dataset with $N=100$ observations.\n```{r datasetsimulation2, 
cache=TRUE}\nN\u003c-100\nX\u003c-simulation_X(N,C)\nhead(X)\n```\n\n$supp$ sets the predictors that are used to simulate the response (the true predictors). $minB$ and $maxB$ set the minimum and maximum absolute values of the $\\beta$ coefficients of the true predictors in the model. $stn$ is a scaling factor for the noise in the response.\n```{r datasetsimulation3, cache=TRUE}\nsupp\u003c-c(1,1,1,0,0,0,0,0,0,0)\nminB\u003c-1\nmaxB\u003c-2\nstn\u003c-500\nDATA_exemple\u003c-simulation_DATA(X,supp,minB,maxB,stn)\nstr(DATA_exemple)\n```\n\n#### Selectboost analysis\n\nBy default, `fastboost` performs $B=100$ resamplings of the model. The result is a matrix of the proportions of selection of each variable (columns) at each resampling level $c_0$ (rows). The resamplings are designed to take the correlation structure of the predictors into account. The correlation used by default is the Pearson correlation, but any other can be passed through the `corrfunc` argument. The $c_0$ value sets the minimum level for which correlations between two predictors are kept in the resampling process. The correlation structure is used to group the variables. Two grouping functions are available, `group_func_1`, which groups by thresholding the correlation matrix, and `group_func_2`, which groups using community detection, but any other can be provided through the `group` argument of the function. The `func` argument is the variable selection function used to assess variable memberships. 
It defaults to `lasso_msgps_AICc`, but many others are provided, for instance for the lasso, the elastic net, logistic regression with glmnet and network inference with the [Cascade package](https://fbertran.github.io/Cascade/):\n\n* `lasso_cv_glmnet_bin_min(X, Y)`\n* `lasso_cv_glmnet_bin_1se(X, Y)`\n* `lasso_glmnet_bin_AICc(X, Y)`\n* `lasso_glmnet_bin_BIC(X, Y)`\n* `lasso_cv_lars_min(X, Y)`\n* `lasso_cv_lars_1se(X, Y)`\n* `lasso_cv_glmnet_min(X, Y)`\n* `lasso_cv_glmnet_min_weighted(X, Y, priors)`\n* `lasso_cv_glmnet_1se(X, Y)`\n* `lasso_cv_glmnet_1se_weighted(X, Y, priors)`\n* `lasso_msgps_Cp(X, Y, penalty = \"enet\")`\n* `lasso_msgps_AICc(X, Y, penalty = \"enet\")`\n* `lasso_msgps_GCV(X, Y, penalty = \"enet\")`\n* `lasso_msgps_BIC(X, Y, penalty = \"enet\")`\n* `enetf_msgps_Cp(X, Y, penalty = \"enet\", alpha = 0.5)`\n* `enetf_msgps_AICc(X, Y, penalty = \"enet\", alpha = 0.5)`\n* `enetf_msgps_GCV(X, Y, penalty = \"enet\", alpha = 0.5)`\n* `enetf_msgps_BIC(X, Y, penalty = \"enet\", alpha = 0.5)`\n* `lasso_cascade(M, Y, K, eps = 10^-5, cv.fun)`\n\nUser-defined functions can also be passed through the `func` argument. See the vignette for an example with the *adaptive* lasso.\n\nDefault steps for $c_0$, computed as the deciles of the absolute correlations between the predictors:\n```{r datasetsimulation4, cache=TRUE}\nquantile(abs(cor(DATA_exemple$X))[abs(cor(DATA_exemple$X))!=1],(0:10)/10)\n```\n\n```{r datasetsimulation4bis, cache=TRUE}\nresult.boost.raw = fastboost(DATA_exemple$X, DATA_exemple$Y)\nresult.boost.raw\n```\n\nApplying a non-increasing post-processing step to the results improves the performance of the algorithm.
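The post-processing idea can be illustrated on a toy matrix. The sketch below is a minimal illustration under stated assumptions, not the implementation of `force.non.inc`: it assumes that the rows are ordered from the largest to the smallest $c_0$ value and makes each column non-increasing with a running minimum (`cummin`). The helper `make_non_increasing` and the matrix `prop` are hypothetical.

```r
# Hypothetical helper (not part of SelectBoost): make every column of a
# matrix of selection proportions non-increasing from the first row down,
# using a running minimum. Rows are assumed to go from the largest to the
# smallest c0 value.
make_non_increasing <- function(prop) {
  out <- apply(prop, 2, cummin)
  # apply() returns a plain vector when prop has a single row and drops the
  # row names, so rebuild a matrix with the original dimensions and names.
  matrix(out, nrow = nrow(prop), ncol = ncol(prop), dimnames = dimnames(prop))
}

# Toy proportions: 4 c0 levels (rows) for 2 variables (columns).
prop <- cbind(x1 = c(1.00, 0.80, 0.90, 0.70),
              x2 = c(0.30, 0.10, 0.40, 0.20))
make_non_increasing(prop)
```

In this toy example, the rise of `x1` from 0.80 to 0.90 is flattened to 0.80, so its column becomes 1.00, 0.80, 0.80, 0.70, and the column of `x2` becomes 0.30, 0.10, 0.10, 0.10. A running minimum never increases a value, which makes it a conservative choice; other rules that also produce non-increasing columns would give different numbers.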
\n```{r datasetsimulation4ter, cache=TRUE}\nresult.boost = force.non.inc(result.boost.raw)\nresult.boost\n```\n\n#### Comparing true and selected predictors\n\nWe can compute, for all the $c_0$ values and for a selection threshold varying from $1$ to $0.5$ by $0.05$ steps, the recall (sensitivity), the precision (positive predictive value), as well as several Fscores ($F_1$ harmonic mean of recall and precision, $F_{1/2}$ and $F_2$ two weighted harmonic means of recall and precision).\n```{r datasetsimulation5, cache=TRUE}\nAll_res=NULL\n# Selection thresholds tested: from 1 down to 0.5 by steps of 0.05\nfor(lev in 20:10/20){\nF_score=NULL\nfor(u in 1:nrow(result.boost)){\n\tF_score\u003c-rbind(F_score,SelectBoost::compsim(DATA_exemple,result.boost[u,],\n\t                                            level=lev)[1:5])\n}\nAll_res \u003c- abind::abind(All_res,F_score,along=3)\n}\n```\n\nThe 5 criteria for all the $c_0$ values, with a selection threshold equal to $0.90$.\n```{r datasetsimulation6, cache=TRUE, fig.keep='last'}\nmatplot(1:nrow(result.boost),All_res[,,3],type=\"l\",ylab=\"criterion value\",\n        xlab=\"c0 value\",xaxt=\"n\",lwd=2)\naxis(1, at=1:length(attr(result.boost,\"c0.seq\")),  \n     labels=round(attr(result.boost,\"c0.seq\"),3))\nlegend(x=\"topright\",legend=c(\"recall (sensitivity)\",\n      \"precision (positive predictive value)\",\"non-weighted Fscore\",\n      \"F1/2 weighted Fscore\",\"F2 weighted Fscore\"),lty=1:5,col=1:5,lwd=2)\n```\n\nFscores for all selection thresholds and all the $c_0$ values.\n```{r datasetsimulation7, cache=TRUE, fig.keep='last'}\nmatplot(1:nrow(result.boost),All_res[,3,],type=\"l\",ylab=\"Fscore\",\n        xlab=\"c0 value\",xaxt=\"n\",lwd=2,col=1:11,lty=1:11)\naxis(1, at=1:length(attr(result.boost,\"c0.seq\")),\n     labels=round(attr(result.boost,\"c0.seq\"),3))\nlegend(x=\"topright\",legend=(20:10)/20,lty=1:11,col=1:11,lwd=2,\n       title=\"Threshold\")\n```\n\n#### Complete Selectboost analysis\n\nWhat is the maximum number of steps 
?\n```{r datasetsimulationc01res45, cache=TRUE}\nall.cors=unique(abs(cor(DATA_exemple$X))[abs(cor(DATA_exemple$X))!=1])\nlength(all.cors)\n```\n\nWith such a small dataset, we can perform all 45 steps of the SelectBoost analysis, one for each distinct absolute correlation value. We switch to community analysis from the [igraph package](https://igraph.org) (`group_func_2`) as the grouping function.\n```{r datasetsimulationc01res45bis, cache=TRUE}\ngroups.seq.f2=lapply(sort(unique(c(1,all.cors,0)),decreasing=TRUE), function(c0)\n  if(c0!=1){lapply(group_func_2(cor(DATA_exemple$X),c0)$communities,sort)}\n  else {lapply(group_func_2(cor(DATA_exemple$X),c0),sort)})\nnames(groups.seq.f2)\u003c-sort(unique(c(1,all.cors,0)),decreasing=TRUE)\ngroups.seq.f2[[1]]\n```\n\n```{r datasetsimulationc02res45, cache=TRUE}\nresult.boost.45.raw = fastboost(DATA_exemple$X, DATA_exemple$Y, B=100,\n                    steps.seq=sort(unique(all.cors),decreasing=TRUE))\nresult.boost.45.raw\n```\n\nApplying a non-increasing post-processing step to the results improves the performance of the algorithm.\n```{r datasetsimulationc02res45bis, cache=TRUE}\nresult.boost.45 = force.non.inc(result.boost.45.raw)\nresult.boost.45\n```\n\n#### Comparing true and selected predictors\nDue to the effect of the correlated resampling, the proportion of selection for a variable may increase, especially if it is a variable that is often discarded. Hence, one should force those proportions of selection to be non-increasing. 
This non-increasing version is one of the results returned by the `summary` method for the `selectboost` class.\n\n```{r datasetsimulationc02res45dec, cache=TRUE}\ndec.result.boost.45 \u003c- summary(result.boost.45)$selectboost_result.dec\ndec.result.boost.45\n```\n\nLet's compute again, for all the $c_0$ values, the recall (sensitivity), precision (positive predictive value), and several Fscores ($F_1$ harmonic mean of recall and precision, $F_{1/2}$ and $F_2$ two weighted harmonic means of recall and precision).\n```{r datasetsimulationc03res45, cache=TRUE}\nAll_res.45=NULL\n# Selection thresholds tested: from 1 down to 0.5 by steps of 0.05\nfor(lev.45 in 20:10/20){\nF_score.45=NULL\nfor(u.45 in 1:nrow(dec.result.boost.45)){\n\tF_score.45\u003c-rbind(F_score.45,SelectBoost::compsim(DATA_exemple,\n\t           dec.result.boost.45[u.45,],level=lev.45)[1:5])\n}\nAll_res.45 \u003c- abind::abind(All_res.45,F_score.45,along=3)\n}\n```\n\nThe 5 criteria for all the $c_0$ values, with a selection threshold equal to $0.90$.\n```{r datasetsimulation6res45, cache=TRUE, fig.keep='last'}\nmatplot(1:nrow(dec.result.boost.45),All_res.45[,,3],type=\"l\",\n        ylab=\"criterion value\",xlab=\"c0 value\",xaxt=\"n\",lwd=2)\naxis(1, at=1:length(attr(result.boost.45,\"c0.seq\")), \n     labels=round(attr(result.boost.45,\"c0.seq\"),3))\nlegend(x=\"topright\",legend=c(\"recall (sensitivity)\",\n       \"precision (positive predictive value)\",\"non-weighted Fscore\",\n       \"F1/2 weighted Fscore\",\"F2 weighted Fscore\"),\n       lty=1:5,col=1:5,lwd=2)\n```\n\nFscores for all selection thresholds and all the $c_0$ values.\n```{r datasetsimulation7res45, cache=TRUE, fig.keep='last'}\nmatplot(1:nrow(dec.result.boost.45),All_res.45[,3,],type=\"l\",\n        ylab=\"Fscore\",xlab=\"c0 value\",xaxt=\"n\",lwd=2,col=1:11,lty=1:11)\naxis(1, at=1:length(attr(result.boost.45,\"c0.seq\")), \n     labels=round(attr(result.boost.45,\"c0.seq\"),3))\nlegend(x=\"topright\",legend=(20:10)/20,lty=1:11,col=1:11,lwd=2,\n       title=\"Threshold\")\n```\n\n#### 
Confidence indices\n\nFirst find, for each variable, the highest $c_0$ value for which the proportion of selection falls under the threshold $thr$: `index.last.c0` is the index of the step just before it, that is the last step at which the variable is still selected with a proportion of at least $thr$, and the confidence index of the variable is $1-c_0$ for that step. In that analysis, we set $thr=1$.\n```{r datasetsimulation8res45, cache=TRUE}\nthr=1\nindex.last.c0=apply(dec.result.boost.45\u003e=thr,2,which.min)-1\nindex.last.c0\n```\n\nDefine a color ramp (`colorRamp`) ranging from blue (high confidence) to red (low confidence).\n```{r datasetsimulation8res45color, cache=TRUE}\njet.colors \u003c-\n  colorRamp(rev(c(\n  \"blue\", \"#007FFF\", \"#FF7F00\", \"red\", \"#7F0000\")))\n```\n\n```{r datasetsimulation9res45, cache=TRUE, fig.keep='all'}\nrownames(dec.result.boost.45)[index.last.c0]\nattr(result.boost.45,\"c0.seq\")[index.last.c0]\nconfidence.indices = c(0,1-attr(result.boost.45,\"c0.seq\"))[index.last.c0+1]\nconfidence.indices\nbarplot(confidence.indices,col=rgb(jet.colors(confidence.indices), maxColorValue = 255), \n        names.arg=colnames(result.boost.45), ylim=c(0,1))\n```\n\nRepeat the analysis with another threshold: again, find for each variable the highest $c_0$ value for which the proportion of selection falls under the threshold $thr$.
In that analysis, we set $thr=0.9$.\n```{r datasetsimulation9res45bis, cache=TRUE}\nthr=.9\nindex.last.c0=apply(dec.result.boost.45\u003e=thr,2,which.min)-1\nindex.last.c0\n```\n\n```{r datasetsimulation9res45bisbis, cache=TRUE, fig.keep='all'}\nrownames(dec.result.boost.45)[index.last.c0]\nattr(result.boost.45,\"c0.seq\")[index.last.c0]\nconfidence.indices = c(0,1-attr(result.boost.45,\"c0.seq\"))[index.last.c0+1]\nconfidence.indices\nbarplot(confidence.indices,col=rgb(jet.colors(confidence.indices), maxColorValue = 255), \n        names.arg=colnames(result.boost.45), ylim=c(0,1))\n```\n\n\n### Second example: biological network data\n#### Simulating data using real data\n\nHere the loop generates a single dataset. In practice, it should be used to generate at least 100 datasets, and the results should then be averaged.\n```{r CascadeData, cache=TRUE, fig.keep='all'}\nrequire(CascadeData)\ndata(micro_S)\ndata(micro_US)\nmicro_US\u003c-Cascade::as.micro_array(micro_US,c(60,90,240,390),6)\nmicro_S\u003c-Cascade::as.micro_array(micro_S,c(60,90,240,390),6)\nS\u003c-Cascade::geneSelection(list(micro_S,micro_US),list(\"condition\",c(1,2),1),-1)\nrm(micro_S);data(micro_S)\nSel\u003c-micro_S[S@name,]\n\nsupp\u003c-c(1,1,1,1,1,rep(0,95))\nminB\u003c-1\nmaxB\u003c-2\nstn\u003c-5\n\nset.seed(3141)\nfor(i in 1:1){\nX\u003c-t(as.matrix(Sel[sample(1:1300, 100),]))\n# scale each column (gene) to unit Euclidean norm\nXnorm\u003c-t(t(X)/sqrt(diag(t(X)%*%X)))\nassign(paste(\"DATA_exemple3_nb_\",i,sep=\"\"),simulation_DATA(Xnorm,supp,minB,maxB,stn))\n}\n```\n\n```{r compcors, cache=TRUE, fig.keep='all'}\nall.cors.micro=unique(abs(cor(DATA_exemple3_nb_1$X))[abs(cor(\n  DATA_exemple3_nb_1$X))!=1])\nlength(unique(all.cors.micro))\nquantile(all.cors.micro,.90)\n```\n\n```{r findc0seq, cache=TRUE}\n# keep the top 10% of the absolute correlations, then take 50 c0 steps\n# at evenly spaced quantiles of them, from the largest to the smallest\ntop10p.all.cors.micro=all.cors.micro[all.cors.micro\u003e=quantile(all.cors.micro,.90)]\nc0seq.top10p.all.cors.micro=quantile(top10p.all.cors.micro,rev(\n  seq(0,1,length.out = 50)))\nc0seq.top10p.all.cors.micro\n```\n\n```{r CascadeDatafastboost, cache=TRUE, 
fig.keep='all'}\nresult.boost.micro_nb1 = fastboost(DATA_exemple3_nb_1$X, DATA_exemple3_nb_1$Y, B=100, \n                                   steps.seq=c0seq.top10p.all.cors.micro)\nresult.boost.micro_nb1\n```\n\nThe `summary` function applies a non-increasing post-processing step to the results to improve the performance of the algorithm. The results are stored in the `selectboost_result.dec` entry of the summary.\n```{r datasetsimulationc02res45decbis, cache=TRUE}\ndec.result.boost.micro_nb1 \u003c- summary(result.boost.micro_nb1)$selectboost_result.dec\ndec.result.boost.micro_nb1\n```\n\n#### Confidence indices\n\nFirst find, for each variable, the highest $c_0$ value for which the proportion of selection falls under the threshold $thr$: `index.last.c0.micro_nb1` is the index of the step just before it. In that analysis, we set $thr=1$.\n```{r CascadeDataconfindices, cache=TRUE}\nthr=1\nindex.last.c0.micro_nb1=apply(dec.result.boost.micro_nb1\u003e=thr,2,which.min)-1\nindex.last.c0.micro_nb1\n```\n\nWe have to cap the confidence index value at $1-\\{\\textrm{smallest } c_0\\}$, using the smallest $c_0$ value that we specified in the $c_0$ sequence and that was actually used for resampling.
As a consequence, we have to exclude the $c_0=0$ case, since we do not know what happens between $c_0=\\mathrm{quantile}(cors,.9)$ and $c_0=0$.\n\n```{r CascadeDataconfindicesindices, cache=TRUE}\nindex.last.c0.micro_nb1 \u003c- pmin(index.last.c0.micro_nb1,\n                                nrow(dec.result.boost.micro_nb1)-1)\n```\n\nDefine a color ramp (`colorRamp`) ranging from blue (high confidence) to red (low confidence).\n```{r CascadeDatacolordef, cache=TRUE}\njet.colors \u003c-colorRamp(rev(c(\"blue\", \"#007FFF\", \"#FF7F00\", \"red\", \"#7F0000\")))\n```\n\n```{r CascadeDatabarplot, cache=TRUE, fig.keep='last'}\nrownames(dec.result.boost.micro_nb1)[index.last.c0.micro_nb1]\nattr(result.boost.micro_nb1,\"c0.seq\")[index.last.c0.micro_nb1]\nconfidence.indices.micro_nb1 = c(0,1-attr(result.boost.micro_nb1,\"c0.seq\"))[\n  index.last.c0.micro_nb1+1]\nconfidence.indices.micro_nb1\nbarplot(confidence.indices.micro_nb1,col=rgb(jet.colors(confidence.indices.micro_nb1),\nmaxColorValue = 255), names.arg=colnames(result.boost.micro_nb1), ylim=c(0,1))\n```\n\n\nLet's compute again, for all the $c_0$ values, the recall (sensitivity), precision (positive predictive value), and several Fscores ($F_1$ harmonic mean of recall and precision, $F_{1/2}$ and $F_2$ two weighted harmonic means of recall and precision), this time against the true predictors of the simulated dataset `DATA_exemple3_nb_1`.\n```{r datasetsimulationc03micro, cache=TRUE, fig.keep='last'}\nAll_micro_nb1=NULL\n# Selection thresholds tested: from 1 down to 0.5 by steps of 0.05\nfor(lev.micro_nb1 in 20:10/20){\nF_score.micro_nb1=NULL\nfor(u.micro_nb1 in 1:nrow(dec.result.boost.micro_nb1)){\n\tF_score.micro_nb1\u003c-rbind(F_score.micro_nb1,SelectBoost::compsim(DATA_exemple3_nb_1,\n\t           dec.result.boost.micro_nb1[u.micro_nb1,],level=lev.micro_nb1)[1:5])\n}\nAll_micro_nb1 \u003c- abind::abind(All_micro_nb1,F_score.micro_nb1,along=3)\n}\n```\n\nThe 5 criteria for all the $c_0$ values, with a selection threshold equal to $0.90$.\n```{r datasetsimulation6micro, cache=TRUE, 
fig.keep='last'}\nmatplot(1:nrow(dec.result.boost.micro_nb1),All_micro_nb1[,,3],type=\"l\",\n        ylab=\"criterion value\",xlab=\"c0 value\",xaxt=\"n\",lwd=2)\naxis(1, at=1:length(attr(result.boost.micro_nb1,\"c0.seq\")),\n     labels=round(attr(result.boost.micro_nb1,\"c0.seq\"),3))\nlegend(x=\"topright\",legend=c(\"recall (sensitivity)\",\n       \"precision (positive predictive value)\",\"non-weighted Fscore\",\n       \"F1/2 weighted Fscore\",\"F2 weighted Fscore\"),\n       lty=1:5,col=1:5,lwd=2)\n```\n\nFscores for all selection thresholds and all the $c_0$ values.\n```{r datasetsimulation7micro, cache=TRUE}\nmatplot(1:nrow(dec.result.boost.micro_nb1),All_micro_nb1[,3,],type=\"l\",\n        ylab=\"Fscore\",xlab=\"c0 value\",xaxt=\"n\",lwd=2,col=1:11,lty=1:11)\naxis(1, at=1:length(attr(result.boost.micro_nb1,\"c0.seq\")), \n     labels=round(attr(result.boost.micro_nb1,\"c0.seq\"),3))\nlegend(x=\"bottomright\",legend=(20:10)/20,lty=1:11,col=1:11,lwd=2,\n       title=\"Threshold\")\n```\n\n"}