{"id":34700950,"url":"https://github.com/mncube/swaprinc","last_synced_at":"2026-05-27T09:34:37.760Z","repository":{"id":151847866,"uuid":"622254079","full_name":"mncube/swaprinc","owner":"mncube","description":"Swap Principal Components into Regression Models","archived":false,"fork":false,"pushed_at":"2023-04-18T23:43:09.000Z","size":127,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-09T14:37:12.342Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mncube.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-01T15:16:03.000Z","updated_at":"2023-04-12T15:38:21.000Z","dependencies_parsed_at":"2025-09-08T15:23:47.243Z","dependency_job_id":"d47d3a83-e9ea-4582-9a88-463c043931df","html_url":"https://github.com/mncube/swaprinc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mncube/swaprinc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mncube%2Fswaprinc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mncube%2Fswaprinc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mncube%2Fswaprinc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mncube%2Fswaprinc/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mncube","download_url":"https://codeload.github.com/mncube/swaprinc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mncube%2Fswaprinc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33560727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-24T22:51:55.603Z","updated_at":"2026-05-27T09:34:37.754Z","avatar_url":"https://github.com/mncube.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# swaprinc\n\n\u003c!-- badges: start --\u003e\n\u003c!-- badges: end --\u003e\n\nThe objective of swaprinc is to streamline the comparison between a regression \nmodel using original variables and a model in which some of these variables have \nbeen swapped out for principal components.\n\n## Installation\n\nYou can install the released version of swaprinc from [CRAN](https://CRAN.R-project.org) with:\n\n```{r eval=FALSE}\ninstall.packages(\"swaprinc\")\n```\n\n\nYou can install the development version of swaprinc from [GitHub](https://github.com/) with:\n\n```{r eval=FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"mncube/swaprinc\")\n```\n\n\n## A Simple Example\n\nIn the simple example provided, a regression model estimates the relationship \nbetween x1 and y, while controlling for variables x2 through x10.\n\nBy using the default engine, \"stats\", the statistical model is fitted with \nstats::lm, and by using the default prc_eng, \"stats\", principal components are \nextracted with stats::prcomp.\n\nThe \"raw model\" is specified by the formula parameter, which is passed to stats::lm. \nThe pca_vars and n_pca_components parameters indicate that variables x2 to x10 \nwill be used to extract three principal components. Subsequently, the \"PCA model\" \nis passed to stats::lm as follows: y ~ x1 + PC1 + PC2 + PC3.\n\nBy setting the lpca_center and lpca_scale parameters to 'pca', the data in pca_vars \nwill be centered and scaled according to the guidelines in the \n[Step-by-Step PCA](https://cran.r-project.org/package=LearnPCA/vignettes/Vig_03_Step_By_Step_PCA.pdf) \nvignette before being passed to stats::prcomp. 
The miss_handler parameter, set to \n'omit', ensures that only complete cases are included by subsetting the data frame \nrows with stats::complete.cases.\n\n```{r simple}\nlibrary(swaprinc)\n\n  # Create a small simulated dataset\n  set.seed(40)\n  n \u003c- 50\n  x1 \u003c- rnorm(n)\n  x2 \u003c- rnorm(n, 5, 15)\n  x3 \u003c- rnorm(n, -5.5, 20)\n  x4 \u003c- rnorm(n, 3, 3) + x3*1.5\n  x5 \u003c- rnorm(n, -2, 4) + x3*.25\n  x6 \u003c- rnorm(n, -5, 5) + x4\n  x7 \u003c- rnorm(n, -2, 6)\n  x8 \u003c- rnorm(n, 2, 7)\n  x9 \u003c- rnorm(n, -2, 3) + x2*.4\n  x10 \u003c- rnorm(n, 5, 4)\n  y \u003c- 1 + 2 * x1 + 3 * x2 + 2.5*x4 - 3.5*x5 + 2*x6 + 1.5*x7 + x8 + 2*x9 + x10 + rnorm(n)\n  data \u003c- data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)\n\n  # Run swaprinc with three principal components extracted from x2 through x10\n  swaprinc_result \u003c- swaprinc(data,\n                              formula = \"y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10\",\n                              pca_vars = c(\"x2\", \"x3\", \"x4\", \"x5\", \"x6\", \"x7\", \"x8\", \"x9\", \"x10\"),\n                              n_pca_components = 3,\n                              lpca_center = \"pca\", \n                              lpca_scale = \"pca\",\n                              miss_handler = \"omit\")\n  \n  # Summarize raw model\n  summary(swaprinc_result$model_raw)\n  \n  # Summarize pca model\n  summary(swaprinc_result$model_pca)\n  \n  # Get model comparisons\n  print(swaprinc_result$comparison)\n\n```\n\n\n## The Motivating Example\n\nA common challenge in applied statistics and data science involves performing \nlogistic regression with a set of categorical independent variables. In this \nmotivating example, swaprinc is employed to compare a 'raw' logistic regression \nmodel containing seven categorical independent variables with a 'pca' logistic \nregression model. The latter model replaces six of the independent variables with \ntheir first three principal components, extracted with Gifi::princals. \nFor a comprehensive tutorial on Gifi, refer to \n[Nonlinear Principal Components Analysis: Multivariate Analysis with Optimal Scaling (MVAOS)](https://www.css.cornell.edu/faculty/dgr2/_static/files/R_html/NonlinearPCA.html#2_Package).\n\nI recommend using the 'broom' and 'broom.mixed' packages to summarize model \nresults when utilizing the '*_options' parameters for passing arguments to \nfunctions within 'swaprinc'. This approach helps prevent \n[overly extensive summaries caused by 'do.call'](https://stackoverflow.com/questions/75512192/r-do-call-function-returns-to-much/75512429#75512429).\n\n```{r motivation}\n  # Create a small simulated dataset\n  set.seed(42)\n  n \u003c- 50\n  x1 \u003c- rnorm(n, 0.5, 4)\n  x2 \u003c- rnorm(n, 3, 15)\n  x3 \u003c- rnorm(n, -2.5, 5)\n  x4 \u003c- -2.5*x2 + 3*x3 + rnorm(n, 0, 4)\n  x5 \u003c- x2*x3 + rnorm(n, -5, 5)*rnorm(n, 5, 10)\n  x6 \u003c- rnorm(n, -2, 4)*rnorm(n, 3, 5)\n  x7 \u003c- x4 + x6 + rnorm(n, 0, 3)\n  y \u003c- 1 + 2*x1 + 3*x2 - 2*x3 + .5*x4 + x5 + 1.5*x6 + x7 + rnorm(n)\n  data \u003c- data.frame(y, x1, x2, x3, x4, x5, x6, x7)\n\n  # Categorize the variables\n  yq \u003c- stats::quantile(data$y,c(0,1/2, 1))\n  x1q \u003c- stats::quantile(data$x1,c(0,1/2, 1))\n  x2q \u003c- stats::quantile(data$x2,c(0,1/4,3/4,1))\n  x3q \u003c- stats::quantile(data$x3,c(0,2/5,3/5,1))\n  x4q \u003c- stats::quantile(data$x4,c(0,1/5,4/5,1))\n  x5q \u003c- stats::quantile(data$x5,c(0,2/5,3/5,1))\n  x6q \u003c- stats::quantile(data$x6,c(0,2/5,4/5,1))\n  x7q \u003c- stats::quantile(data$x7,c(0,2/5,3/5,1))\n\n\n  data \u003c- data %\u003e% dplyr::mutate(\n    y = cut(y, breaks=yq, labels=c(\"0\", \"1\"),include.lowest = TRUE),\n    x1 = cut(x1, breaks=x1q, labels=c(\"control\", \"treatment\"),include.lowest = TRUE),\n    x2 = cut(x2, breaks=x2q, 
labels=c(\"small\",\"medium\",\"large\"),include.lowest = TRUE),\n    x3 = cut(x3, breaks=x3q, labels=c(\"short\",\"average\",\"tall\"),include.lowest = TRUE),\n    x4 = cut(x4, breaks=x4q, labels=c(\"lowbit\",\"most\",\"highbit\"),include.lowest = TRUE),\n    x5 = cut(x5, breaks=x5q, labels=c(\"under\",\"healthy\",\"over\"),include.lowest = TRUE),\n    x6 = cut(x6, breaks=x6q, labels=c(\"small\",\"medium\",\"large\"),include.lowest = TRUE),\n    x7 = cut(x7, breaks=x7q, labels=c(\"small\",\"medium\",\"large\"),include.lowest = TRUE)) %\u003e%\n    dplyr::mutate(y = as.numeric(ifelse(y == \"0\", 0, 1)))\n\n  # Run swaprinc with prc_eng set to Gifi\n  swaprinc_result \u003c- swaprinc(data,\n                              formula = \"y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7\",\n                              pca_vars = c(\"x2\", \"x3\", \"x4\", \"x5\", \"x6\", \"x7\"),\n                              n_pca_components = 3,\n                              prc_eng = \"Gifi\",\n                              model_options = list(family = binomial(link = \"logit\")))\n  \n  # Summarize raw model\n  broom::tidy(swaprinc_result$model_raw)\n  \n  # Summarize pca model\n  broom::tidy(swaprinc_result$model_pca)\n  \n  # Get model comparisons\n  print(swaprinc_result$comparison)\n```\n\n\n## Compare Multiple Models\n\nUtilizing the same dataset as in the logistic regression model mentioned earlier, \nit is beneficial to compare outcomes for various swaps. In the example below, \nthe compswap helper function facilitates the comparison of results with 2, 3, 4, \nand 5 principal components replacing six original independent variables. 
\n\n```{r compswap}\n  # Run compswap with prc_eng set to Gifi, varying the number of components\n  compswap_results \u003c- compswap(data,\n                              formula = \"y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7\",\n                              .pca_varlist = list(c(\"x2\", \"x3\", \"x4\", \"x5\", \"x6\", \"x7\")),\n                              .n_pca_list = list(2, 3, 4, 5),\n                              .prc_eng_list = list(\"Gifi\"),\n                              .model_options_list = list(list(family = binomial(link = \"logit\"))))\n\n  # Show available models\n  summary(compswap_results$all_models)\n  \n  # Get model comparisons\n  print(compswap_results$all_comparisons)\n  \n  # View model summaries\n  lapply(compswap_results$all_models, broom::tidy)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmncube%2Fswaprinc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmncube%2Fswaprinc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmncube%2Fswaprinc/lists"}