{"id":32204030,"url":"https://github.com/blasbenito/collinear","last_synced_at":"2025-10-22T04:50:23.838Z","repository":{"id":197147296,"uuid":"698046617","full_name":"BlasBenito/collinear","owner":"BlasBenito","description":"R package to manage multicollinearity in modeling data frames.","archived":false,"fork":false,"pushed_at":"2025-09-23T16:35:34.000Z","size":22042,"stargazers_count":13,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-23T18:27:48.210Z","etag":null,"topics":["machine-learning","multicollinearity","r-package","statistics"],"latest_commit_sha":null,"homepage":"https://blasbenito.github.io/collinear/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BlasBenito.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-09-29T03:14:48.000Z","updated_at":"2025-09-11T09:01:04.000Z","dependencies_parsed_at":"2024-01-27T05:31:36.415Z","dependency_job_id":"da1ec600-cf3a-4eec-b839-003091cd9770","html_url":"https://github.com/BlasBenito/collinear","commit_stats":null,"previous_names":["blasbenito/collinear"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/BlasBenito/collinear","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlasBenito%2Fcollinear","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlasBenito%2Fcollinear/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/BlasBenito%2Fcollinear/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlasBenito%2Fcollinear/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BlasBenito","download_url":"https://codeload.github.com/BlasBenito/collinear/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BlasBenito%2Fcollinear/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382978,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","multicollinearity","r-package","statistics"],"created_at":"2025-10-22T04:50:22.836Z","updated_at":"2025-10-22T04:50:23.833Z","avatar_url":"https://github.com/BlasBenito.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  eval = TRUE,\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n# options(tibble.print_min = 5, tibble.print_max = 5)\n```\n\n\n# `collinear` \\n Seamless Multicollinearity Management \u003ca href=\"https://github.com/BlasBenito/collinear\"\u003e\u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"138\" /\u003e\u003c/a\u003e\n\n\n\n\u003c!-- Development badges \n\n[![R-CMD-check](https://github.com/BlasBenito/collinear/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/BlasBenito/collinear/actions/workflows/R-CMD-check.yaml)\n[![Devel-version](https://img.shields.io/badge/devel%20version-1.0.1-blue.svg)](https://github.com/blasbenito/collinear)\n\n\u003c!-- badges: start --\u003e\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10039489.svg)](https://doi.org/10.5281/zenodo.10039489)\n[![CRAN status](https://www.r-pkg.org/badges/version/collinear)](https://cran.r-project.org/package=collinear)\n[![CRAN\\_Download\\_Badge](http://cranlogs.r-pkg.org/badges/grand-total/collinear)](https://CRAN.R-project.org/package=collinear)\n[![R-CMD-check](https://github.com/BlasBenito/collinear/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/BlasBenito/collinear/actions/workflows/R-CMD-check.yaml)\n\n\u003c!-- badges: end --\u003e\n\n## Warning\n\nVersion 2.0.0 of `collinear` includes changes that may disrupt existing workflows, and results from previous versions may not be reproducible due to enhancements in the automated selection algorithms. 
Please refer to the Changelog for details.\n\n## Summary\n\n[Multicollinearity hinders the interpretability](https://www.blasbenito.com/post/multicollinearity-model-interpretability/) of linear and machine learning models.\n\nThe `collinear` package combines four methods for easy management of multicollinearity in modelling data frames with numeric and categorical variables:\n\n- **Target Encoding**: Transforms categorical predictors to numeric using a numeric response as reference.\n- **Preference Order**: Ranks predictors by their association with a response variable to preserve important ones in multicollinearity filtering.\n- **Pairwise Correlation Filtering**: Automated multicollinearity filtering of numeric and categorical predictors based on pairwise correlations.\n- **Variance Inflation Factor Filtering**: Automated multicollinearity filtering of numeric predictors based on Variance Inflation Factors.\n\nThese methods are combined in the function `collinear()`, which serves as a single entry point for most of the functionality in the package. The article [How It Works](https://blasbenito.github.io/collinear/articles/how_it_works.html) explains how `collinear()` works in detail.\n\n## Citation\n\nIf you find this package useful, please cite it as:\n\n*Blas M. Benito (2024). collinear: R Package for Seamless Multicollinearity Management. Version 2.0.0. doi: 10.5281/zenodo.10039489*\n\n## Main Improvements in Version 2.0.0\n\n1. **Expanded Functionality**: Functions `collinear()` and `preference_order()` support both categorical and numeric responses and predictors, and can handle several responses at once.\n2. **Robust Selection Algorithms**: Enhanced selection in `vif_select()` and `cor_select()`.\n3. **Enhanced Functionality to Rank Predictors**: New functions to compute association between response and predictors covering most use-cases, and automated function selection depending on data features.\n4. 
**Simplified Target Encoding**: Streamlined and parallelized for better efficiency, and new default is \"loo\" (leave-one-out).\n5. **Parallelization and Progress Bars**: Utilizes `future` and `progressr` for enhanced performance and user experience.\n\n\n## Install\n\nThe package `collinear` can be installed from CRAN.\n\n```{r, eval = FALSE}\ninstall.packages(\"collinear\")\n```\n\n\nThe development version can be installed from GitHub.\n\n```{r, eval = FALSE}\nremotes::install_github(\n  repo = \"blasbenito/collinear\", \n  ref = \"development\"\n  )\n```\n\n\nPrevious versions are in the “archive_xxx” branches of the GitHub repository.\n\n```{r, eval = FALSE}\nremotes::install_github(\n  repo = \"blasbenito/collinear\", \n  ref = \"archive_v1.1.1\"\n  )\n```\n\n\n```{r packages, message = FALSE, warning = FALSE, include = FALSE}\nlibrary(collinear)\nlibrary(future)\nlibrary(parallelly)\n```\n\n\n## Getting Started\n\nThe function `collinear()` provides all tools required for a fully fledged multicollinearity filtering workflow. 
The code below shows a small example workflow.\n\n```{r}\n#parallelization setup\nfuture::plan(\n  future::multisession,\n  workers = parallelly::availableCores() - 1\n  )\n\n#progress bar (does not work in Rmarkdown)\n#progressr::handlers(global = TRUE)\n\n#example data frame\ndf \u003c- collinear::vi[1:5000, ]\n\n#there are many NA cases in this data frame\nsum(is.na(df))\n```\n\n```{r}\n#numeric and categorical predictors\npredictors \u003c- collinear::vi_predictors\n\ncollinear::identify_predictors(\n  df = df,\n  predictors = predictors\n)\n```\n\n```{r}\n#multicollinearity filtering\nselection \u003c- collinear::collinear(\n  df = df,\n  response = c(\n    \"vi_numeric\",    #numeric response\n    \"vi_categorical\" #categorical response\n    ),\n  predictors = predictors,\n  max_cor = 0.75,\n  max_vif = 5,\n  quiet = TRUE\n)\n```\n\nThe output is a named list of vectors with selected predictor names when more than one response is provided, and a character vector otherwise.  \n\n```{r}\nselection\n```\n\nThe output of `collinear()` can be easily converted into model formulas.\n\n```{r}\nformulas \u003c- collinear::model_formula(\n  predictors = selection\n)\n\nformulas\n```\n\nThese formulas can be used to fit models right away.\n\n```{r, eval = FALSE}\n#linear model\nm_vi_numeric \u003c- stats::glm(\n  formula = formulas[[\"vi_numeric\"]], \n  data = df,\n  na.action = na.omit\n  )\n\n#random forest model\nm_vi_categorical \u003c- ranger::ranger(\n  formula = formulas[[\"vi_categorical\"]],\n  data = na.omit(df)\n)\n```\n\n## Getting help\n\nIf you encounter bugs or issues with the documentation, please [file an issue on GitHub](https://github.com/BlasBenito/collinear/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblasbenito%2Fcollinear","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblasbenito%2Fcollinear","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblasbenito%2Fcollinear/lists"}