{"id":18612418,"url":"https://github.com/winvector/cvrtsencoder","last_synced_at":"2025-11-02T23:30:25.867Z","repository":{"id":142726359,"uuid":"181114836","full_name":"WinVector/CVRTSEncoder","owner":"WinVector","description":"Spectral encoding of categorical variables using model residual trajectories","archived":false,"fork":false,"pushed_at":"2020-06-22T20:57:03.000Z","size":2998,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-27T02:14:03.770Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://winvector.github.io/CVRTSEncoder/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WinVector.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-13T03:15:57.000Z","updated_at":"2023-07-25T14:25:04.000Z","dependencies_parsed_at":"2023-05-18T00:00:41.566Z","dependency_job_id":null,"html_url":"https://github.com/WinVector/CVRTSEncoder","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2FCVRTSEncoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2FCVRTSEncoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2FCVRTSEncoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2FCVRTSEncoder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/W
inVector","download_url":"https://codeload.github.com/WinVector/CVRTSEncoder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239402818,"owners_count":19632459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T03:17:02.310Z","updated_at":"2025-11-02T23:30:25.681Z","avatar_url":"https://github.com/WinVector.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n\n```{r, echo = FALSE, warning=FALSE, message=FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \" # \",\n  fig.path = \"tools/README-\"\n)\n```\n\n\n[`CVRTSEncoder`](https://github.com/WinVector/CVRTSEncoder) is a categorical variable encoding for supervised learning.\n\nThis package is still in a research and development mode.  Functionality and interfaces may change.\n\nRe-encode a set of categorical variables jointly as a spectral projection of the trajectory of modeling residuals.  
This is intended as a succinct numeric linear representation of a set of categorical variables in a manner that is useful for supervised learning.\n\nThe concept is a y-aware encoding of the trajectory of non-linear model residuals in terms of the target categorical variables.\n\nThe idea is an extension of the [`vtreat`](https://github.com/WinVector/vtreat) coding [concepts](https://github.com/WinVector/vtreat/blob/master/extras/vtreat.pdf), the re-encoding concepts of [JavaLogistic](https://github.com/WinVector/Logistic), and the y-aware scaling concepts of Nina Zumel and John Mount:\n\n  * [Principal Components Regression, Pt.1: The Standard Method](http://www.win-vector.com/blog/2016/05/pcr_part1_xonly/)\n  * [Principal Components Regression, Pt. 2: Y-Aware Methods](http://www.win-vector.com/blog/2016/05/pcr_part2_yaware/)\n  * [Principal Components Regression, Pt. 3: Picking the Number of Components](http://www.win-vector.com/blog/2016/05/pcr_part3_pickk/)\n  * [y-aware scaling in context](http://www.win-vector.com/blog/2016/06/y-aware-scaling-in-context/).\n  \n  \nThe core idea is: other models factor the quantity to be explained into an explainable portion and a residual portion (with respect to the given model).  Each of these components is possibly useful for modeling.\n\n```{r example}\nlibrary(\"CVRTSEncoder\")\nlibrary(\"wrapr\")\n\ndata \u003c- iris\navars \u003c- c(\"Sepal.Length\", \"Petal.Length\")\nevars \u003c- c(\"Sepal.Width\", \"Petal.Width\")\ndep_var \u003c- \"Species\"\ndep_target \u003c- \"versicolor\"\n# round the evars and convert them to character so they act as categorical variables\nfor(vi in evars) {\n  data[[vi]] \u003c- as.character(round(data[[vi]]))\n}\nstr(data)\n\ncross_enc \u003c- estimate_residual_encoding_c(\n  data = data,\n  avars = avars,\n  evars = evars,\n  dep_var = dep_var,\n  dep_target = dep_target,\n  n_comp = 4\n)\nenc \u003c- prepare(cross_enc$coder, data)\ndata \u003c- cbind(data, enc)\ndata %.\u003e%\n  head(.) 
%.\u003e% \n  knitr::kable(.)\n\nf0 \u003c- wrapr::mk_formula(dep_var, avars, outcome_target = dep_target)\nprint(f0)\n\nmodel0 \u003c- glm(f0, data = data, family = binomial)\nsummary(model0)\n\ndata$pred0 \u003c- predict(model0, newdata = data, type = \"response\")\ntable(data$Species, data$pred0\u003e0.5)\n\nnewvars \u003c- c(avars, colnames(enc))\nf \u003c- wrapr::mk_formula(dep_var, newvars, outcome_target = dep_target)\nprint(f)\n\nmodel \u003c- glmnet::cv.glmnet(as.matrix(data[, newvars, drop = FALSE]), \n                           as.numeric(data[[dep_var]]==dep_target), \n                           family = \"binomial\")\n# cv.glmnet selects the penalty through the `s` argument\ncoef(model, s = \"lambda.min\")\n# type = \"response\" returns probabilities, so the 0.5 threshold matches the glm comparison above\ndata$pred \u003c- as.numeric(predict(model, newx = as.matrix(data[, newvars, drop = FALSE]), s = \"lambda.min\", type = \"response\"))\ntable(data$Species, data$pred\u003e0.5)\n```\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fcvrtsencoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwinvector%2Fcvrtsencoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fcvrtsencoder/lists"}