{"id":33187098,"url":"https://mayer79.github.io/effectplots/","last_synced_at":"2025-11-25T18:00:38.701Z","repository":{"id":258352838,"uuid":"860900883","full_name":"mayer79/effectplots","owner":"mayer79","description":"Fast Effect Plots in R","archived":false,"fork":false,"pushed_at":"2025-10-05T11:27:45.000Z","size":12568,"stargazers_count":21,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-21T05:56:21.272Z","etag":null,"topics":["machine-learning","r","regression","xai"],"latest_commit_sha":null,"homepage":"https://mayer79.github.io/effectplots/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mayer79.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-21T13:22:04.000Z","updated_at":"2025-10-05T11:22:00.000Z","dependencies_parsed_at":"2024-12-16T21:25:27.824Z","dependency_job_id":"5496ff05-e7d5-4282-ab8c-aeaf52eca2d5","html_url":"https://github.com/mayer79/effectplots","commit_stats":null,"previous_names":["mayer79/marginalplot","mayer79/effectplots"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/mayer79/effectplots","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2Feffectplots","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2Feffectplots/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2Feffectplots/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2Feffectplots/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mayer79","download_url":"https://codeload.github.com/mayer79/effectplots/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mayer79%2Feffectplots/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286079811,"owners_count":27282121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-25T02:00:05.816Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","r","regression","xai"],"created_at":"2025-11-16T05:00:30.371Z","updated_at":"2025-11-25T18:00:38.689Z","avatar_url":"https://github.com/mayer79.png","language":"R","funding_links":[],"categories":["Data and models"],"sub_categories":[],"readme":"# effectplots \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"139\" alt=\"\" /\u003e\n\n\u003c!-- badges: start --\u003e\n\n[![R-CMD-check](https://github.com/mayer79/effectplots/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mayer79/effectplots/actions/workflows/R-CMD-check.yaml)\n[![Codecov test 
coverage](https://codecov.io/gh/mayer79/effectplots/graph/badge.svg)](https://app.codecov.io/gh/mayer79/effectplots)\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/effectplots)](https://cran.r-project.org/package=effectplots)\n\n[![](https://cranlogs.r-pkg.org/badges/effectplots)](https://cran.r-project.org/package=effectplots) \n[![](https://cranlogs.r-pkg.org/badges/grand-total/effectplots?color=orange)](https://cran.r-project.org/package=effectplots)\n\n\u003c!-- badges: end --\u003e\n\n**{effectplots}** is an R package for calculating and plotting feature effects of any model. It is very fast thanks to [{collapse}](https://CRAN.R-project.org/package=collapse).\n\nThe main function `feature_effects()` crunches these statistics per feature X over values/bins:\n\n- Average observed y values: Descriptive associations between response y and features.\n- Average predictions: Combined effect of X and other features (M Plots, Apley [1]).\n- Partial dependence (Friedman [2]): How the average prediction reacts to X when all other features are held fixed.\n- Accumulated local effects (Apley [1]): Alternative to partial dependence.\n\nFurthermore, it calculates counts, weight sums, average residuals, and standard deviations of observed y and residuals. All statistics respect optional case weights.\n\nWe highly recommend Christoph Molnar's book [3] for more info on feature effects.\n\n**It takes 1 second on a normal laptop to get all statistics for 10 features on 10 million rows (+ prediction time).**\n\n**Workflow**\n\n1. **Crunch** values via `feature_effects()` or the little helpers `average_observed()`, `partial_dependence()`, etc.\n2. **Update** the results with `update()`: Combine rare levels of categorical features, sort results by importance, turn values of discrete features into factors, etc.\n3. 
**Plot** the results with `plot()`: Choose between ggplot2/patchwork and plotly.\n\n**Outlier capping**: Extreme outliers in numeric features are capped by default (but not deleted).\nTo avoid capping, set `outlier_iqr = Inf`.\n\n## Installation\n\nYou can install the development version of {effectplots} from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"pak\")\npak::pak(\"mayer79/effectplots\", dependencies = TRUE)\n```\n\n## Usage\n\nWe use a dataset with 1 million rows on motor third-party liability (TPL) insurance. The aim is to model claim frequency. Before modeling, we want to study the association between features and response.\n\n``` r\nlibrary(effectplots)\nlibrary(OpenML)\nlibrary(lightgbm)\n\nset.seed(1)\n\ndf \u003c- getOMLDataSet(data.id = 45106L)$data\n\nxvars \u003c- c(\"year\", \"town\", \"driver_age\", \"car_weight\", \"car_power\", \"car_age\")\n\n# 0.1s on laptop\naverage_observed(df[xvars], y = df$claim_nb) |\u003e\n  update(to_factor = TRUE) |\u003e  # turn discrete numerics to factors\n  plot(share_y = \"all\")\n```\n\n![](man/figures/avg_obs.svg)\n\nA shared y axis makes it easier to compare the strength of the association across features.\n\n### Fit model\n\nNext, let's fit a boosted trees model.\n\n```r\nix \u003c- sample(nrow(df), 0.8 * nrow(df))\ntrain \u003c- df[ix, ]\ntest \u003c- df[-ix, ]\nX_train \u003c- data.matrix(train[xvars])\nX_test \u003c- data.matrix(test[xvars])\n\n# Training, using slightly optimized parameters found via cross-validation\nparams \u003c- list(\n  learning_rate = 0.05,\n  objective = \"poisson\",\n  num_leaves = 7,\n  min_data_in_leaf = 50,\n  min_sum_hessian_in_leaf = 0.001,\n  colsample_bynode = 0.8,\n  bagging_fraction = 0.8,\n  lambda_l1 = 3,\n  lambda_l2 = 5,\n  num_threads = 7\n)\n\nfit \u003c- lgb.train(\n  params = params,\n  data = lgb.Dataset(X_train, label = train$claim_nb),\n  nrounds = 300\n)\n```\n\n### Inspect model\n\nLet's crunch all statistics on the test data. 
Sorting is done by the weighted variance of partial dependence, a main-effect importance measure related to [4].\n\nThe average predictions closely follow the average observed values, i.e., the model seems to do a good job. Comparing partial dependence/ALE with the average predictions gives insight into whether an effect mainly comes from the feature on the x axis or from other, correlated features.\n\n```r\n# 0.1s + 0.15s prediction time\nfeature_effects(fit, v = xvars, data = X_test, y = test$claim_nb) |\u003e\n  update(sort_by = \"pd\") |\u003e \n  plot()\n```\n\n![](man/figures/feature_effects.svg)\n\n\n### Flexibility\n\nWhat about combining training and test results? Or comparing different models or subgroups? No problem:\n\n```r\nm_train \u003c- feature_effects(fit, v = xvars, data = X_train, y = train$claim_nb)\nm_test \u003c- feature_effects(fit, v = xvars, data = X_test, y = test$claim_nb)\n\n# Pick top 3 based on train\nm_train \u003c- m_train |\u003e \n  update(sort_by = \"pd\") |\u003e \n  head(3)\nm_test \u003c- m_test[names(m_train)]\n\n# Concatenate train and test results and plot them\nc(m_train, m_test) |\u003e \n  plot(\n    share_y = \"rows\",\n    ncol = 2,\n    byrow = FALSE,\n    stats = c(\"y_mean\", \"pred_mean\"),\n    subplot_titles = FALSE,\n    # plotly = TRUE,\n    title = \"Left: Train - Right: Test\"\n  )\n```\n\n![](man/figures/train_test.svg)\n\nTo take a closer look at bias, let's select the statistic \"resid_mean\" along with pointwise 95% confidence intervals for the true conditional bias.\n\n```r\nc(m_train, m_test) |\u003e \n  update(drop_below_n = 50) |\u003e \n  plot(\n    ylim = c(-0.07, 0.08),\n    ncol = 2,\n    byrow = FALSE,\n    stats = \"resid_mean\",\n    subplot_titles = FALSE,\n    title = \"Left: Train - Right: Test\",\n    # plotly = TRUE,\n    interval = \"ci\"\n  )\n```\n\n![](man/figures/bias.svg)\n\n## More examples\n\nMost models work out of the box, including DALEX explainers and Tidymodels models. 
If not, a tailored prediction function can be specified.\n\n### DALEX\n\n```r\nlibrary(effectplots)\nlibrary(DALEX)\nlibrary(ranger)\n\nset.seed(1)\n\nfit \u003c- ranger(Sepal.Length ~ ., data = iris)\nex \u003c- DALEX::explain(fit, data = iris[, -1], y = iris[, 1])\n\nfeature_effects(ex, breaks = 5) |\u003e \n  plot(share_y = \"all\")\n```\n\n![](man/figures/dalex.svg)\n\n### Tidymodels\n\nNote that ALE plots are only available for continuous variables.\n\n```r\nlibrary(effectplots)\nlibrary(tidymodels)\n\nset.seed(1)\n\nxvars \u003c- c(\"carat\", \"color\", \"clarity\", \"cut\")\n\nsplit \u003c- initial_split(diamonds)\ntrain \u003c- training(split)\ntest \u003c- testing(split)\n\ndia_recipe \u003c- train |\u003e \n  recipe(reformulate(xvars, \"price\"))\n\nmod \u003c- rand_forest(trees = 100) |\u003e\n  set_engine(\"ranger\") |\u003e \n  set_mode(\"regression\")\n  \ndia_wf \u003c- workflow() |\u003e\n  add_recipe(dia_recipe) |\u003e\n  add_model(mod)\n\nfit \u003c- dia_wf |\u003e\n  fit(train)\n\nM_train \u003c- feature_effects(fit, v = xvars, data = train, y = \"price\")\nM_test \u003c- feature_effects(fit, v = xvars, data = test, y = \"price\")\n\nplot(\n  M_train + M_test,\n  byrow = FALSE,\n  ncol = 2,\n  share_y = \"rows\",\n  rotate_x = rep(45 * xvars %in% c(\"clarity\", \"cut\"), each = 2),\n  subplot_titles = FALSE,\n  # plotly = TRUE,\n  title = \"Left: train - Right: test\"\n)\n```\n\n![](man/figures/tidymodels.svg)\n\n### Probabilistic classification\n\nWe focus on a single class.\n\n```r\nlibrary(effectplots)\nlibrary(ranger)\n\nset.seed(1)\n\nfit \u003c- ranger(Species ~ ., data = iris, probability = TRUE)\n\nM \u003c- partial_dependence(\n  fit,\n  v = colnames(iris[1:4]), \n  data = iris,\n  which_pred = 1  # \"setosa\" is the first class\n)\nplot(M, bar_height = 0.33, ylim = c(0, 0.7))\n```\n\n![](man/figures/classification.svg)\n\n## References\n\n1. Apley, Daniel W., and Jingyu Zhu. 2020. 
*Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.* Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.\n2. Friedman, Jerome H. 2001. *Greedy Function Approximation: A Gradient Boosting Machine.* Annals of Statistics 29 (5): 1189–1232. doi:10.1214/aos/1013203451.\n3. Molnar, Christoph. 2019. *Interpretable Machine Learning: A Guide for\nMaking Black Box Models Explainable*. \u003chttps://christophm.github.io/interpretable-ml-book/\u003e.\n4. Greenwell, Brandon M., Bradley C. Boehmke, and Andrew J. McCarthy. 2018.\n*A Simple and Effective Model-Based Variable Importance Measure.* arXiv preprint. \u003chttps://arxiv.org/abs/1805.04755\u003e.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/mayer79.github.io%2Feffectplots%2F","html_url":"https://awesome.ecosyste.ms/projects/mayer79.github.io%2Feffectplots%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/mayer79.github.io%2Feffectplots%2F/lists"}