{"id":32203203,"url":"https://github.com/capnrefsmmat/regressinator","last_synced_at":"2026-02-21T01:31:32.605Z","repository":{"id":59913948,"uuid":"521798594","full_name":"capnrefsmmat/regressinator","owner":"capnrefsmmat","description":"Simulate regression data, build diagnostics, and construct lineup plots in R","archived":false,"fork":false,"pushed_at":"2025-10-07T16:26:22.000Z","size":1379,"stargazers_count":4,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-08T23:57:23.930Z","etag":null,"topics":["r","statistics"],"latest_commit_sha":null,"homepage":"https://www.refsmmat.com/regressinator/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/capnrefsmmat.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-08-05T23:18:10.000Z","updated_at":"2025-10-07T16:26:25.000Z","dependencies_parsed_at":"2023-11-16T03:23:15.629Z","dependency_job_id":"7f2b42d9-a460-4bf0-9a17-724a21f5cf26","html_url":"https://github.com/capnrefsmmat/regressinator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/capnrefsmmat/regressinator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/capnrefsmmat%2Fregressinator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/capnrefsmmat%2Fregressinator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/capnrefsmmat%2Fregressinator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/capnrefsmmat%2Fregressinator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/capnrefsmmat","download_url":"https://codeload.github.com/capnrefsmmat/regressinator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/capnrefsmmat%2Fregressinator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29670124,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T00:11:43.526Z","status":"ssl_error","status_checked_at":"2026-02-20T23:52:33.807Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","statistics"],"created_at":"2025-10-22T04:34:16.818Z","updated_at":"2026-02-21T01:31:32.584Z","avatar_url":"https://github.com/capnrefsmmat.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- WARNING: README.md is generated from README.Rmd; edit that\n     instead, then use rmarkdown::render(\"README.Rmd\") to regenerate --\u003e\n\n```{r, echo=FALSE}\nknitr::opts_chunk$set(\n  fig.path = \"man/figures/README-\"\n)\n```\n\n# the regressinator\n\nThe regressinator is a pedagogical tool for conducting simulations 
of regression\nanalyses and diagnostics. It can:\n\n* Simulate populations with predictor variables from arbitrary distributions\n* Simulate response variables that are functions of the predictor variables plus\n  error, or are drawn from a distribution related to the predictors\n* Given a model, simulate from the population sampling distribution of that\n  model's estimates\n* Given a model fit to data, generate new simulated data based on the model fit\n* Facilitate lineup plots comparing diagnostics on the fitted model to\n  diagnostics where all model assumptions are met.\n\nThese features make it easy to create activities and examples for classes\nteaching regression using R. Lineup plots are great for helping students\nunderstand how regression diagnostics behave with different types of model\nmisspecification, and for more advanced classes, simulating from sampling\ndistributions helps students explore the properties of regression estimators.\nAnd the easy simulation features aren't just useful for students---instructors\ncan easily generate examples for lectures, exams, and labs.\n\nFor example, suppose we want to teach students how residual plots look when a\nlinear regression is fit to nonlinear data. 
We can quickly specify a nonlinear\nrelationship and generate a lineup plot that hides the residual plot among 19\nothers where the model assumptions are met:\n\n```{r example-regression-lineup, fig.width=7, fig.height=6, fig.retina=2, fig.alt=\"Lineup of scatterplots of residuals versus X\"}\nlibrary(regressinator)\nlibrary(ggplot2)\n\n# define the population relationship\npop \u003c- population(\n  x = predictor(runif, min = -5, max = 5),\n  y = response(10 + 0.7 * x**2, family = gaussian(), error_scale = 2)\n)\n\n# obtain a random sample from the population\nnonlin_sample \u003c- pop |\u003e\n  sample_x(n = 20) |\u003e\n  sample_y()\n\n# fit a model to the sample\nfit \u003c- lm(y ~ x, data = nonlin_sample)\n\n# construct a lineup, with 19 models fit to data simulated from the linear model\nmodel_lineup(fit) |\u003e\n  ggplot(aes(x = x, y = .resid)) +\n  geom_point() +\n  facet_wrap(vars(.sample)) +\n  labs(x = \"x\", y = \"Residual\")\n```\n\nIf you can identify the \"true\" residual plot from the lineup, you have found\nmodel misspecification. This lineup is essentially a visual hypothesis test, and\nit helps students see the variation expected in diagnostics and what nonlinear\ntrends look like. (For more on lineups and how they can be used in teaching, see\n[Inspirations](#inspirations) below.) `model_lineup()` achieves this by\nsimulating 19 datasets from the fitted model (so the relationship is truly\nlinear), then `broom::augment()` to obtain the data and residuals from each fit.\n\nSimilarly, we can quickly obtain samples from the population sampling\ndistribution, to explore the behavior of this misspecified model's parameter\nestimates:\n\n```{r sampling-distribution}\nfit |\u003e\n  sampling_distribution(nonlin_sample, nsim = 5)\n```\n\nHere `sampling_distribution()` defaults to using `broom::tidy()` to obtain the\ncoefficients and standard errors for each fit. 
We could use this to assess bias,\nto compare the sampling distribution to standard errors and confidence\nintervals, or simply to visualize variation.\n\nThe premise of the regressinator is that this kind of simulation can be a\nvaluable teaching tool when students are learning to interpret regression\ndiagnostics and determine how different kinds of misspecification might affect\nmodel fits.\n\nCheck the Get Started guide (`vignette(\"regressinator\")`) for more detail.\n\n## Inspirations\n\nVisual inference -- hiding a plot of real data among \"null plots\" -- has been\naround for quite a while, and I certainly did not invent this idea myself. For\nexample:\n\n* Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E. K., Swayne, D. F., \u0026\n  Wickham, H. (2009). [Statistical inference for exploratory data analysis and\n  model diagnostics](https://doi.org/10.1098/rsta.2009.0120). *Philosophical\n  Transactions of the Royal Society A*, 367(1906), 4361–4383.\n\n* Wickham, H., Cook, D., Hofmann, H., \u0026 Buja, A. (2010). [Graphical inference\n  for infovis](https://doi.org/10.1109/TVCG.2010.161). *IEEE Transactions on\n  Visualization and Computer Graphics*, 16(6), 973–979.\n\n* Loy, A., Follett, L., \u0026 Hofmann, H. (2016). [Variations of Q-Q Plots: The\n  Power of Our Eyes!](https://doi.org/10.1080/00031305.2015.1077728) *The\n  American Statistician*, 70(2), 202–214.\n\nThe idea of using visual inference to teach regression diagnostics is also not\nnew:\n\n* Loy, A. (2021). [Bringing Visual Inference to the\n  Classroom](https://doi.org/10.1080/26939169.2021.1920866). *Journal of\n  Statistics and Data Science Education*, 29(2), 171–182.\n\nThe regressinator simply adds an easy-to-use framework to allow all kinds of\nteaching activities to be constructed quickly. Instructors can design examples to\ndisplay in lecture, or students with R experience can run interactive examples\nand explore different situations. 
Ideally, if a student asks \"But what if *[some\nproblem with the model]* happens?\", you should be able to reply with a quick\nsimulation.\n\n## Compared to other packages\n\nThere have been several past efforts to support pedagogical simulation and\nlineup plots in R:\n\n* [nullabor](https://cran.r-project.org/package=nullabor), which supports lineup\n  plots. The regressinator uses nullabor underneath when building lineups.\n* [mosaic](https://cran.r-project.org/package=mosaic), part of [Project\n  MOSAIC](http://www.mosaic-web.org/), provides a simple set of functions for\n  doing EDA and basic inferential statistics, including the `do()` function for\n  easy simulation without loops.\n* [infer](https://infer.tidymodels.org/), part of the\n  [tidymodels](https://www.tidymodels.org/) framework, can conduct\n  simulation-based inference for proportions, means, regression slopes, and\n  other estimates commonly used in statistics courses, using only a few simple\n  functions.\n\nUnlike these packages, the regressinator provides a simple tool for specifying a\n*population* and sampling from it, rather than conducting bootstrapping or\npermutation on an observed dataset. This makes the regressinator suitable for,\nsay, exploring the properties of regression estimates and diagnostics in known\npopulations, but less suitable for simulation-based hypothesis testing.\n\nUnlike infer, the regressinator does not wrap R statistical methods or provide\nits own inference functions. Users must use `lm()`, `glm()`, or whatever other\nmethods they need for their modeling. 
This makes the regressinator less suitable\nfor introductory courses where extra complexity should be hidden away from\nstudents, but more suitable for more advanced work: as students advance to more\ncomplex models provided by other packages, they can use those models in the\nregressinator, without any special support being required.\n\nA useful counterpart to the regressinator might be\n[rsample](https://rsample.tidymodels.org/), a general framework for methods that\nresample from the observed data, such as bootstrapping. In the same way that the\nregressinator supports general-purpose simulation from the population without\nhard-coding specific use cases, rsample supports resampling and cross-validation\nin a general way that could be used for any kind of modeling, not just models\nbuilt into the package.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcapnrefsmmat%2Fregressinator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcapnrefsmmat%2Fregressinator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcapnrefsmmat%2Fregressinator/lists"}