{"id":32207866,"url":"https://github.com/iqss/clarify","last_synced_at":"2025-10-22T06:00:02.442Z","repository":{"id":65023160,"uuid":"518931254","full_name":"IQSS/clarify","owner":"IQSS","description":"clarify: Simulation-Based Inference for Regression Models","archived":false,"fork":false,"pushed_at":"2025-09-19T17:02:37.000Z","size":21224,"stargazers_count":24,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-22T05:59:48.532Z","etag":null,"topics":["r"],"latest_commit_sha":null,"homepage":"https://iqss.github.io/clarify/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IQSS.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-07-28T17:09:06.000Z","updated_at":"2025-09-19T16:57:32.000Z","dependencies_parsed_at":"2025-09-19T20:19:24.230Z","dependency_job_id":null,"html_url":"https://github.com/IQSS/clarify","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/IQSS/clarify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IQSS%2Fclarify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IQSS%2Fclarify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IQSS%2Fclarify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IQSS%2Fclarify/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/IQSS","download_url":"https://codeload.github.com/IQSS/clarify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IQSS%2Fclarify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280389301,"owners_count":26322507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r"],"created_at":"2025-10-22T05:59:52.533Z","updated_at":"2025-10-22T06:00:02.436Z","avatar_url":"https://github.com/IQSS.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\nbibliography: vignettes/references.bib\nlink-citations: true\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"80%\"\n)\n```\n\n# `clarify`: Simulation-Based Inference for Regression Models\n\n\u003c!-- badges: start --\u003e\n\n[![](https://r-pkg.org/badges/version/clarify)](https://CRAN.R-project.org/package=clarify) [![](https://cranlogs.r-pkg.org/badges/clarify)](https://CRAN.R-project.org/package=clarify)\n\n\u003c!-- badges: end --\u003e\n\n*clarify* implements simulation-based inference for functions of model parameters, such as average marginal effects and predictions at representative values of the predictors. See the *clarify* [website](https://iqss.github.io/clarify/) for documentation and other examples, and see @greiferClarifySimulationBasedInference2025 for the paper describing the package (also available at `vignette(\"clarify\")`). *clarify* was designed to replicate and expand on functionality previously provided by the *Zelig* package.\n\n## Installation\n\n*clarify* can be installed from [CRAN](https://CRAN.R-project.org/package=clarify) using\n\n```{r, eval = F}\ninstall.packages(\"clarify\")\n```\n\nYou can install the development version of *clarify* from [GitHub](https://github.com/iqss/clarify) with\n\n```{r, eval = F}\ninstall.packages(\"remotes\")\nremotes::install_github(\"iqss/clarify\")\n```\n\n## Example\n\nBelow is an example of performing g-computation for the average treatment effect on the treated (ATT) after logistic regression to compute the average causal risk ratio and its confidence interval. 
First, we load the data (in this case the `lalonde` dataset from *MatchIt*) and fit a logistic regression using functions outside of *clarify*:\n\n```{r, fig.width=7, fig.height=3}\nlibrary(clarify)\n\ndata(\"lalonde\", package = \"MatchIt\")\n\n# Fit the model\nfit \u003c- glm(I(re78 \u003e 0) ~ treat + age + educ + race + married +\n             nodegree + re74 + re75,\n           data = lalonde, family = binomial)\n```\n\nNext, to estimate the ATT risk ratio, we simulate coefficients from their implied distribution and compute the effects of interest in each simulation, yielding a distribution of estimates that we can summarize and use for inference:\n\n```{r example, fig.width=7, fig.height=3}\n# Simulate coefficients from a multivariate normal distribution\nset.seed(123)\nsim_coefs \u003c- sim(fit)\n\n# Marginal risk ratio ATT, simulation-based\nsim_est \u003c- sim_ame(sim_coefs,\n                   var = \"treat\",\n                   subset = treat == 1,\n                   contrast = \"RR\",\n                   verbose = FALSE)\n\nsim_est\n\n# View the estimates, confidence intervals, and p-values\nsummary(sim_est, null = c(`RR` = 1))\n\n# Plot the resulting sampling distributions\nplot(sim_est)\n```\n\nBelow, we describe the framework *clarify* uses and provide additional examples. For a complete vignette, see `vignette(\"clarify\")`.\n\n## Introduction\n\nSimulation-based inference is an alternative to the delta method and bootstrapping for performing inference on quantities that are functions of model parameters. It involves simulating model coefficients from their joint (e.g., multivariate normal) distribution, using the coefficient estimates and their estimated covariance matrix from a single model fit to the original data, computing the quantities of interest from each set of simulated coefficients, and then performing inference using the resulting distribution of the estimates as their sampling distribution. 
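\n\nThe three steps just described can be sketched by hand for the ATT risk ratio from the example above. This is a minimal illustration under stated assumptions, not *clarify*'s internal code: it reuses the `fit` and `lalonde` objects created above, assumes the *MASS* package is installed for `MASS::mvrnorm()`, and uses an illustrative `n_sims` of 1000 draws.\n\n```{r, eval = FALSE}\n# Hand-rolled sketch of simulation-based inference (illustrative only;\n# assumes `fit` and `lalonde` from above and that MASS is installed)\nset.seed(123)\nn_sims \u003c- 1000\n\n# 1. Draw coefficient vectors from N(coef(fit), vcov(fit))\ncoef_draws \u003c- MASS::mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))\n\n# 2. Compute the ATT risk ratio for each set of simulated coefficients\nX \u003c- model.matrix(fit)\nX1 \u003c- X0 \u003c- X[lalonde$treat == 1, , drop = FALSE]\nX1[, \"treat\"] \u003c- 1\nX0[, \"treat\"] \u003c- 0\nrr \u003c- apply(coef_draws, 1, function(b) {\n  mean(plogis(X1 %*% b)) / mean(plogis(X0 %*% b))\n})\n\n# 3. Use the simulated estimates as a sampling distribution,\n# e.g., for a 95% percentile confidence interval\nquantile(rr, c(.025, .975))\n```\n\n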
Confidence intervals can be computed from the percentiles of the simulated sampling distribution, and p-values can be computed by inverting those confidence intervals. Alternatively, if the simulated sampling distribution is approximately normal, the standard error can be estimated as the standard deviation of the simulated estimates, and normal-theory Wald confidence intervals and p-values can be computed from it. The methodology of simulation-based inference is explained in @kingMakingMostStatistical2000 and @herronPostestimationUncertaintyLimited1999.\n\n*clarify* was designed to provide a simple, general interface for simulation-based inference and includes a few convenience functions for common tasks like computing average marginal effects. The primary functions of *clarify* are `sim()`, `sim_apply()`, `summary()`, and `plot()`, which together form a straightforward workflow for simulation-based inference:\n\n-   `sim()` simulates model parameters from a fitted model\n-   `sim_apply()` applies an estimator to the simulated coefficients, or to the original model object with the simulated coefficients inserted\n-   `summary()` produces confidence intervals and p-values for the resulting estimates\n-   `plot()` plots the simulated sampling distributions of the resulting estimates\n\nThere are also wrappers around `sim_apply()` for common operations: `sim_ame()` computes the average marginal effect of a variable, mirroring `marginaleffects::avg_predictions()` and `marginaleffects::avg_slopes()`; `sim_setx()` computes predictions at typical values of the covariates and differences between them, mirroring `Zelig::setx()` and `Zelig::setx1()`; and `sim_adrf()` computes average dose-response functions. 
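\n\nFor instance, an average dose-response function for `age` might be computed along the following lines, reusing `sim_coefs` from the example above. This is a sketch: the grid of ages passed to `at` is an illustrative choice, and the arguments shown should be checked against `?sim_adrf`.\n\n```{r, eval = FALSE}\n# Average dose-response function for age over an illustrative grid of values\n# (assumes `sim_coefs` from the example above)\nsim_est_adrf \u003c- sim_adrf(sim_coefs, var = \"age\",\n                         at = seq(20, 50, by = 5),\n                         verbose = FALSE)\n\nsummary(sim_est_adrf)\nplot(sim_est_adrf)\n```\n\n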
*clarify* also offers support for models fit to multiply imputed data with the `misim()` function.\n\nIn the example above, we used `sim_ame()` to compute the ATT, but we could also have done so manually using `sim_apply()`, as demonstrated below:\n\n```{r example2, fig.width=7, fig.height=3}\n# Write a function that computes the g-computation estimate for the ATT\nATT_fun \u003c- function(fit) {\n  d \u003c- subset(lalonde, treat == 1)\n  d$treat \u003c- 1\n  p1 \u003c- mean(predict(fit, newdata = d, type = \"response\"))\n  d$treat \u003c- 0\n  p0 \u003c- mean(predict(fit, newdata = d, type = \"response\"))\n  c(`E[Y(0)]` = p0, `E[Y(1)]` = p1, `RR` = p1 / p0)\n}\n\n# Apply that function to each set of simulated coefficients\nsim_est \u003c- sim_apply(sim_coefs, ATT_fun, verbose = FALSE)\n\nsim_est\n\n# View the estimates, confidence intervals, and p-values;\n# they are the same as when using sim_ame() above\nsummary(sim_est, null = c(`RR` = 1))\n\n# Plot the resulting sampling distributions\nplot(sim_est, reference = TRUE, ci = FALSE)\n```\n\nThe plot shows that the simulated sampling distribution of the risk ratio is not normally distributed around the estimate, which suggests that the delta method may be a poor approximation here and that the asymmetric confidence intervals produced by the simulation may be more valid. 
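\n\nOne way to check this is to compare the default percentile (quantile-based) interval for the risk ratio with a normal-theory Wald interval. The sketch below assumes the `parm` and `method` arguments of `summary()` for *clarify* estimates (see `?summary.clarify_est`); a noticeable difference between the two intervals suggests that the normal approximation is inadequate.\n\n```{r, eval = FALSE}\n# Percentile (quantile-based) interval for the risk ratio, the default\nsummary(sim_est, parm = \"RR\")\n\n# Wald interval using the SD of the simulated estimates as the standard error\nsummary(sim_est, parm = \"RR\", method = \"wald\")\n```\n\n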
Note that the estimates are those computed from the original model coefficients; the distribution is used only for computing confidence intervals, in line with recommendations by @raineyCarefulConsiderationCLARIFY2023.\n\nTo compute the risk difference, we can apply `transform()` to the existing output:\n\n```{r}\n# Transform estimates into new quantities of interest\nsim_est \u003c- transform(sim_est, `RD` = `E[Y(1)]` - `E[Y(0)]`)\n\nsummary(sim_est, null = c(`RR` = 1, `RD` = 0))\n```\n\nWe can also use *clarify* to compute predictions and first differences at set and typical values of the predictors using `sim_setx()`, which mimics the functionality of *Zelig*'s `setx()` and `setx1()` functions:\n\n```{r, fig.width=7, fig.height=3}\n# Predictions across age and treat at typical values\n# of the other predictors\nsim_est \u003c- sim_setx(sim_coefs,\n                    x = list(age = 20:50, treat = 0:1),\n                    verbose = FALSE)\n\n# Plot predicted values across age for each value of treat\nplot(sim_est)\n```\n\nSee `vignette(\"Zelig\", package = \"clarify\")` for more examples of translating a *Zelig*-based workflow into one that uses *clarify* to estimate the same quantities of interest.\n\n*clarify* supports parallel processing in all estimation functions to speed up computation, and it can also analyze models fit to multiply imputed data (see `misim()`). See `vignette(\"clarify\")` for more details.\n\n## References\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiqss%2Fclarify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiqss%2Fclarify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiqss%2Fclarify/lists"}