{"id":13857589,"url":"https://github.com/IndrajeetPatil/statsExpressions","last_synced_at":"2025-07-13T22:30:35.760Z","repository":{"id":35077432,"uuid":"200110012","full_name":"IndrajeetPatil/statsExpressions","owner":"IndrajeetPatil","description":"Tidy data frames and expressions with statistical summaries 📜","archived":false,"fork":false,"pushed_at":"2024-11-11T18:42:49.000Z","size":39292,"stargazers_count":312,"open_issues_count":17,"forks_count":20,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-11-21T07:03:34.266Z","etag":null,"topics":["bayesian-inference","bayesian-statistics","contingency-table","correlation","effectsize","meta-analysis","parametric","robust","robust-statistics","statistical-details","statistical-tests","tidy"],"latest_commit_sha":null,"homepage":"https://indrajeetpatil.github.io/statsExpressions/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IndrajeetPatil.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json"},"funding":{"github":"easystats"}},"created_at":"2019-08-01T19:44:47.000Z","updated_at":"2024-11-11T18:37:11.000Z","dependencies_parsed_at":"2023-09-23T10:20:31.812Z","dependency_job_id":"b28da377-3998-4594-b0d2-22b88450e0b1","html_url":"https://github.com/IndrajeetPatil/statsExpressions","commit_stats":{"total_commits":540,"total_committers":5,"mean_commits":108.0,"dds":0.3759259259259259,"last_synced_commit":"b9a9f8638694e529ca554f649a4a6c1af59e9baf"},"previous_names":[],"tags_count":38,"temp
late":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IndrajeetPatil%2FstatsExpressions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IndrajeetPatil%2FstatsExpressions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IndrajeetPatil%2FstatsExpressions/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IndrajeetPatil%2FstatsExpressions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IndrajeetPatil","download_url":"https://codeload.github.com/IndrajeetPatil/statsExpressions/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225686817,"owners_count":17508142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","bayesian-statistics","contingency-table","correlation","effectsize","meta-analysis","parametric","robust","robust-statistics","statistical-details","statistical-tests","tidy"],"created_at":"2024-08-05T03:01:41.329Z","updated_at":"2025-07-13T22:30:35.747Z","avatar_url":"https://github.com/IndrajeetPatil.png","language":"R","funding_links":["https://github.com/sponsors/easystats"],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n  \u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r}\n#| echo = FALSE\noptions(pillar.width = Inf, pillar.bold = TRUE, pillar.subtle_num = TRUE)\n\nknitr::opts_chunk$set(\n  collapse  = TRUE,\n  dpi       = 300,\n  out.width = \"100%\",\n  comment   = \"#\u003e\",\n  warning   = FALSE,\n  message   = FALSE,\n  fig.path  = \"man/figures/README-\"\n)\n\nset.seed(123)\nsuppressPackageStartupMessages(library(statsExpressions))\n```\n\n# `{statsExpressions}`: Tidy dataframes and expressions with statistical details\n\nStatus | Usage | Miscellaneous\n----------------- | ----------------- | ----------------- \n[![R build status](https://github.com/IndrajeetPatil/statsExpressions/workflows/R-CMD-check/badge.svg)](https://github.com/IndrajeetPatil/statsExpressions/actions) | [![Total downloads](https://cranlogs.r-pkg.org/badges/grand-total/statsExpressions?color=blue)](https://CRAN.R-project.org/package=statsExpressions) | [![Codecov](https://codecov.io/gh/IndrajeetPatil/statsExpressions/branch/main/graph/badge.svg)](https://app.codecov.io/gh/IndrajeetPatil/statsExpressions?branch=main)\n[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html) | [![Daily downloads](https://cranlogs.r-pkg.org/badges/last-day/statsExpressions?color=blue)](https://CRAN.R-project.org/package=statsExpressions) | [![DOI](https://joss.theoj.org/papers/10.21105/joss.03236/status.svg)](https://doi.org/10.21105/joss.03236) \n \n# Introduction \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"240\" /\u003e\n\n```{r, child = \"man/rmd-fragments/statsExpressions-package.Rmd\"}\n```\n\n# Installation\n\n| Type        | Command                                       |\n| :---------- | :-------------------------------------------- |\n| Release     | `install.packages(\"statsExpressions\")`        |\n| Development | `pak::pak(\"IndrajeetPatil/statsExpressions\")` |\n\nOn Linux, `{statsExpressions}` installation may require additional system 
dependencies, which can be checked using:\n\n```{r, eval=FALSE}\npak::pkg_sysreqs(\"statsExpressions\")\n```\n\n# Citation\n\nThe package can be cited as:\n\n```{r}\n#| label = \"citation\",\n#| comment = \"\"\ncitation(\"statsExpressions\")\n```\n\n# General Workflow\n\n```{r}\n#| echo = FALSE,\n#| out.width = \"80%\"\nknitr::include_graphics(\"man/figures/card.png\")\n```\n\n# Summary of functionality\n\n```{r, child = \"man/rmd-fragments/functionality.Rmd\"}\n```\n\n# Tidy dataframes from statistical analysis\n\nTo illustrate the simplicity of this syntax, let's say we want to run a one-way\nANOVA. If we first run a non-parametric ANOVA and then decide to run a robust\nANOVA instead, the syntax remains the same and the statistical approach can be\nmodified by changing a single argument:\n\n```{r}\n#| label = \"df\"\n\nmtcars %\u003e% oneway_anova(cyl, wt, type = \"nonparametric\")\n\nmtcars %\u003e% oneway_anova(cyl, wt, type = \"robust\")\n```\n\nAll possible output dataframes from functions are tabulated here:\n\u003chttps://indrajeetpatil.github.io/statsExpressions/articles/web_only/dataframe_outputs.html\u003e\n\nNeedless to say, this will also work with the `kable` function to generate a\ntable:\n\n```{r}\n#| label = \"kable\"\n\nset.seed(123)\n\n# one-sample robust t-test\n# we will leave the `expression` column out; it's not needed when using only the dataframe\nmtcars %\u003e%\n  one_sample_test(wt, test.value = 3, type = \"robust\") %\u003e%\n  dplyr::select(-expression) %\u003e%\n  knitr::kable()\n```\n\nThese functions are also compatible with other popular data manipulation\npackages. \n\nFor example, let's say we want to run a one-sample *t*-test for all levels of a\ncertain grouping variable. 
We can use `dplyr` to do so:\n\n```{r}\n#| label = \"grouped_df\"\n# for reproducibility\nset.seed(123)\nlibrary(dplyr)\n\n# grouped operation\n# running one-sample test for all levels of grouping variable `cyl`\nmtcars %\u003e%\n  group_by(cyl) %\u003e%\n  group_modify(~ one_sample_test(.x, wt, test.value = 3), .keep = TRUE) %\u003e%\n  ungroup()\n```\n\n# Using expressions in custom plots\n\nNote that *expression* here means **a pre-formatted in-text statistical result**.\nIn addition to other details contained in the dataframe, there is also a column\ntitled `expression`, which contains an expression with statistical details and can\nbe displayed in a plot.\n\nFor **all** statistical test expressions, the default template attempts to follow\nthe gold standard for statistical reporting.\n\nFor example, here are results from Welch's *t*-test:\n\n\u003cimg src=\"man/figures/stats_reporting_format.png\" align=\"center\" /\u003e\n\nLet's load the needed library for visualization:\n\n```{r}\nlibrary(ggplot2)\n```\n\n## Expressions for centrality measure\n\n**Note that when used in a geometric layer, the expression needs to be parsed.**\n\n```{r}\n#| label = \"centrality\"\n\n# displaying mean for each level of `cyl`\ncentrality_description(mtcars, cyl, wt) |\u003e\n  ggplot(aes(cyl, wt)) +\n  geom_point() +\n  geom_label(aes(label = expression), parse = TRUE)\n```\n\nHere are a few examples for supported analyses.\n\n## Expressions for one-way ANOVAs\n\nThe returned data frame will always have a column called `expression`. 
\n\nIf you need to display only a single result in a plot, you have two options:\n\n- extract the expression from the list column (`results_data$expression[[1]]`) without parsing\n- use the list column as is, in which case you will need to parse it (`parse(text = results_data$expression)`)\n\nIf you want to display more than one expression in a plot, you will *have to* parse them.\n\n### Between-subjects design\n\n```{r}\n#| label = \"anova_rob1\"\n\nset.seed(123)\nlibrary(ggridges)\n\nresults_data \u003c- oneway_anova(iris, Species, Sepal.Length, type = \"robust\")\n\n# create a ridgeplot\nggplot(iris, aes(x = Sepal.Length, y = Species)) +\n  geom_density_ridges() +\n  labs(\n    title = \"A heteroscedastic one-way ANOVA for trimmed means\",\n    subtitle = results_data$expression[[1]]\n  )\n```\n\n### Within-subjects design\n\n```{r}\n#| label = \"anova_parametric2\"\n\nset.seed(123)\nlibrary(WRS2)\nlibrary(ggbeeswarm)\n\nresults_data \u003c- oneway_anova(\n  WineTasting,\n  Wine,\n  Taste,\n  paired = TRUE,\n  subject.id = Taster,\n  type = \"np\"\n)\n\nggplot2::ggplot(WineTasting, aes(Wine, Taste, color = Wine)) +\n  geom_quasirandom() +\n  labs(\n    title = \"Friedman's rank sum test\",\n    subtitle = parse(text = results_data$expression)\n  )\n```\n\n## Expressions for two-sample tests\n\n### Between-subjects design\n\n```{r}\n#| label = \"t_two\"\n\nset.seed(123)\nlibrary(gghalves)\n\nresults_data \u003c- two_sample_test(ToothGrowth, supp, len)\n\nggplot(ToothGrowth, aes(supp, len)) +\n  geom_half_dotplot() +\n  labs(\n    title = \"Two-Sample Welch's t-test\",\n    subtitle = parse(text = results_data$expression)\n  )\n```\n\n### Within-subjects design\n\n```{r}\n#| label = \"t_two_paired1\"\n\nset.seed(123)\nlibrary(tidyr)\nlibrary(PairedData)\ndata(PrisonStress)\n\n# get data in tidy format\ndf \u003c- pivot_longer(PrisonStress, starts_with(\"PSS\"), names_to = \"PSS\", values_to = \"stress\")\n\nresults_data \u003c- 
two_sample_test(\n  data = df,\n  x = PSS,\n  y = stress,\n  paired = TRUE,\n  subject.id = Subject,\n  type = \"np\"\n)\n\n# plot\npaired.plotProfiles(PrisonStress, \"PSSbefore\", \"PSSafter\", subjects = \"Subject\") +\n  labs(\n    title = \"Two-sample Wilcoxon paired test\",\n    subtitle = parse(text = results_data$expression)\n  )\n```\n\n## Expressions for one-sample tests\n\n```{r}\n#| label = \"t_one\"\n\nset.seed(123)\n\n# dataframe with results\nresults_data \u003c- one_sample_test(mtcars, wt, test.value = 3, type = \"bayes\")\n\n# creating a histogram plot\nggplot(mtcars, aes(wt)) +\n  geom_histogram(alpha = 0.5) +\n  geom_vline(xintercept = mean(mtcars$wt), color = \"red\") +\n  labs(subtitle = parse(text = results_data$expression))\n```\n\n## Expressions for correlation analysis\n\nLet's look at another example where we want to run correlation analysis:\n\n```{r}\n#| label = \"corr\"\n\nset.seed(123)\n\n# dataframe with results\nresults_data \u003c- corr_test(mtcars, mpg, wt, type = \"nonparametric\")\n\n# create a scatter plot\nggplot(mtcars, aes(mpg, wt)) +\n  geom_point() +\n  geom_smooth(method = \"lm\", formula = y ~ x) +\n  labs(\n    title = \"Spearman's rank correlation coefficient\",\n    subtitle = parse(text = results_data$expression)\n  )\n```\n\n## Expressions for contingency table analysis\n\nFor categorical/nominal data - one-sample:\n\n```{r}\n#| label = \"gof\"\n\nset.seed(123)\n\n# dataframe with results\nresults_data \u003c- contingency_table(\n  as.data.frame(table(mpg$class)),\n  Var1,\n  counts = Freq,\n  type = \"bayes\"\n)\n\n# create a pie chart\nggplot(as.data.frame(table(mpg$class)), aes(x = \"\", y = Freq, fill = factor(Var1))) +\n  geom_bar(width = 1, stat = \"identity\") +\n  theme(axis.line = element_blank()) +\n  # cleaning up the chart and adding results from one-sample proportion test\n  coord_polar(theta = \"y\", start = 0) +\n  labs(\n    fill = \"Class\",\n    x = NULL,\n    y = NULL,\n    title = \"Pie Chart of 
class (type of car)\",\n    caption = parse(text = results_data$expression)\n  )\n```\n\nYou can also use these functions to get the expression in return without having\nto display it in a plot:\n\n```{r}\n#| label = \"expr_output\"\n\nset.seed(123)\n\n# Pearson's chi-squared test of independence\ncontingency_table(mtcars, am, vs)$expression[[1]]\n```\n\n## Expressions for meta-analysis\n\n```{r}\n#| label = \"metaanalysis\",\n#| fig.height = 14,\n#| fig.width = 12\n\nset.seed(123)\nlibrary(metaviz)\n\n# dataframe with results\nresults_data \u003c- meta_analysis(dplyr::rename(mozart, estimate = d, std.error = se))\n\n# meta-analysis forest plot with results from random-effects meta-analysis\nviz_forest(\n  x = mozart[, c(\"d\", \"se\")],\n  study_labels = mozart[, \"study_name\"],\n  xlab = \"Cohen's d\",\n  variant = \"thick\",\n  type = \"cumulative\"\n) +\n  labs(\n    title = \"Meta-analysis of Pietschnig, Voracek, and Formann (2010) on the Mozart effect\",\n    subtitle = parse(text = results_data$expression)\n  ) +\n  theme(text = element_text(size = 12))\n```\n\n# Customizing details to your liking\n\nSometimes you may not wish to include so many details in the subtitle. In that\ncase, you can extract the expression and copy-paste only the part you wish to\ninclude. For example, here only the test statistic and the *p*-value are included:\n\n```{r}\n#| label = \"custom_expr\"\n\nset.seed(123)\n\n# extracting detailed expression\n(res_expr \u003c- oneway_anova(iris, Species, Sepal.Length, var.equal = TRUE)$expression[[1]])\n\n# adapting the details to your liking\nggplot(iris, aes(x = Species, y = Sepal.Length)) +\n  geom_boxplot() +\n  labs(subtitle = ggplot2::expr(paste(\n    NULL, italic(\"F\"), \"(\", \"2\", \",\", \"147\", \") = \", \"119.26\", \", \",\n    italic(\"p\"), \" = \", \"1.67e-31\"\n  )))\n```\n\n# Summary of tests and effect sizes\n\nHere is a go-to summary of the statistical test carried out and the effect\nsize returned by each function. 
This should be useful if one needs to find\nout more information about how an argument is resolved in the underlying package\nor if one wishes to browse the source code. So, for example, if you want to know\nmore about how one-way (between-subjects) ANOVA is carried out, you can run\n`?stats::oneway.test` in your R console.\n\n## `centrality_description`\n\n```{r, child = \"man/rmd-fragments/centrality_description.Rmd\"}\n```\n\n## `oneway_anova`\n\n```{r, child = \"man/rmd-fragments/oneway_anova.Rmd\"}\n```\n\n## `two_sample_test` \n\n```{r, child = \"man/rmd-fragments/two_sample_test.Rmd\"}\n```\n\n## `one_sample_test`\n\n```{r, child = \"man/rmd-fragments/one_sample_test.Rmd\"}\n```\n\n## `corr_test`\n\n```{r, child = \"man/rmd-fragments/corr_test.Rmd\"}\n```\n\n## `contingency_table`\n\n```{r, child = \"man/rmd-fragments/contingency_table.Rmd\"}\n```\n\n## `meta_analysis`\n\n```{r, child = \"man/rmd-fragments/meta_analysis.Rmd\"}\n```\n\n# Usage in `{ggstatsplot}`\n\nNote that these functions were initially written to display results from\nstatistical tests on ready-made `{ggplot2}` plots implemented in `{ggstatsplot}`.\n\nFor detailed documentation, see the package website:\n\u003chttps://indrajeetpatil.github.io/ggstatsplot/\u003e\n\nHere is an example from `{ggstatsplot}` of what the plots look like when the\nexpressions are displayed in the subtitle:\n\n\u003cimg src=\"man/figures/ggstatsplot.png\" align=\"center\" /\u003e\n\n# Acknowledgments\n\nThe hexsticker and the schematic illustration of general workflow were\ngenerously designed by Sarah Otterstetter (Max Planck Institute for Human\nDevelopment, Berlin).\n\n# Contributing\n\nBug reports, suggestions, questions, and (most of all)\ncontributions are welcome.\n\nPlease note that this project is released with a \n[Contributor Code of Conduct](https://github.com/IndrajeetPatil/statsExpressions/blob/main/.github/CODE_OF_CONDUCT.md). 
By participating in this project you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIndrajeetPatil%2FstatsExpressions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIndrajeetPatil%2FstatsExpressions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIndrajeetPatil%2FstatsExpressions/lists"}