{"id":13857326,"url":"https://github.com/easystats/datawizard","last_synced_at":"2025-04-04T09:09:49.411Z","repository":{"id":37896172,"uuid":"371028278","full_name":"easystats/datawizard","owner":"easystats","description":"Magic potions to clean and transform your data 🧙 ","archived":false,"fork":false,"pushed_at":"2024-04-12T09:08:53.000Z","size":90127,"stargazers_count":184,"open_issues_count":35,"forks_count":12,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-04-13T21:36:57.222Z","etag":null,"topics":["data","dplyr","hacktoberfest","janitor","manipulation","r-package","reshape","rstats","tidyr","wrangling"],"latest_commit_sha":null,"homepage":"https://easystats.github.io/datawizard/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/easystats.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null},"funding":{"github":"easystats"}},"created_at":"2021-05-26T12:36:27.000Z","updated_at":"2024-07-21T08:20:13.953Z","dependencies_parsed_at":"2023-12-30T20:31:35.096Z","dependency_job_id":"30c52380-9213-402d-90a1-49f989571152","html_url":"https://github.com/easystats/datawizard","commit_stats":{"total_commits":1114,"total_committers":13,"mean_commits":85.6923076923077,"dds":0.5107719928186715,"last_synced_commit":"f044e189a974fe87cd5185ace8ba255339258d1e"},"previous_names":[],"tags_count":27,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easystats%2Fdatawizard","tags_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/easystats%2Fdatawizard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easystats%2Fdatawizard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/easystats%2Fdatawizard/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/easystats","download_url":"https://codeload.github.com/easystats/datawizard/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247149502,"owners_count":20891954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","dplyr","hacktoberfest","janitor","manipulation","r-package","reshape","rstats","tidyr","wrangling"],"created_at":"2024-08-05T03:01:33.488Z","updated_at":"2025-04-04T09:09:49.393Z","avatar_url":"https://github.com/easystats.png","language":"R","funding_links":["https://github.com/sponsors/easystats"],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n# `datawizard`: Easy Data Wrangling and Statistical Transformations \u003cimg src='man/figures/logo.png' align=\"right\" height=\"139\" /\u003e\n\n```{r, echo=FALSE, warning=FALSE, message=FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  dpi = 300,\n  out.width = \"100%\",\n  fig.path = \"man/figures/\",\n  comment = 
\"#\u003e\"\n)\n\nset.seed(333)\nlibrary(datawizard)\n```\n\n[![DOI](https://joss.theoj.org/papers/10.21105/joss.04684/status.svg)](https://doi.org/10.21105/joss.04684)\n[![downloads](https://cranlogs.r-pkg.org/badges/datawizard)](https://cran.r-project.org/package=datawizard)\n[![total](https://cranlogs.r-pkg.org/badges/grand-total/datawizard)](https://cranlogs.r-pkg.org/)\n\n\u003c!-- ***:sparkles: Hockety pockety wockety wack, prepare this data forth and back*** --\u003e\n\n\u003c!-- ***Hockety pockety wockety wock, messy data is in shock*** --\u003e\n\n\u003c!-- ***Hockety pockety wockety woss, you can cite i-it from JOSS*** \u003csup\u003e(soon)\u003c/sup\u003e --\u003e\n\n\u003c!-- ***Hockety pockety wockety wass, datawizard saves your ass! :sparkles:*** --\u003e\n\n`{datawizard}` is a lightweight package to easily manipulate, clean, transform, and prepare your data for analysis. It is part of the [easystats ecosystem](https://easystats.github.io/easystats/), a suite of R packages to deal with your entire statistical analysis, from cleaning the data to reporting the results.\n\nIt covers two aspects of data preparation:\n\n- **Data manipulation**: `{datawizard}` offers a very similar set of functions to that of the *tidyverse* packages, such as a `{dplyr}` and `{tidyr}`, to select, filter and reshape data, with a few key differences. 1) All data manipulation functions start with the prefix `data_*` (which makes them easy to identify). 2) Although most functions can be used exactly as their *tidyverse* equivalents, they are also string-friendly (which makes them easy to program with and use inside functions). 
Finally, `{datawizard}` is super lightweight (no dependencies, similar to [poorman](https://github.com/nathaneastwood/poorman)), which makes it awesome for developers to use in their packages.\n\n- **Statistical transformations**: `{datawizard}` also has powerful functions to easily apply common data [transformations](https://easystats.github.io/datawizard/reference/index.html#statistical-transformations), including standardization, normalization, rescaling, rank-transformation, scale reversing, recoding, binning, etc.\n\n\n\n\u003c/br\u003e\n\n\u003cimg src='https://media.giphy.com/media/VcizxCUIgaKpa/giphy.gif' width=\"300\"/\u003e\n\n\u003c/br\u003e\n\n# Installation\n\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/datawizard)](https://cran.r-project.org/package=datawizard) [![datawizard status badge](https://easystats.r-universe.dev/badges/datawizard)](https://easystats.r-universe.dev) [![codecov](https://codecov.io/gh/easystats/datawizard/branch/main/graph/badge.svg)](https://app.codecov.io/gh/easystats/datawizard) [![R-CMD-check](https://github.com/easystats/datawizard/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/easystats/datawizard/actions)\n\nType | Source | Command\n---|---|---\nRelease | CRAN | `install.packages(\"datawizard\")`\nDevelopment | r-universe | `install.packages(\"datawizard\", repos = \"https://easystats.r-universe.dev\")`\nDevelopment | GitHub | `remotes::install_github(\"easystats/datawizard\")`\n\n\u003e **Tip**\n\u003e\n\u003e **Instead of `library(datawizard)`, use `library(easystats)`.**\n\u003e **This will make all features of the  easystats-ecosystem available.**\n\u003e\n\u003e **To stay updated, use `easystats::install_latest()`.**\n\n# Citation\n\nTo cite the package, run the following command:\n\n```{r, comment=\"\"}\ncitation(\"datawizard\")\n```\n\n# 
Features\n\n[![Documentation](https://img.shields.io/badge/documentation-datawizard-orange.svg?colorB=E91E63)](https://easystats.github.io/datawizard/)\n[![Blog](https://img.shields.io/badge/blog-easystats-orange.svg?colorB=FF9800)](https://easystats.github.io/blog/posts/)\n[![Features](https://img.shields.io/badge/features-datawizard-orange.svg?colorB=2196F3)](https://easystats.github.io/datawizard/reference/index.html)\n\nMost courses and tutorials about statistical modeling assume that you are working with a clean and tidy dataset. In practice, however, a major part of doing statistical modeling is preparing your data--cleaning up values, creating new columns, reshaping the dataset, or transforming some variables. `{datawizard}` provides easy to use tools to perform these common, critical, and sometimes tedious data preparation tasks.\n\n## Data wrangling\n\n### Select, filter and remove variables\n\nThe package provides helpers to filter rows meeting certain conditions...\n\n```{r}\ndata_match(mtcars, data.frame(vs = 0, am = 1))\n```\n\n... 
or logical expressions:\n\n```{r}\ndata_filter(mtcars, vs == 0 \u0026 am == 1)\n```\n\nFinding columns in a data frame, or retrieving the data of selected columns, can be achieved using `extract_column_names()` or `data_select()`:\n\n```{r}\n# find column names matching a pattern\nextract_column_names(iris, starts_with(\"Sepal\"))\n\n# return data columns matching a pattern\ndata_select(iris, starts_with(\"Sepal\")) |\u003e head()\n```\n\nIt is also possible to extract one or more variables:\n\n```{r}\n# single variable\ndata_extract(mtcars, \"gear\")\n\n# more variables\nhead(data_extract(iris, ends_with(\"Width\")))\n```\n\nDue to the consistent API, removing variables is just as simple:\n\n```{r}\nhead(data_remove(iris, starts_with(\"Sepal\")))\n```\n\n### Reorder or rename\n\n```{r}\nhead(data_relocate(iris, select = \"Species\", before = \"Sepal.Length\"))\n```\n\n```{r}\nhead(data_rename(iris, c(\"Sepal.Length\", \"Sepal.Width\"), c(\"length\", \"width\")))\n```\n\n### Merge\n\n```{r}\nx \u003c- data.frame(a = 1:3, b = c(\"a\", \"b\", \"c\"), c = 5:7, id = 1:3)\ny \u003c- data.frame(c = 6:8, d = c(\"f\", \"g\", \"h\"), e = 100:102, id = 2:4)\n\nx\ny\n\ndata_merge(x, y, join = \"full\")\n\ndata_merge(x, y, join = \"left\")\n\ndata_merge(x, y, join = \"right\")\n\ndata_merge(x, y, join = \"semi\", by = \"c\")\n\ndata_merge(x, y, join = \"anti\", by = \"c\")\n\ndata_merge(x, y, join = \"inner\")\n\ndata_merge(x, y, join = \"bind\")\n```\n\n### Reshape\n\nA common data wrangling task is to reshape data.\n\nYou can go either from wide/Cartesian to long/tidy format\n\n```{r}\nwide_data \u003c- data.frame(replicate(5, rnorm(10)))\n\nhead(data_to_long(wide_data))\n```\n\nor the other way around:\n\n```{r}\nlong_data \u003c- data_to_long(wide_data, rows_to = \"Row_ID\") # Save row number\n\ndata_to_wide(long_data,\n  names_from = \"name\",\n  values_from = \"value\",\n  id_cols = \"Row_ID\"\n)\n```\n\n### Empty rows and columns\n\n```{r}\ntmp \u003c- data.frame(\n  a = c(1, 2, 3, 
NA, 5),\n  b = c(1, NA, 3, NA, 5),\n  c = c(NA, NA, NA, NA, NA),\n  d = c(1, NA, 3, NA, 5)\n)\n\ntmp\n\n# indices of empty columns or rows\nempty_columns(tmp)\nempty_rows(tmp)\n\n# remove empty columns or rows\nremove_empty_columns(tmp)\nremove_empty_rows(tmp)\n\n# remove empty columns and rows\nremove_empty(tmp)\n```\n\n### Recode or cut dataframe\n\n```{r}\nset.seed(123)\nx \u003c- sample(1:10, size = 50, replace = TRUE)\n\ntable(x)\n\n# cut into 3 groups, based on distribution (quantiles)\ntable(categorize(x, split = \"quantile\", n_groups = 3))\n```\n\n## Data Transformations\n\nThe package also contains multiple functions to help transform data.\n\n### Standardize\n\nFor example, to standardize (*z*-score) data:\n\n```{r}\n# before\nsummary(swiss)\n\n# after\nsummary(standardize(swiss))\n```\n\n### Winsorize\n\nTo winsorize data:\n\n```{r}\n# before\nanscombe\n\n# after\nwinsorize(anscombe)\n```\n\n### Center\n\nTo grand-mean center data:\n\n```{r}\ncenter(anscombe)\n```\n\n### Ranktransform\n\nTo rank-transform data:\n\n```{r}\n# before\nhead(trees)\n\n# after\nhead(ranktransform(trees))\n```\n\n### Rescale\n\nTo rescale a numeric variable to a new range:\n\n```{r}\nchange_scale(c(0, 1, 5, -5, -2))\n```\n\n### Rotate or transpose\n\n```{r}\nx \u003c- mtcars[1:3, 1:4]\n\nx\n\ndata_rotate(x)\n```\n\n\n## Data properties\n\n`datawizard` can produce a comprehensive descriptive summary for all variables in a data frame:\n\n```{r}\ndata(iris)\ndescribe_distribution(iris)\n```\n\nOr for just a single variable:\n\n```{r}\ndescribe_distribution(mtcars$wt)\n```\n\nThere are also some additional data properties that can be computed using this package.\n\n```{r}\nx \u003c- (-10:10)^3 + rnorm(21, 0, 100)\nsmoothness(x, method = \"diff\")\n```\n\n## Function design and pipe-workflow\n\nThe `{datawizard}` functions follow a design principle that makes it easy for users to understand and remember how functions work:\n\n1. the first argument is the data\n2. 
for methods that work on data frames, the next two arguments, `select` and `exclude`, choose which variables to include or leave out\n3. the remaining arguments relate to the specific task of each function\n\nMost importantly, functions that accept data frames usually take the data frame as their first argument and also return a (modified) data frame. Thus, `{datawizard}` integrates smoothly into a \"pipe-workflow\".\n\n```{r}\niris |\u003e\n  # all rows where Species is \"versicolor\" or \"virginica\"\n  data_filter(Species %in% c(\"versicolor\", \"virginica\")) |\u003e\n  # select only columns with \".\" in names (i.e. drop Species)\n  data_select(contains(\"\\\\.\")) |\u003e\n  # move columns that end with \"Length\" to start of data frame\n  data_relocate(ends_with(\"Length\")) |\u003e\n  # remove fourth column\n  data_remove(4) |\u003e\n  head()\n```\n\n# Contributing and Support\n\nIf you want to file an issue or contribute to the package in another way, please follow [this guide](https://easystats.github.io/datawizard/CONTRIBUTING.html). For questions about the functionality, you may either contact us via email or file an issue.\n\n# Code of Conduct\n\nPlease note that this project is released with a\n[Contributor Code of Conduct](https://easystats.github.io/datawizard/CODE_OF_CONDUCT.html). By participating in this project you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasystats%2Fdatawizard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feasystats%2Fdatawizard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feasystats%2Fdatawizard/lists"}