{"id":17224222,"url":"https://github.com/ekstroem/datamaid","last_synced_at":"2025-04-09T07:08:17.840Z","repository":{"id":56934556,"uuid":"69242550","full_name":"ekstroem/dataMaid","owner":"ekstroem","description":"An R package for data screening","archived":false,"fork":false,"pushed_at":"2022-01-25T10:21:02.000Z","size":26744,"stargazers_count":143,"open_issues_count":15,"forks_count":26,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-02T04:09:24.478Z","etag":null,"topics":["data-cleaning","data-screening","reproducible-research"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ekstroem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-26T11:15:17.000Z","updated_at":"2024-08-16T20:02:28.000Z","dependencies_parsed_at":"2022-08-21T06:50:47.949Z","dependency_job_id":null,"html_url":"https://github.com/ekstroem/dataMaid","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ekstroem%2FdataMaid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ekstroem%2FdataMaid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ekstroem%2FdataMaid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ekstroem%2FdataMaid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ekstroem","download_url":"https://codeload.github.com/ekstroem/dataMaid/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github"
,"repositories_count":247994121,"owners_count":21030050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-screening","reproducible-research"],"created_at":"2024-10-15T04:10:33.818Z","updated_at":"2025-04-09T07:08:17.819Z","avatar_url":"https://github.com/ekstroem.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dataMaid \u003cimg src=\"man/figures/logo.png\" width=\"121px\" height=\"140px\" align=\"right\" style=\"padding-left:10px;background-color:white;\" /\u003e\n\n\n[![Travis-CI Build\nStatus](https://travis-ci.org/ekstroem/dataMaid.svg?branch=master)](https://travis-ci.org/ekstroem/dataMaid)\n[![CRAN\\_Release\\_Badge](http://www.r-pkg.org/badges/version-ago/dataMaid)](https://CRAN.R-project.org/package=dataMaid)\n![Download counter](http://cranlogs.r-pkg.org/badges/grand-total/dataMaid)\n\ndataMaid is an R package for documenting and creating reports on data cleanliness. \n\n## dataMaid has become dataReporter\n\ndataMaid has been renamed to dataReporter. **dataMaid is no longer maintained. All future updates and development will be made for dataReporter.** Install the new package from CRAN like this:\n ```{r}\n install.packages(\"dataReporter\")\n ``` \n or install the development version from GitHub:\n ```{r}\n devtools::install_github(\"ekstroem/dataReporter\")\n ```\n **Please report bugs at our [new repository](https://github.com/ekstroem/dataReporter).**\n\n\n\n## Installation\n\nThis GitHub page contains the *development version* of dataMaid. 
For the\nlatest stable version, download the package from CRAN directly using\n\n```{r}\ninstall.packages(\"dataMaid\")\n```\n\nTo install the development version of dataMaid, run the following\ncommands from within R (requires that the `devtools` package is already installed)\n\n```{r}\ndevtools::install_github(\"ekstroem/dataMaid\")\n```\n\n## Package overview\n\nA super simple way to get started is to load the package and use the\n`makeDataReport()` function on a data frame (if you try to generate several\nreports for the same data, then it may be necessary to add the `replace=TRUE`\nargument to overwrite the existing report). \n\n```{r}\nlibrary(\"dataMaid\")\ndata(trees)\nmakeDataReport(trees)\n```\n\nThis will create a report with summaries and error checks for each\nvariable in the `trees` data frame. The format of the report depends on your OS and whether \nyou have a [LaTeX](https://www.latex-project.org/) installation on your computer, which\nis needed for creating PDF reports. \n\n\n### Using dataMaid interactively\n\nThe dataMaid package can also be used interactively by running checks\nfor the individual variables or for all variables in the dataset\n\n```{r}\ndata(toyData)\ncheck(toyData$events)  # Individual check of events\ncheck(toyData) # Check all variables at once\n```\n\nBy default the standard battery of tests is run depending on the\nvariable type. If we just want a specific test for, say, a numeric\nvariable then we can specify that. All available checks can be viewed\nby calling `allCheckFunctions()`. See [the\ndocumentation](https://github.com/ekstroem/dataMaid/blob/master/latex/article_vol2.pdf)\nfor an overview of the checks available or how to create and include\nyour own tests.\n\n\n```{r}\ncheck(toyData$events, checks = setChecks(numeric = \"identifyMissing\"))\n```\n\nWe can also access the graphics or summary tables that are produced for a variable by calling the `visualize` or `summarize` functions. 
One can visualize a single variable or a full dataset:\n\n```{r}\n#Visualize a variable\nvisualize(toyData$events)\n\n#Visualize a dataset\nvisualize(toyData)\n```  \n\nThe same is true for summaries. Note also that the choice of checks/visualizations/summaries is customizable:\n\n```{r}\n#Summarize a variable with default settings:\nsummarize(toyData$events) \n\n#Summarize a variable with user-specified settings:\nsummarize(toyData$events, summaries = setSummaries(all =  c(\"centralValue\", \"minMax\")))  \n```\n\n\n## Detailed documentation\n\nYou can read the main paper accompanying the package at the [Journal\nof Statistical\nSoftware](https://www.jstatsoft.org/article/view/v090i06). It provides\na detailed introduction to the dataMaid package.\n\nWe also have two blog posts that provide an introduction to the package. They can be found [here (the primary one)](https://sandsynligvis.dk/2017/08/21/datamaid-your-personal-assistant-for-cleaning-up-the-data-cleaning-process/) and [here](https://sandsynligvis.dk/2018/03/03/generating-codebooks-in-r/).\n\nMoreover, we have\ncreated a vignette that describes how to extend dataMaid to include\nuser-defined data screening checks, summaries and visualizations. This\nvignette is called `extending_dataMaid`:\n\n```{r}\nvignette(\"extending_dataMaid\")\n```\n\n\n\n\n## Online app\n\nWe are currently working on an online version of the tool, where users\ncan upload their data and get a report. 
A prototype\nis already up and running - we just need to configure the R server correctly.\n\nUntil we have set it up online, you can try it out on your own machine:\n```{r}\nlibrary(shiny)\nrunUrl(\"https://github.com/ekstroem/dataMaid/raw/master/app/app.zip\")\n``` \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fekstroem%2Fdatamaid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fekstroem%2Fdatamaid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fekstroem%2Fdatamaid/lists"}