Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/data-cleaning/dcmodifydb
Deterministic, documented correction rules on a database
correction data-cleaning database r
- Host: GitHub
- URL: https://github.com/data-cleaning/dcmodifydb
- Owner: data-cleaning
- License: other
- Created: 2021-05-02T15:22:27.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-07T16:37:20.000Z (almost 2 years ago)
- Last Synced: 2024-06-05T01:46:50.913Z (5 months ago)
- Topics: correction, data-cleaning, database, r
- Language: R
- Homepage: https://data-cleaning.github.io/dcmodifydb
- Size: 585 KB
- Stars: 5
- Watchers: 3
- Forks: 4
- Open Issues: 7
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - data-cleaning/dcmodifydb - Deterministic, documented correction rules on a database (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dcmodifydb

[![CRAN status](https://www.r-pkg.org/badges/version/dcmodifydb)](https://CRAN.R-project.org/package=dcmodifydb)
[![R-CMD-check](https://github.com/data-cleaning/dcmodifydb/workflows/R-CMD-check/badge.svg)](https://github.com/data-cleaning/dcmodifydb/actions)
[![Downloads](https://cranlogs.r-pkg.org/badges/dcmodifydb)](https://cran.r-project.org/package=dcmodifydb)
[![Codecov test coverage](https://codecov.io/gh/data-cleaning/dcmodifydb/branch/main/graph/badge.svg)](https://codecov.io/gh/data-cleaning/dcmodifydb?branch=main)

The goal of dcmodifydb is to apply modification rules specified with `dcmodify`
to a database table, allowing for documented, reproducible data cleaning adjustments
in a database.

`dcmodify` separates **intent** from **execution**: a user specifies the _what_, _why_ and _how_ of an automatic data change and uses `dcmodifydb` to execute it on a `tbl` database table.

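To make the split between intent and execution concrete, here is a minimal sketch that states two rules with `dcmodify::modifier()` and executes them on an in-memory data frame with `dcmodify::modify()`. The rules and the `people` data frame are made up for illustration; the Example section below hands a rule set of the same kind to `dcmodifydb` with a `tbl` instead.

``` r
library(dcmodify)

# Intent: what should change, and under which condition.
# These two rules are hypothetical and only serve as an illustration.
rules <- modifier(
  if (age > 130) age <- 130L,  # cap implausibly high ages
  if (age < 0) age <- NA       # negative ages are treated as unknown
)

# Execution: here on a plain data.frame (assumed toy data).
people <- data.frame(age = c(11L, 150L, 25L, -10L))
cleaned <- modify(people, rules)

# The second and fourth rows are the ones the rules target.
cleaned
```

Because the rules live in their own object, they can be inspected, documented and reused without touching the code that executes them.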
## Installation
The development version from [GitHub](https://github.com/) can be installed with:
``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/dcmodifydb")
```
## Example

```{r, code = readLines("./example/modify.R")}
```

### Documented rules

```{r}
library(DBI)
library(dcmodify)
library(dcmodifydb)
con <- dbConnect(RSQLite::SQLite())
```

You can use YAML to store the modification rules: "example.yml"

```yaml
```{r, result="asis", child="example/example.yml"}
```
```

Let's load the rules and apply them to a data set:

```{r, eval = FALSE}
m <- modifier(.file = "example.yml")
```

```{r, echo = FALSE, eval = TRUE}
m <- modifier(.file = "example/example.yml")
```

```{r}
print(m)
```

```{r}
# setup the data
"age, income
11, 2000
150, 300
25, 2000
-10, 2000
" -> csv
income <- read.csv(text = csv, strip.white = TRUE)
dbWriteTable(con, "income", income)
tbl_income <- dplyr::tbl(con, "income")

# this is the table in the database
tbl_income

# and now after modification
modify(tbl_income, m, copy = FALSE)
```

Generated SQL can be written to a file with `dump_sql`:

```{r, eval=FALSE}
dump_sql(m, tbl_income, file = "modify.sql")
```

modify.sql:

```sql
```{r, echo=FALSE, results='asis'}
dump_sql(m, tbl_income)
```
```

```{r}
dbDisconnect(con)
```

Note: Modification rules can be written to YAML with `as_yaml` and `export_yaml`.

```{r, eval = FALSE}
dcmodify::export_yaml(m, "cleaning_steps.yml")
```
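
As a rough sketch of the round trip, the snippet below writes a rule set to a YAML file with `export_yaml()` and loads it again with `modifier(.file = ...)`. The single rule and the temporary file path are assumptions chosen for illustration, not part of the package's own example.

``` r
library(dcmodify)

# A hypothetical rule set with a single rule.
m <- modifier(
  if (age > 130) age <- 130L
)

# Write the rules to a YAML file in the session's temporary directory.
path <- file.path(tempdir(), "cleaning_steps.yml")
export_yaml(m, path)

# Inspect the generated YAML text.
yaml_lines <- readLines(path)
cat(yaml_lines, sep = "\n")

# Load the rules again from the file, as done with "example.yml" above.
m2 <- modifier(.file = path)
print(m2)
```

Keeping the rules in a YAML file lets them be versioned and reviewed separately from the R code that applies them.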