Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/data-cleaning/dcmodifydb

Deterministic, documented correction rules on a database
https://github.com/data-cleaning/dcmodifydb

correction data-cleaning database r

Last synced: 3 months ago
JSON representation

Deterministic, documented correction rules on a database

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# dcmodifydb

[![CRAN status](https://www.r-pkg.org/badges/version/dcmodifydb)](https://CRAN.R-project.org/package=dcmodifydb)
[![R-CMD-check](https://github.com/data-cleaning/dcmodifydb/workflows/R-CMD-check/badge.svg)](https://github.com/data-cleaning/dcmodifydb/actions)
[![Downloads](https://cranlogs.r-pkg.org/badges/dcmodifydb)](https://cran.r-project.org/package=dcmodifydb)
[![Codecov test coverage](https://codecov.io/gh/data-cleaning/dcmodifydb/branch/main/graph/badge.svg)](https://codecov.io/gh/data-cleaning/dcmodifydb?branch=main)

The goal of dcmodifydb is to apply modification rules specified with `dcmodify`
on a database table, allowing for documented, reproducable data cleaning adjustments
in a database.

`dcmodify` separates **intent** from **execution**: a user specifies _what_, _why_ and _how_ of an automatic data change and uses `dcmodifydb` to execute them on a `tbl` database table.

## Installation

The development version from [GitHub](https://github.com/) can be installed with:

``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/dcmodifydb")
```
## Example

```{r, code = readLines("./example/modify.R")}
```

### Documented rules

```{r}
library(DBI)
library(dcmodify)
library(dcmodifydb)
con <- dbConnect(RSQLite::SQLite())
```

You can use YAML to store the modification rules: "example.yml"

```yaml
```{r, result="asis", child="example/example.yml"}
```
```

Let's load the rules and apply them to a data set:

```{r, eval = FALSE}
m <- modifier(.file = "example.yml")
```

```{r, echo = FALSE, eval = TRUE}
m <- modifier(.file = "example/example.yml")
```

```{r}
print(m)
```

```{r}
# setup the data
"age, income
11, 2000
150, 300
25, 2000
-10, 2000
" -> csv
income <- read.csv(text = csv, strip.white = TRUE)
dbWriteTable(con, "income", income)
tbl_income <- dplyr::tbl(con, "income")

# this is the table in the data base
tbl_income

# and now after modification
modify(tbl_income, m, copy = FALSE)
```

Generated sql can be written with `dump_sql`

```{r, eval=FALSE}
dump_sql(m, tbl_income, file = "modify.sql")
```

modify.sql:
```sql
```{r, echo=FALSE, results='asis'}
dump_sql(m, tbl_income)
```
```

```{r}
dbDisconnect(con)
```

Note: Modification rules can be written to yaml with `as_yaml` and `export_yaml`.

```{r, eval = FALSE}
dcmodify::export_yaml(m, "cleaning_steps.yml")
```