https://github.com/data-cleaning/validatedb
Validate on a table in a DB, using dbplyr
database datacleaning validation
- Host: GitHub
- URL: https://github.com/data-cleaning/validatedb
- Owner: data-cleaning
- License: gpl-3.0
- Created: 2020-11-11T23:18:56.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-06-03T14:11:33.000Z (over 2 years ago)
- Last Synced: 2024-06-05T02:32:08.925Z (9 months ago)
- Topics: database, datacleaning, validation
- Language: R
- Homepage: https://data-cleaning.github.io/validatedb
- Size: 623 KB
- Stars: 32
- Watchers: 2
- Forks: 4
- Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
- jimsghstars - data-cleaning/validatedb - Validate on a table in a DB, using dbplyr (R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# validatedb

`validatedb` executes validation checks written with the R package `validate` on a
database, which makes it possible to check the validity of records directly in the database.

## Installation
You can install a development version with
``` r
remotes::install_github("data-cleaning/validatedb")
```

## Example
```{r example}
library(validatedb)
```

First we set up a table in a database (for demo purposes):
```{r}
# create a table in a database
income <- data.frame(id=1:2, age=c(12,35), salary = c(1000,NA))
con <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbWriteTable(con, "income", income)
```

We retrieve a reference/handle to the table in the DB with `dplyr`:
```{r}
tbl_income <- tbl(con, "income")
print(tbl_income)
```

Let's define a rule set and confront the table with it:
```{r}
rules <- validator( is_adult   = age >= 18
                  , has_income = salary > 0
                  , mean_age   = mean(age,na.rm=TRUE) > 24
                  , has_values = is_complete(age, salary)
                  )

# and confront!
cf <- confront(tbl_income, rules, key = "id")

print(cf)
summary(cf)
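
# Cross-check sketch in base R (an illustration, not part of validatedb):
# evaluating the same four rules on an in-memory copy of the demo data shows
# what each rule is expected to yield per record. `complete.cases` is used
# here as a base R stand-in for `is_complete`.
# `income_local` and `expected` are hypothetical names used only here.
income_local <- data.frame(id = 1:2, age = c(12, 35), salary = c(1000, NA))
expected <- with(income_local, list(
  is_adult   = age >= 18,                      # FALSE, TRUE
  has_income = salary > 0,                     # TRUE, NA (salary is missing)
  mean_age   = mean(age, na.rm = TRUE) > 24,   # FALSE, since the mean is 23.5
  has_values = complete.cases(age, salary)     # TRUE, FALSE
))
str(expected)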
```

Values (i.e. validations on the table) can be retrieved like in `validate` with
`type="matrix"` or `type="list"`:

```{r}
values(cf, type = "matrix")
```

But often this is handier:
```{r}
values(cf, type = "tbl")
```

or
```{r}
values(cf, type = "tbl", sparse=TRUE)
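
# Sketch in base R (an illustration, not validatedb's implementation):
# assuming a "sparse" result keeps only the records that fail a rule or
# cannot be evaluated (NA), the idea can be shown on a small in-memory
# logical matrix. `checks`, `long` and `sparse_local` are hypothetical
# names, and the column names below are not taken from validatedb.
checks <- cbind(
  is_adult   = c(FALSE, TRUE),
  has_income = c(TRUE, NA)
)
long <- data.frame(
  id   = rep(1:2, times = ncol(checks)),
  rule = rep(colnames(checks), each = nrow(checks)),
  pass = as.vector(checks)
)
# keep only rows where the check is not a clear pass (FALSE or NA)
sparse_local <- long[!(long$pass %in% TRUE), ]
print(sparse_local)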
```

We can see the SQL code by using `show_query`:
```{r}
show_query(cf)
```

Or write the SQL to a file for documentation (and inspiration):
```{r, eval = FALSE}
dump_sql(cf, "validation.sql")
```

```sql
```{r, echo = FALSE, results="asis"}
dump_sql(cf)
```
```

### Aggregate example
```{r aggregate, code = readLines("./example/aggregate.R")}
```
## `validate`-specific functions
### Added:
- [x] `is_complete`, `all_complete`
- [x] `is_unique`, `all_unique`
- [x] `exists_any`, `exists_one`
- [x] `do_by`, `sum_by`, `mean_by`, `min_by`, `max_by`

### Todo
Some newly added `validate` utility functions are (still) missing from `validatedb`.

- [ ] `contains_exactly`
- [ ] `is_linear_sequence`
- [ ] `hierarchy`