---
output: github_document
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
library(validatetools)
```

[![R-CMD-check](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/validatetools)](https://CRAN.R-project.org/package=validatetools)
[![Mentioned in Awesome Official Statistics](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)
[![Codecov test coverage](https://codecov.io/gh/data-cleaning/validatetools/graph/badge.svg)](https://app.codecov.io/gh/data-cleaning/validatetools)

# validatetools

`validatetools` is a utility package for managing validation rule sets that are defined with `validate`.
In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially)
contradictory rules. `validatetools` helps identify problems in large rule sets and includes simplification
methods for resolving them.

## Installation

`validatetools` is available from CRAN and can be installed with:

```r
install.packages("validatetools")
```

The adventurous can install an (unstable) development version of `validatetools` from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")
```
Alternatively, install it from R-universe:

```r
install.packages('validatetools', repos = c('https://data-cleaning.r-universe.dev', 'https://cloud.r-project.org'))
```

## Example

### Check for feasibility

```{r}
rules <- validator(x > 0)
is_infeasible(rules)

rules <- validator(
  rule1 = x > 0,
  rule2 = x < 0
)
is_infeasible(rules)

detect_infeasible_rules(rules, verbose = TRUE)
# find out which rules conflict with rule1
is_contradicted_by(rules, "rule1", verbose = TRUE)

# we prefer to keep rule1, so we give rule1 infinite (Inf) weight
detect_infeasible_rules(
  rules,
  weight = c(rule1 = Inf),
  verbose = TRUE
)

make_feasible(rules, weight = c(rule1 = Inf), verbose = TRUE)
```
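
A feasibility check can also act as a guard before a rule set is used elsewhere. The following is a minimal sketch, not a prescribed workflow: it repairs the rule set from the example above only when needed, using the same `weight` argument to protect `rule1`. Attaching `validate` explicitly is an assumption made to keep the snippet self-contained.

```r
library(validate)
library(validatetools)

rules <- validator(
  rule1 = x > 0,
  rule2 = x < 0
)

# Repair the rule set only when it cannot be satisfied;
# rule1 gets infinite weight, so it is never the rule that is dropped.
if (is_infeasible(rules)) {
  rules <- make_feasible(rules, weight = c(rule1 = Inf))
}

# The repaired rule set should now be satisfiable.
is_infeasible(rules)
names(rules)
```

Rules without an explicit weight remain candidates for removal, so any rule that must survive should be listed in `weight`.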

### Finding contradicting if rules

```{r}
rules <- validator(
  rule1 = if (income > 0) job == "yes",
  rule2 = if (job == "yes") income == 0
)

is_infeasible(rules, verbose = TRUE)
conflicts <- detect_contradicting_if_rules(rules, verbose = TRUE)
```

```{r}
print(conflicts)
```
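
To see why these two rules clash, it can help to confront a few records with them. The sketch below uses `confront()` and `values()` from `validate`; the data frame `dat` is made up for illustration. Logically, any record with `income > 0` must break one of the two rules, while a record with `income == 0` can satisfy both, so the rule set as a whole can still be satisfied.

```r
library(validate)

rules <- validator(
  rule1 = if (income > 0) job == "yes",
  rule2 = if (job == "yes") income == 0
)

# Hypothetical records, made up for illustration
dat <- data.frame(
  income = c(1000, 1000, 0),
  job    = c("yes", "no", "yes")
)

# One row per record, one column per rule
res <- values(confront(dat, rules))
res

# TRUE only for records that satisfy every rule
passes_all <- apply(res, 1, all)
passes_all
```

In this made-up data only the record with zero income passes both rules, which is the kind of hidden restriction that contradicting if rules introduce.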

## Simplifying

The function `simplify_rules` combines most simplification methods of `validatetools` to simplify a rule set.
For example, it reduces the following rule set to a simpler form:

```{r}
rules <- validator(
  rule1 = if (age < 16) income == 0,
  rule2 = job %in% c("yes", "no"),
  rule3 = if (job == "yes") income > 0
)

simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
```

`simplify_rules` combines the following simplification and substitution methods:

### Value substitution

```{r}
rules <- validator(
  rule1 = height > 4,
  rule2 = height <= max_height,
  rule3 = if (gender == "male") weight > 100,
  rule4 = gender %in% c("male", "female")
)
substitute_values(rules, max_height = 6, gender = "male")
```

### Finding fixed values

```{r}
rules <- validator(
  rule1 = x >= 0,
  rule2 = x <= 0
)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

rules <- validator(
  rule1 = x1 + x2 + x3 == 0,
  rule2 = x1 + x2 >= 0,
  rule3 = x3 >= 0
)
simplify_fixed_variables(rules)
```

### Simplifying conditional statements

```{r}
# superfluous conditions
rules <- validator(
  r1 = if (age > 18) age <= 67,
  r2 = if (income > 0 && income > 1000) job == TRUE
)
# r1 implies that age is always <= 67; in r2 the condition income > 0 is superfluous
simplify_conditional(rules)

# non-relaxing clause
rules <- validator(
  r1 = if (income > 0) age >= 16,
  r2 = age < 12
)
# age >= 16 is always FALSE (r2 requires age < 12), so r1 can be simplified
simplify_conditional(rules)

# non-constraining clause
rules <- validator(
  rule1 = if (age < 16) income == 0,
  rule2 = if (age >= 16) income >= 0
)
simplify_conditional(rules)
```

### Removing redundant rules

```{r}
rules <- validator(
  rule1 = age > 12,
  rule2 = age > 18
)

# rule1 is superfluous
remove_redundancy(rules, verbose = TRUE)

rules <- validator(
  rule1 = age > 12,
  rule2 = age > 12
)

# standoff: rule1 and rule2 are identical, the first rule wins
remove_redundancy(rules, verbose = TRUE)

# Note that detection flags both rules!
detect_redundancy(rules, verbose = TRUE)
```
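
When a rule is reported as redundant, it can be useful to know which rule makes it so. A minimal sketch, assuming `is_implied_by()` takes the rule set and a rule name in the same way as `is_contradicted_by()` and returns the names of the implying rules:

```r
library(validatetools)

rules <- validator(
  rule1 = age > 12,
  rule2 = age > 18
)

# Ask which rule(s) make rule1 superfluous;
# age > 18 already guarantees age > 12.
implied_by <- is_implied_by(rules, "rule1")
implied_by
```

This mirrors the conflict lookup in the feasibility example, but for redundancy instead of contradiction.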