---
output: github_document
---

```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
library(validatetools)
```

[![R-CMD-check](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/validatetools)](https://CRAN.R-project.org/package=validatetools)
[![codecov](https://codecov.io/github/data-cleaning/validatetools/graph/badge.svg?token=3tIe5HAUWm)](https://codecov.io/github/data-cleaning/validatetools)
[![Mentioned in Awesome Official Statistics](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)

# validatetools

`validatetools` is a utility package for managing validation rule sets that are defined with `validate`.
In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially)
contradictory rules. `validatetools` helps to identify problems in large rule sets and includes simplification
methods for resolving them.

## Installation

`validatetools` is available from CRAN and can be installed with

```r
install.packages("validatetools")
```

The latest beta version of `validatetools` can be installed with

``` r
install.packages("validatetools", repos = "https://data-cleaning.github.io/drat")
```

The adventurous can install the (unstable) development version of `validatetools` from GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")
```

## Example

### Check for feasibility

```{r}
rules <- validator( x > 0)
is_infeasible(rules)

rules <- validator( rule1 = x > 0
, rule2 = x < 0
)
is_infeasible(rules)

detect_infeasible_rules(rules)
make_feasible(rules)

# find out which rules conflict with rule1
is_contradicted_by(rules, "rule1")
```
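
A feasibility check is probably most useful as a guard before a rule set is applied to data. The sketch below assumes that the `validate` package is attached and uses a hypothetical data frame `dat`; it shows one possible workflow, not a prescribed one.

```r
library(validate)
library(validatetools)

# Contradictory rule set from the example above
rules <- validator(rule1 = x > 0, rule2 = x < 0)

# Hypothetical data, for illustration only
dat <- data.frame(x = c(-1, 0, 2))

# Repair the rule set first if no record could ever satisfy it
if (is_infeasible(rules)) {
  rules <- make_feasible(rules)
}

# Confront the data with the (now feasible) rule set
out <- confront(dat, rules)
summary(out)
```

Which rule is dropped to restore feasibility is decided by `make_feasible`, so the repaired rule set should be reviewed before it is used in production.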

## Simplifying

The function `simplify_rules` combines most simplification methods of `validatetools` to simplify a rule set.
For example, it reduces the following rule set to a simpler form:

```{r}
rules <- validator( if (age < 16) income == 0
, job %in% c("yes", "no")
, if (job == "yes") income > 0
)
simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
```
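
To see why fixing a value can simplify the remaining rules, the base R sketch below enumerates a few hypothetical records with `age = 13` and keeps those that satisfy all three rules. It only illustrates the reasoning and is not how `simplify_rules` works internally; the `income` values are assumptions chosen for the example.

```r
# Candidate records with age fixed at 13 (income values are illustrative)
records <- expand.grid(
  age = 13,
  job = c("yes", "no"),
  income = c(0, 1000),
  stringsAsFactors = FALSE
)

# 'if (A) B' is logically equivalent to '!A | B'
ok <- with(records,
  (!(age < 16) | income == 0) &
  job %in% c("yes", "no") &
  (!(job == "yes") | income > 0)
)

# Only the record with job == "no" and income == 0 passes all rules
valid <- records[ok, ]
print(valid)
```

With `age = 13` the first rule forces `income == 0`, which rules out `job == "yes"` through the third rule.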

`simplify_rules` combines the following simplification and substitution methods:

### Value substitution

```{r}
rules <- validator( rule1 = height > 5
, rule2 = max_height >= height
, rule3 = if (gender == "male") weight > 100
, rule4 = gender %in% c("male", "female")
)
substitute_values(rules, height = 6, gender = "male")
```
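
The idea behind value substitution can be sketched in base R with `substitute()`, which replaces a symbol in an unevaluated expression by a known value. This is a conceptual illustration only; the variable names `known`, `rule1_sub` and `rule2_sub` are hypothetical, and `substitute_values` may be implemented differently.

```r
# Rules written as unevaluated R expressions
rule1 <- quote(height > 5)
rule2 <- quote(max_height >= height)

# Known value to plug in
known <- list(height = 6)

# Replace the symbol 'height' by its value in each expression
rule1_sub <- do.call(substitute, list(rule1, known))
rule2_sub <- do.call(substitute, list(rule2, known))

# rule1 no longer contains variables and can be evaluated: it always holds
rule1_value <- eval(rule1_sub)

# rule2 still constrains max_height
rule2_text <- deparse(rule2_sub)
print(rule2_text)
```

A rule that evaluates to `TRUE` after substitution carries no further information, while a rule such as `rule2` becomes a simpler constraint on the remaining variable.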

### Finding fixed values

```{r}
rules <- validator( x >= 0, x <= 0)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

rules <- validator( rule1 = x1 + x2 + x3 == 0
, rule2 = x1 + x2 >= 0
, rule3 = x3 >= 0
)
simplify_fixed_variables(rules)
```
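
In the second rule set, `rule1` gives `x3 == -(x1 + x2)` and `rule2` gives `x1 + x2 >= 0`, so `x3 <= 0`; together with `rule3` this leaves only `x3 == 0`. The base R sketch below checks this reasoning on a small integer grid. The grid range is an assumption made for illustration, and the enumeration is a sanity check, not the method used by the package.

```r
# Integer grid of candidate records (range chosen for illustration)
grid <- expand.grid(x1 = -2:2, x2 = -2:2, x3 = -2:2)

# Keep the records that satisfy rule1, rule2 and rule3
feasible <- subset(grid, x1 + x2 + x3 == 0 & x1 + x2 >= 0 & x3 >= 0)

# x3 takes a single value in every feasible record
x3_values <- unique(feasible$x3)
print(x3_values)
```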

### Simplifying conditional statements

```{r}
# non-relaxing clause
rules <- validator( r1 = if (income > 0) age >= 16
, r2 = age < 12
)
# given r2, age >= 16 is always FALSE, so r1 can be simplified
simplify_conditional(rules)

# non-constraining clause
rules <- validator( if (age < 16) income == 0
, if (age >= 16) income >= 0
)
simplify_conditional(rules)
```
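
The first example relies on the equivalence of `if (A) B` and `!A | B`: when `age < 12` holds, the clause `age >= 16` can never be TRUE, so `r1` only constrains `income`. The base R sketch below checks this on a few hypothetical records that already satisfy `r2`; the values are assumptions chosen for illustration.

```r
# Records that already satisfy r2 (age < 12); values are illustrative
records <- expand.grid(income = c(-100, 0, 100), age = c(0, 5, 11))

# r1 written as a disjunction: 'if (A) B' is '!A | B'
r1 <- with(records, !(income > 0) | age >= 16)

# Candidate simplification: the clause 'age >= 16' can never hold here
r1_simplified <- with(records, income <= 0)

# Both versions agree on every record
same <- identical(r1, r1_simplified)
print(same)
```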

### Removing redundant rules

```{r}
rules <- validator( rule1 = age > 12
, rule2 = age > 18
)

# rule1 is superfluous
remove_redundancy(rules)

rules <- validator( rule1 = age > 12
, rule2 = age > 12
)

# standoff: rule1 and rule2 imply each other, first rule wins
remove_redundancy(rules)

# Note that detection flags both rules!
detect_redundancy(rules)
```
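
A rule is redundant when every record that passes the other rules also passes it. The base R sketch below checks this implication for the first example over a range of candidate ages. The range is an assumption made for illustration, and the enumeration only mirrors the idea; it is not how `detect_redundancy` is implemented.

```r
# Candidate ages (range chosen for illustration)
age <- 0:120

rule1 <- age > 12
rule2 <- age > 18

# rule2 implies rule1: every age passing rule2 also passes rule1
rule1_redundant <- all(rule1[rule2])

# The converse fails (for example age 15), so rule2 must be kept
rule2_redundant <- all(rule2[rule1])

print(c(rule1 = rule1_redundant, rule2 = rule2_redundant))
```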