https://github.com/data-cleaning/validatetools
- Host: GitHub
- URL: https://github.com/data-cleaning/validatetools
- Owner: data-cleaning
- License: other
- Created: 2017-07-27T12:10:17.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-06-14T09:32:49.000Z (over 1 year ago)
- Last Synced: 2024-10-13T12:44:00.603Z (12 months ago)
- Topics: data-cleaning, r, rules, validation
- Language: R
- Homepage: https://data-cleaning.github.io/validatetools
- Size: 6.13 MB
- Stars: 15
- Watchers: 5
- Forks: 3
- Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
---

```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
library(validatetools)
```

[R-CMD-check](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml)
[CRAN](https://CRAN.R-project.org/package=validatetools)
[Awesome Official Statistics](http://www.awesomeofficialstatistics.org)
[Codecov](https://app.codecov.io/gh/data-cleaning/validatetools)

# validatetools
`validatetools` is a utility package for managing validation rule sets that are defined with `validate`.
In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially)
contradictory rules. `validatetools` helps to identify problems in large rule sets and includes simplification
methods for resolving them.

## Installation
`validatetools` is available from CRAN and can be installed with:

```r
install.packages("validatetools")
```

The adventurous can install an (unstable) development version of `validatetools` from GitHub with:
``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")
```
or from the project's r-universe repository:

```r
install.packages('validatetools', repos = c('https://data-cleaning.r-universe.dev', 'https://cloud.r-project.org'))
```

## Example
### Check for feasibility
```{r}
# the single rule x > 0 can be satisfied
rules <- validator(x > 0)
is_infeasible(rules)

# rule1 and rule2 cannot both hold
rules <- validator(
  rule1 = x > 0,
  rule2 = x < 0
)
is_infeasible(rules)

detect_infeasible_rules(rules, verbose = TRUE)

# find out which rules conflict with rule1
is_contradicted_by(rules, "rule1", verbose = TRUE)

# we prefer to keep rule1, so we give rule1 Inf weight
detect_infeasible_rules(
  rules,
  weight = c(rule1 = Inf),
  verbose = TRUE
)

make_feasible(rules, weight = c(rule1 = Inf), verbose = TRUE)
```

### Finding contradicting if rules
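Conditional rules can contradict each other even when the rule set as a whole is feasible. The following base-R sketch does not use `validatetools`; it brute-forces the pair of rules used in this section over four toy records to show that no record with a positive income can satisfy both rules. The records and the helper names `rule1_ok` and `rule2_ok` are assumptions made for illustration.

```r
# Toy records covering every combination of income (zero or positive) and job.
records <- expand.grid(
  income = c(0, 100),
  job = c("yes", "no"),
  stringsAsFactors = FALSE
)

# An implication `if (A) B` holds when A is FALSE or B is TRUE.
rule1_ok <- with(records, !(income > 0) | job == "yes")   # if (income > 0) job == "yes"
rule2_ok <- with(records, !(job == "yes") | income == 0)  # if (job == "yes") income == 0

# Keep only the records that satisfy both rules.
valid <- records[rule1_ok & rule2_ok, ]
print(valid)

# Check whether any surviving record has a positive income.
any(valid$income > 0)
```

Enumeration like this only works for tiny examples; the functions shown next analyse the rules themselves instead of sample records.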
```{r}
rules <- validator(
rule1 = if (income > 0) job == "yes",
rule2 = if (job == "yes") income == 0
)
is_infeasible(rules, verbose=TRUE)
conflicts <- detect_contradicting_if_rules(rules, verbose=TRUE)
```

```{r}
print(conflicts)
```

## Simplifying
The function `simplify_rules` combines most simplification methods of `validatetools` to simplify a rule set.
For example, it reduces the following rule set to a simpler form:

```{r}
rules <- validator(
  rule1 = if (age < 16) income == 0,
  rule2 = job %in% c("yes", "no"),
  rule3 = if (job == "yes") income > 0
)

simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
```

`simplify_rules` combines the following simplification and substitution methods:
### Value substitution
```{r}
rules <- validator(
rule1 = height > 4,
rule2 = height <= max_height,
rule3 = if (gender == "male") weight > 100,
rule4 = gender %in% c("male", "female")
)
substitute_values(rules, max_height = 6, gender = "male")
```

### Finding fixed values
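A variable is fixed when the rules leave it only one admissible value. As a rough base-R check that does not call `validatetools`, the sketch below enumerates a small integer grid for the rules `x1 + x2 + x3 == 0`, `x1 + x2 >= 0` and `x3 >= 0` used later in this section, and lists the values of `x3` that remain admissible. The grid and the names `grid` and `admissible` are assumptions for illustration; the real variables need not be integers.

```r
# Candidate points on a small integer grid.
grid <- expand.grid(x1 = -2:2, x2 = -2:2, x3 = -2:2)

# Keep the points that satisfy all three rules.
admissible <- subset(grid, x1 + x2 + x3 == 0 & x1 + x2 >= 0 & x3 >= 0)

# Distinct values of x3 among the admissible points.
unique(admissible$x3)
```

The reason is algebraic: the equality gives `x3 = -(x1 + x2)`, the second rule makes that at most 0, and the third rule makes it at least 0, so only `x3 == 0` is left.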
```{r}
# rule1 and rule2 together force x == 0
rules <- validator(
  rule1 = x >= 0,
  rule2 = x <= 0
)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

# rule1, rule2 and rule3 together force x3 == 0
rules <- validator(
  rule1 = x1 + x2 + x3 == 0,
  rule2 = x1 + x2 >= 0,
  rule3 = x3 >= 0
)
simplify_fixed_variables(rules)
```

### Simplifying conditional statements
```{r}
# superfluous conditions
rules <- validator(
  r1 = if (age > 18) age <= 67,
  r2 = if (income > 0 && income > 1000) job == TRUE
)
# r1 implies that age is always <= 67
simplify_conditional(rules)

# non-relaxing clause
rules <- validator(
  r1 = if (income > 0) age >= 16,
  r2 = age < 12
)
# because of r2, age >= 16 is always FALSE, so r1 can be simplified
simplify_conditional(rules)

# non-constraining clause
rules <- validator(
  rule1 = if (age < 16) income == 0,
  rule2 = if (age >= 16) income >= 0
)
simplify_conditional(rules)
```

### Removing redundant rules
```{r}
rules <- validator(
  rule1 = age > 12,
  rule2 = age > 18
)

# rule1 is superfluous
remove_redundancy(rules, verbose = TRUE)

rules <- validator(
  rule1 = age > 12,
  rule2 = age > 12
)

# rule1 and rule2 are identical: the first rule wins
remove_redundancy(rules, verbose = TRUE)

# Note that detection flags both rules!
detect_redundancy(rules, verbose = TRUE)
```
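
The result of `remove_redundancy` can be stored and inspected rather than only printed. A minimal sketch, assuming `validatetools` is installed, that the function returns a rule set, and that `length()` and `names()` behave on it as they do on any `validator` object from `validate`; the variable name `reduced` is illustrative:

```r
library(validatetools)

rules <- validator(
  rule1 = age > 12,
  rule2 = age > 18
)

# Drop rules that are implied by the remaining ones.
reduced <- remove_redundancy(rules)

length(rules)    # number of rules before
length(reduced)  # number of rules after
names(reduced)   # names of the rules that were kept
```

Comparing the two lengths gives a quick check, for instance in a test script, that a rule set contains no redundant rules.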