https://github.com/data-cleaning/validatetools
- Host: GitHub
- URL: https://github.com/data-cleaning/validatetools
- Owner: data-cleaning
- License: other
- Created: 2017-07-27T12:10:17.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-06-14T09:32:49.000Z (8 months ago)
- Last Synced: 2024-10-13T12:44:00.603Z (4 months ago)
- Topics: data-cleaning, r, rules, validation
- Language: R
- Homepage: https://data-cleaning.github.io/validatetools
- Size: 6.13 MB
- Stars: 15
- Watchers: 5
- Forks: 3
- Open Issues: 6
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
---

```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
library(validatetools)
```

[R-CMD-check](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml)
[CRAN](https://CRAN.R-project.org/package=validatetools)
[Codecov](https://codecov.io/github/data-cleaning/validatetools)
[Awesome Official Statistics](http://www.awesomeofficialstatistics.org)

# validatetools
`validatetools` is a utility package for managing validation rule sets that are defined with `validate`.
In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially)
contradictory rules. `validatetools` helps to identify problems in large rule sets and includes simplification
methods for resolving them.

## Installation
`validatetools` is available from CRAN and can be installed with:
```r
install.packages("validatetools")
```

The latest beta version of `validatetools` can be installed with:
``` r
install.packages("validatetools", repos = "https://data-cleaning.github.io/drat")
```

The adventurous can install an (unstable) development version of `validatetools` from GitHub with:
``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")
```

## Example
### Check for feasibility
```{r}
rules <- validator( x > 0)
is_infeasible(rules)

rules <- validator( rule1 = x > 0
                  , rule2 = x < 0
                  )
is_infeasible(rules)

detect_infeasible_rules(rules)
make_feasible(rules)

# find out the conflict with this rule
is_contradicted_by(rules, "rule1")
```

## Simplifying
The function `simplify_rules` applies most of the simplification methods of `validatetools` to a rule set.
For example, once a value is supplied for `age` or `job`, it reduces the following rule set to a simpler form:

```{r}
rules <- validator( if (age < 16) income == 0
                  , job %in% c("yes", "no")
                  , if (job == "yes") income > 0
                  )
simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
```

`simplify_rules` combines the following simplification and substitution methods:
### Value substitution
```{r}
rules <- validator( rule1 = height > 5
                  , rule2 = max_height >= height
                  , rule3 = if (gender == "male") weight > 100
                  , rule4 = gender %in% c("male", "female")
                  )
substitute_values(rules, height = 6, gender = "male")
```

### Finding fixed values
```{r}
rules <- validator( x >= 0, x <= 0)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

rules <- validator( rule1 = x1 + x2 + x3 == 0
                  , rule2 = x1 + x2 >= 0
                  , rule3 = x3 >= 0
                  )
simplify_fixed_variables(rules)
```

### Simplifying conditional statements
```{r}
# non-relaxing clause
rules <- validator( r1 = if (income > 0) age >= 16
                  , r2 = age < 12
                  )
# age >= 16 is always FALSE (r2 requires age < 12), so r1 can be simplified
simplify_conditional(rules)

# non-constraining clause
rules <- validator( if (age < 16) income == 0
                  , if (age >= 16) income >= 0
                  )
simplify_conditional(rules)
```

### Removing redundant rules
```{r}
rules <- validator( rule1 = age > 12
                  , rule2 = age > 18
                  )

# rule1 is superfluous: it is implied by rule2
remove_redundancy(rules)

rules <- validator( rule1 = age > 12
                  , rule2 = age > 12
                  )

# rule1 and rule2 are identical: the first rule wins
remove_redundancy(rules)

# Note that detection flags both rules!
detect_redundancy(rules)
```
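
The checks above can be chained into a small clean-up pass. The sketch below is one possible arrangement, not part of the package: `tidy_rules()` is a hypothetical helper introduced here for illustration. It assumes that `validate` and `validatetools` are installed, refuses to simplify a rule set that cannot be satisfied, and otherwise drops rules that are implied by the others.

```r
library(validate)
library(validatetools)

# Hypothetical helper (not part of validatetools): stop early when the rule
# set is infeasible, otherwise remove rules that are implied by other rules.
tidy_rules <- function(rules) {
  if (is_infeasible(rules)) {
    stop("rule set is infeasible; inspect it with detect_infeasible_rules()")
  }
  remove_redundancy(rules)
}

rules <- validator( rule1 = age > 12
                  , rule2 = age > 18
                  )

tidy <- tidy_rules(rules)
tidy
```

Checking feasibility first is a design choice in this sketch: any rule is trivially implied by a contradictory rule set, so redundancy results are hard to interpret until the contradiction has been resolved, for example with `make_feasible()`.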