https://github.com/aaronpeikert/iv
Independent Validation (IV) with 'rsample'
- Host: GitHub
- URL: https://github.com/aaronpeikert/iv
- Owner: aaronpeikert
- License: other
- Created: 2019-10-03T20:26:35.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-10-05T06:19:15.000Z (over 1 year ago)
- Last Synced: 2025-01-29T05:42:42.345Z (4 months ago)
- Language: R
- Size: 27.3 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Independent Validation (IV)
[MIT License](https://opensource.org/licenses/MIT) [Open an issue](https://github.com/aaronpeikert/iv/issues/new)

Independent Validation is a procedure proposed by von Oertzen (in prep) that produces independent assessment sets. This independence is assumed by most statistical tests applied to the performance measures computed on those assessment sets. Classical resampling procedures (such as cross-validation or the bootstrap) violate this assumption: even when the observations in the original sample are independent, the resulting assessment and holdout sets are not.
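To see one source of this dependence, here is a small sketch (added for illustration, not part of the package) using `rsample`'s `vfold_cv()`: the analysis sets of two different cross-validation folds share most of their rows, so the performance estimates computed from them are not independent.

``` r
library(rsample)

# Toy data with an explicit row id (illustration only)
dat <- data.frame(id = seq_len(100), x = rnorm(100))
cv <- vfold_cv(dat, v = 10)

# Rows used for fitting in fold 1 vs. fold 2
ids_fold1 <- analysis(cv$splits[[1]])$id
ids_fold2 <- analysis(cv$splits[[2]])$id

# Roughly 8 out of 9 analysis rows are shared between the two folds
mean(ids_fold1 %in% ids_fold2)
```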
## Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("aaronpeikert/iv")
```

## Examples
```{r data}
# install.packages("modeldata")
library(modeldata)
data("attrition")
# downsample
attrition <- attrition[sample(seq_len(nrow(attrition)), 100), ]
```

```{r iv}
library(iv)
library(rsample)
iv_obj <- iv(attrition, m = 20)
iv_obj
```
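Assuming the elements of `iv_obj$splits` behave like `rsample` `rsplit` objects (as the code below relies on), a single split can be inspected with the usual accessors; this aside is illustrative and not part of the original example.

``` r
# Illustration only: look at the first split with rsample's accessors
first_split <- iv_obj$splits[[1]]
dim(analysis(first_split))    # rows available for model fitting
dim(assessment(first_split))  # rows held out for evaluation
```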
```{r lm_func}
mod_form <- as.formula(Attrition ~ JobSatisfaction + Gender + MonthlyIncome)
## `splits` will be an `rsplit` object
holdout_results <- function(splits, ...) {
  # Fit the model to the analysis set
  mod <- glm(..., data = analysis(splits), family = binomial)
  # Save the assessment (holdout) set
  holdout <- assessment(splits)
  # `augment` will save the predictions with the holdout data set
  res <- broom::augment(mod, newdata = holdout)
  # Class predictions on the assessment set from class probs
  lvls <- levels(holdout$Attrition)
  predictions <- factor(ifelse(res$.fitted > 0, lvls[2], lvls[1]),
                        levels = lvls)
  # Calculate whether the prediction was correct
  res$correct <- predictions == holdout$Attrition
  # Return the assessment data set with the additional columns
  res
}
```
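Before mapping over all splits, the helper can be tried on a single split (an illustrative aside; `iv_obj` and `mod_form` are defined above).

``` r
# Illustration only: apply the helper to the first split
single_res <- holdout_results(iv_obj$splits[[1]], mod_form)
mean(single_res$correct)  # accuracy on one assessment set
```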
```{r model_purrr, warning=FALSE}
library(purrr)
iv_obj$results <- map(iv_obj$splits,
                      holdout_results,
                      mod_form)
iv_obj$accuracy <- map_dbl(iv_obj$results, function(x) mean(x$correct))
summary(iv_obj$accuracy)
```
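Because the assessment sets are independent by construction, standard tests can be applied to the per-split accuracies. The sketch below is an added illustration, not part of the original example; the majority-class rate serves as an arbitrary baseline.

``` r
# Illustration only: one-sample t-test of the per-split accuracies
# against the majority-class baseline
baseline <- max(prop.table(table(attrition$Attrition)))
t.test(iv_obj$accuracy, mu = baseline)
```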
by [Aaron Peikert](https://orcid.org/0000-0001-7813-818X)
and [Andreas Brandmaier](http://orcid.org/0000-0001-8765-6982).