An open API service indexing awesome lists of open source software.

https://github.com/poissonconsulting/newdata

An R Package to Generate New Data Frames for Prediction
https://github.com/poissonconsulting/newdata

Last synced: 4 months ago
JSON representation

An R Package to Generate New Data Frames for Prediction

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# newdata

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/poissonconsulting/newdata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/poissonconsulting/newdata/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/gh/poissonconsulting/newdata/graph/badge.svg?token=pJO8edj5Wu)](https://codecov.io/gh/poissonconsulting/newdata)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/license/mit)
[![CRAN status](https://www.r-pkg.org/badges/version/newdata)](https://cran.r-project.org/package=newdata)

## Introduction

`newdata` is an R package to generate new data frames for predictive purposes.
By default, all specified variables vary across their range
while all other variables are held constant at the default reference value.
Types, classes, factor levels and time zones are always preserved.
The user can specify the length of each sequence, require that only
observed values and combinations are used and add new variables.

Consider the following observed data frame.
```{r}
library(newdata)

obs_data
```

### Length of Sequences

By default all variables are held constant (length of 1).
```{r}
xnew_data(obs_data)
```

Specifying a variable causes it to vary sequentially across its range.
```{r}
xnew_data(obs_data, int)
```

The user can specify the length of each sequence.
```{r}
xnew_data(obs_data, xnew_seq(int, length_out = 3))
```

### Observed Values and Combinations

The user can also indicate whether only observed values should be used in the sequence.
```{r}
xnew_data(obs_data, xnew_seq(int, length_out = 3, obs_only = TRUE))
```

The `xobs_only()` function can be used to filter out unobserved values after the sequence has been generated.
```{r}
xnew_data(obs_data, xobs_only(xnew_seq(int, length_out = 3)))
```

With two or more variables all combinations are used.
```{r}
xnew_data(obs_data, int, fct)
```

To only get observed combinations use `xobs_only()`
```{r}
xnew_data(obs_data, xobs_only(int, fct))
```

### Add New Variables

Adding a new variable is simple.
```{r}
xnew_data(obs_data, new = c(TRUE, FALSE))
```

### Casting Variables

Casting variables is easy.
```{r}
xnew_data(obs_data, xcast(int = 7, dbl = 10L, fct = "a rarity"))
```

## Installation

To install the latest release version from CRAN.
```r
install.packages("newdata")
```

To install the latest development version from [r-universe](https://poissonconsulting.r-universe.dev/newdata).
```r
install.packages("newdata", repos = c("https://poissonconsulting.r-universe.dev", "https://cloud.r-project.org"))
```

To install the latest development version from [GitHub](https://github.com/poissonconsulting/newdata)
```r
# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
pak::pak("poissonconsulting/newdata")
```

## Contribution

Please report any [issues](https://github.com/poissonconsulting/newdata/issues).

[Pull requests](https://github.com/poissonconsulting/newdata/pulls) are always welcome.

## Code of Conduct

Please note that the newdata project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.