Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rafzamb/sknifedatar
sknifedatar is a package that serves primarily as an extension to the modeltime 📦 ecosystem. In addition to some functionalities of spatial data and visualization.
https://github.com/rafzamb/sknifedatar
data data-analysis data-science data-visualization forecasting r statistics time-series
Last synced: 2 months ago
JSON representation
sknifedatar is a package that serves primarily as an extension to the modeltime 📦 ecosystem. In addition to some functionalities of spatial data and visualization.
- Host: GitHub
- URL: https://github.com/rafzamb/sknifedatar
- Owner: rafzamb
- License: other
- Created: 2020-12-22T04:42:06.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-02-06T21:43:02.000Z (almost 2 years ago)
- Last Synced: 2024-08-06T03:03:25.976Z (6 months ago)
- Topics: data, data-analysis, data-science, data-visualization, forecasting, r, statistics, time-series
- Language: R
- Homepage: https://rafzamb.github.io/sknifedatar/
- Size: 34 MB
- Stars: 36
- Watchers: 2
- Forks: 11
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# sknifedatar :package: “Swiss Knife of Data for R”
[![CRAN
status](https://www.r-pkg.org/badges/version/sknifedatar)](https://CRAN.R-project.org/package=sknifedatar)
[![Total
Downloads](http://cranlogs.r-pkg.org/badges/grand-total/sknifedatar?color=brightgreen)](https://cran.r-project.org/package=sknifedatar)
![](http://cranlogs.r-pkg.org/badges/sknifedatar?color=brightgreen)> Serves primarily as an extension to the
> [modeltime](https://business-science.github.io/modeltime/) :package:
> ecosystem. In addition to some functionalities of spatial data and
> visualization.## Installation
CRAN version:
``` r
install.packages("sknifedatar")
```Or install the development version from GitHub with:
``` r
# install.packages("devtools")
devtools::install_github("rafzamb/sknifedatar")
```## Features:
- multifit: multiple models into multiple time series (no panel).
- workflowsets: [worklowset over a time
series](https://rafzamb.github.io/sknifedatar/articles/workflowsets_times.html).
- workflowset multifit: [worklowset over multiple time series (no
panel)](https://rafzamb.github.io/sknifedatar/articles/workflowsets_multi_times.html).
- automagic tabs: [automatic generation of Tabs in Distill/Rmarkdown
files](https://rafzamb.github.io/sknifedatar/articles/automatic_tabs.html).
- sliding windows: [data partitioning in sliding
windows](https://rafzamb.github.io/sknifedatar/articles/sliding_windows.html).## Usage
### Fit multiple models into multiple time series.
### libraries
``` r
library(modeltime)
library(rsample)
library(parsnip)
library(recipes)
library(workflows)
library(dplyr)
library(tidyr)
library(sknifedatar)
```### Data
``` r
data("emae_series")
nested_serie = emae_series %>% filter(date < '2020-02-01') %>% nest(nested_column=-sector)
nested_serie
#> # A tibble: 16 x 2
#> sector nested_column
#>
#> 1 Comercio
#> 2 Ensenanza
#> 3 Administracion publica
#> 4 Transporte y comunicaciones
#> 5 Servicios sociales/Salud
#> 6 Impuestos netos
#> 7 Sector financiero
#> 8 Mineria
#> 9 Agro/Ganaderia/Caza/Silvicultura
#> 10 Electricidad/Gas/Agua
#> 11 Hoteles/Restaurantes
#> 12 Inmobiliarias
#> 13 Otras actividades
#> 14 Pesca
#> 15 Industria manufacturera
#> 16 Construccion
```### Recipes
``` r
recipe_1 = recipe(value ~ ., data = emae_series %>% select(-sector)) %>%
step_date(date, features = c("month", "quarter", "year"), ordinal = TRUE)
```### Models
``` r
m_auto_arima <- arima_reg() %>% set_engine('auto_arima')m_stlm_arima <- seasonal_reg() %>%
set_engine("stlm_arima")m_nnetar <- workflow() %>%
add_recipe(recipe_1) %>%
add_model(nnetar_reg() %>% set_engine("nnetar"))
```### modeltime\_multifit()
``` r
model_table_emae <- modeltime_multifit(serie = nested_serie %>% head(3),
.prop = 0.8,
m_auto_arima,
m_stlm_arima,
m_nnetar)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#>
#> ── 3 models fitted ♥ ───────────────────────────────────────────────────────────model_table_emae
#> $table_time
#> # A tibble: 3 x 7
#> sector nested_column m_auto_arima m_stlm_arima m_nnetar nested_model
#>
#> 1 Comercio 2 Ensenanza 3 Administrac… # … with 1 more variable: calibration
#>
#> $models_accuracy
#> # A tibble: 9 x 10
#> name_serie .model_id .model_desc .type mae mape mase smape rmse rsq
#>
#> 1 Comercio 1 ARIMA(0,1,1… Test 8.54 5.55 0.656 5.69 10.7 0.588
#> 2 Comercio 2 SEASONAL DE… Test 9.33 6.28 0.717 6.24 11.2 0.415
#> 3 Comercio 3 NNAR(1,1,10… Test 9.71 6.36 0.746 6.52 11.5 0.510
#> 4 Ensenanza 1 ARIMA(1,1,1… Test 5.38 3.35 3.90 3.28 6.00 0.730
#> 5 Ensenanza 2 SEASONAL DE… Test 5.56 3.46 4.03 3.38 6.21 0.726
#> 6 Ensenanza 3 NNAR(1,1,10… Test 2.73 1.70 1.98 1.69 3.05 0.874
#> 7 Administra… 1 ARIMA(0,1,1… Test 6.10 3.96 12.6 3.86 7.05 0.0384
#> 8 Administra… 2 SEASONAL DE… Test 6.45 4.19 13.4 4.07 7.61 0.0480
#> 9 Administra… 3 NNAR(1,1,10… Test 6.26 4.07 13.0 3.97 6.88 0.0524
```### modeltime\_multiforecast()
``` r
forecast_emae <- modeltime_multiforecast(
model_table_emae$table_time,
.prop = 0.8
)
`````` r
forecast_emae %>%
select(sector, nested_forecast) %>%
unnest(nested_forecast) %>%
group_by(sector) %>%
plot_modeltime_forecast(
.legend_max_width = 12,
.facet_ncol = 2,
.line_size = 0.5,
.interactive = FALSE,
.facet_scales = 'free_y',
.title='Forecasting test')
```### modeltime\_multibestmodel()
``` r
best_model_emae <- modeltime_multibestmodel(
.table = model_table_emae$table_time,
.metric = "rmse",
.minimize = TRUE,
.forecast = FALSE
)best_model_emae
#> # A tibble: 3 x 8
#> sector nested_column m_auto_arima m_stlm_arima m_nnetar nested_model
#>
#> 1 Comercio 2 Ensenanza 3 Administrac… # … with 2 more variables: calibration , best_model
```### modeltime\_multirefit()
``` r
model_refit_emae <- modeltime_multirefit(models_table = best_model_emae)
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 year
#> frequency = 12 observations per 1 yearmodel_refit_emae
#> # A tibble: 3 x 8
#> sector nested_column m_auto_arima m_stlm_arima m_nnetar nested_model
#>
#> 1 Comercio 2 Ensenanza 3 Administrac… # … with 2 more variables: calibration , best_model
`````` r
forecast_emae <- modeltime_multiforecast(
model_refit_emae,
.prop = 0.8,
.h = "1 years"
)
`````` r
forecast_emae %>%
select(sector, nested_forecast) %>%
unnest(nested_forecast) %>%
group_by(sector) %>%
plot_modeltime_forecast(
.legend_max_width = 12,
.facet_ncol = 2,
.line_size = 0.5,
.interactive = FALSE,
.facet_scales = 'free_y',
.title='Forecasting'
)
```## Others functions :cyclone:
### 🔹 Function multieval()
For a set of predictions from different models, it allows you to
evaluate multiple metrics and return the results in a tabular format
that makes it easy to compare the predictions.``` r
library(yardstick)
library(erer)set.seed(123)
predictions =
data.frame(truth = runif(100),
predict_model_1 = rnorm(100, mean = 1,sd =2),
predict_model_2 = rnorm(100, mean = 0,sd =2))tibble(predictions)
#> # A tibble: 100 x 3
#> truth predict_model_1 predict_model_2
#>
#> 1 0.288 1.51 1.58
#> 2 0.788 0.943 1.54
#> 3 0.409 0.914 0.664
#> 4 0.883 3.74 -2.02
#> 5 0.940 0.548 -0.239
#> 6 0.0456 4.03 -0.561
#> 7 0.528 -2.10 1.13
#> 8 0.892 2.17 -0.745
#> 9 0.551 1.25 1.95
#> 10 0.457 1.43 -0.749
#> # … with 90 more rows
`````` r
multieval(.dataset = predictions,
.observed = "truth",
.predictions = c("predict_model_1","predict_model_2"),
.metrics = listn(rmse, rsq, mae))
#> $summary_table
#> # A tibble: 2 x 4
#> modelo rmse rsq mae
#>
#> 1 predict_model_1 1.99 0.000704 1.59
#> 2 predict_model_2 1.95 0.00115 1.61
```### 🔹 Function insert\_na()
This function allows adding NA values to a data frame, being able to
select the columns and the proportion of NAs desired.``` r
insert_na(.dataset = iris, columns = c("Sepal.Length","Petal.Length"), .p = 0.25)
#> # A tibble: 150 x 5
#> Sepal.Width Petal.Width Species Sepal.Length Petal.Length
#>
#> 1 3.5 0.2 setosa 5.1 NA
#> 2 3 0.2 setosa NA 1.4
#> 3 3.2 0.2 setosa 4.7 1.3
#> 4 3.1 0.2 setosa NA 1.5
#> 5 3.6 0.2 setosa NA 1.4
#> 6 3.9 0.4 setosa 5.4 1.7
#> 7 3.4 0.3 setosa 4.6 1.4
#> 8 3.4 0.2 setosa NA 1.5
#> 9 2.9 0.2 setosa 4.4 1.4
#> 10 3.1 0.1 setosa 4.9 1.5
#> # … with 140 more rows
```## Website
[sknifedatar website](https://rafzamb.github.io/sknifedatar/)