[![CRAN](https://www.r-pkg.org/badges/version/forecastML)](https://cran.r-project.org/package=forecastML)
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![Travis Build Status](https://travis-ci.org/nredell/forecastML.svg?branch=master)](https://travis-ci.org/nredell/forecastML)
[![codecov](https://codecov.io/github/nredell/forecastML/branch/master/graphs/badge.svg)](https://codecov.io/github/nredell/forecastML)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nredell/forecastML/master?urlpath=https%3A%2F%2Fgithub.com%2Fnredell%2FforecastML%2Ftree%2Fmaster%2Fnotebooks%2F)

# package::forecastML

The purpose of `forecastML` is to provide a series of functions and visualizations that simplify the process of
**multi-step-ahead forecasting with standard machine learning algorithms**. It's a wrapper package aimed at providing maximum flexibility in model-building--**choose any machine learning algorithm from any `R` or `Python` package**--while helping the user quickly assess the (a) accuracy, (b) stability, and (c) generalizability of grouped (i.e.,
multiple related time series) and ungrouped forecasts produced from potentially high-dimensional modeling datasets.

This package is inspired by Bergmeir, Hyndman, and Koo's 2018 paper
[A note on the validity of cross-validation for evaluating autoregressive time series prediction](https://doi.org/10.1016/j.csda.2017.11.003),
which supports--under certain conditions--forecasting with high-dimensional ML models **without having to use methods that are time series specific**.

The following quote from Bergmeir et al.'s article nicely sums up the aim of this package:

> "When purely (non-linear, nonparametric) autoregressive methods are applied to forecasting problems, as is often the case
> (e.g., when using Machine Learning methods), the aforementioned problems of CV are largely
> irrelevant, and CV can and should be used without modification, as in the independent case."

## Featured Notebooks

* **[Forecasting with big data - Spark and H2O](https://github.com/nredell/forecastML/blob/master/notebooks/Forecasting%20with%20big%20data%20-%20Spark%20and%20H2O.ipynb)**

* **[Forecasting with Python - scikit-learn in parallel](https://github.com/nredell/forecastML/blob/master/notebooks/python_sklearn_and_r_in_parallel/Forecasting%20with%20Python%20-%20scikit%20learn%20in%20parallel.ipynb)**

* **[Forecast reconciliation across planning horizons - coherent weekly ML and monthly ARIMA forecasts](https://github.com/nredell/forecastML/blob/master/notebooks/forecast_reconciliation/Forecast%20reconciliation%20across%20planning%20horizons%20-%20coherent%20weekly%20ML%20and%20monthly%20ARIMA%20forecasts.ipynb)**

User-contributed notebooks welcome!

## Lightning Example

* Requires `packageVersion("forecastML")` >= v0.9.1
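
A quick way to confirm that the installed version meets this requirement (a minimal check using base `R` only):

``` r
# Stops with an error if the installed forecastML is older than 0.9.1.
stopifnot(utils::packageVersion("forecastML") >= "0.9.1")
```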

``` r
library(glmnet)
library(forecastML)

data("data_seatbelts", package = "forecastML")

data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", method = "direct",
                                           outcome_col = 1, lookback = 1:15, horizons = 1:12)

windows <- forecastML::create_windows(data_train, window_length = 0)

model_fn <- function(data) {
  x <- as.matrix(data[, -1, drop = FALSE])
  y <- as.matrix(data[, 1, drop = FALSE])
  model <- glmnet::cv.glmnet(x, y)
}

model_results <- forecastML::train_model(data_train, windows, model_name = "LASSO", model_function = model_fn)

predict_fn <- function(model, data) {
  data_pred <- as.data.frame(predict(model, as.matrix(data)))
}

data_fit <- predict(model_results, prediction_function = list(predict_fn), data = data_train)

residuals <- residuals(data_fit)

data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast", method = "direct",
                                              outcome_col = 1, lookback = 1:15, horizons = 1:12)

data_forecasts <- predict(model_results, prediction_function = list(predict_fn), data = data_forecast)

data_forecasts <- forecastML::combine_forecasts(data_forecasts)

set.seed(224)
data_forecasts <- forecastML::calculate_intervals(data_forecasts, residuals,
                                                  levels = seq(.5, .95, .05), times = 200)

plot(data_forecasts, data_seatbelts[-(1:160), ], (1:nrow(data_seatbelts))[-(1:160)], interval_alpha = seq(.1, .2, length.out = 10))
```
![](./tools/lightning_example.png)

## README Contents

* **[Install](#install)**
* **[Approach to forecasting](#approach-to-forecasting)**
* **[Vignettes](#vignettes)**
* **[Cheat sheets](#cheat-sheets)**
* **[FAQ](#faq)**
* **Examples**
+ **[Forecasting numeric outcomes](#examples---numeric-outcomes-with-r-and-python)**
+ **[Direct forecasting](#direct-forecast-in-r)**
+ **[Multi-output forecasting](#multi-output-forecast-in-r)**
+ **[Forecasting factor outcomes (forecasting sequences)](#examples---factor-outcomes-with-r-and-python)**

## Install

* CRAN

``` r
install.packages("forecastML")
library(forecastML)
```

* Development

``` r
remotes::install_github("nredell/forecastML")
library(forecastML)
```

## Approach to Forecasting

### Direct forecasting

The direct forecasting approach used in `forecastML` involves the following steps:

**1.** Build a series of horizon-specific short-, medium-, and long-term forecast models.

**2.** Assess model generalization performance across a variety of heldout datasets through time.

**3.** Select those models that consistently performed the best at each forecast horizon and
combine them to produce a single ensemble forecast.

* Below is a plot of 5 forecast models used to produce a single 12-step-ahead forecast where each color
represents a distinct horizon-specific ML model. From left to right these models are:

* **1**: A feed-forward neural network (purple); **2**: An ensemble of ML models;
**3**: A boosted tree model; **4**: A LASSO regression model; **5**: A LASSO regression model (yellow).

![](./tools/forecastML_plot.png)

* Below is a similar combination of horizon-specific models with a factor outcome and forecasting factor
probabilities 12 steps ahead.

![](./tools/forecastML_factor_plot.png)

### Multi-output forecasting

The multi-output forecasting approach used in `forecastML` involves the following steps:

**1.** Build a single multi-output model that simultaneously forecasts over both short- and long-term forecast horizons.

**2.** Assess model generalization performance across a variety of heldout datasets through time.

**3.** Select the hyperparameters that minimize forecast error over all the relevant forecast horizons and re-train.
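
In code, the main structural change from the direct approach is `method = "multi_output"` in `create_lagged_df()`, which returns a training dataset with one outcome column per forecast horizon. Below is a minimal sketch using the bundled seatbelt data; it is a condensed version of the full multi-output example later in this README.

``` r
library(forecastML)

data("data_seatbelts", package = "forecastML")

date_frequency <- "1 month"
dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = date_frequency)

# One training dataset; the outcome spans 12 columns--one per forecast horizon.
data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", method = "multi_output",
                                           outcome_col = 1, lookback = 1:15, horizons = 1:12,
                                           dates = dates, frequency = date_frequency)
```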

## Vignettes

The main functions covered in each vignette are shown below as `function()`.

* Detailed **[forecastML overview vignette](https://nredell.github.io/forecastML/doc/package_overview.html)**.
`create_lagged_df()`, `create_windows()`, `train_model()`, `return_error()`, `return_hyper()`, `combine_forecasts()`

* **[Creating custom feature lags for model training](https://nredell.github.io/forecastML/doc/lagged_features.html)**. `create_lagged_df(lookback_control = ...)`

* **[Direct Forecasting with multiple or grouped time series](https://nredell.github.io/forecastML/doc/grouped_forecast.html)**.
`fill_gaps()`,
`create_lagged_df(dates = ..., dynamic_features = ..., groups = ..., static_features = ...)`, `create_windows()`, `train_model()`, `combine_forecasts()`

* **[Direct Forecasting with multiple or grouped time series - Sequences](https://nredell.github.io/forecastML/doc/grouped_forecast_sequences.html)**.
`fill_gaps()`,
`create_lagged_df(dates = ..., dynamic_features = ..., groups = ..., static_features = ...)`, `create_windows()`, `train_model()`, `combine_forecasts()`

* **[Customizing the user-defined wrapper functions](https://nredell.github.io/forecastML/doc/custom_functions.html)**.
`train()` and `predict()`

* **[Forecast combinations](https://nredell.github.io/forecastML/doc/combine_forecasts)**. `combine_forecasts()`

## Cheat Sheets

![](./tools/forecastML_cheat_sheet.PNG)

1. **`fill_gaps`:** Optional; needed only if there are temporal gaps/missing rows in data collection. Fill gaps in data collection and
prepare a dataset of evenly-spaced time series for modeling with lagged features. Returns a 'data.frame' with
missing rows added in so that you can either (a) impute, remove, or ignore `NA`s prior to the `forecastML` pipeline
or (b) impute, remove, or ignore them in the user-defined modeling function--depending on the `NA` handling
capabilities of the user-specified model.

2. **`create_lagged_df`:** Create model training and forecasting datasets with lagged, grouped, dynamic, and static features.

3. **`create_windows`:** Create time-contiguous validation datasets for model evaluation.

4. **`train_model`:** Train the user-defined model across forecast horizons and validation datasets.

5. **`return_error`:** Compute forecast error across forecast horizons and validation datasets.

6. **`return_hyper`:** Return user-defined model hyperparameters across validation datasets.

7. **`combine_forecasts`:** Combine multiple horizon-specific forecast models to produce one forecast.
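
Strung together, the data-side steps above form a short pipeline. The sketch below restates the Lightning Example in those terms; `model_fn` and `predict_fn` are the user-defined wrapper functions described in the examples below, and `return_error()`/`return_hyper()` are covered in the overview vignette.

``` r
data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", method = "direct",
                                           outcome_col = 1, lookback = 1:15, horizons = 1:12)  # Step 2.

windows <- forecastML::create_windows(data_train, window_length = 12)                          # Step 3.

model_results <- forecastML::train_model(data_train, windows, model_name = "LASSO",
                                         model_function = model_fn)                            # Step 4.

data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast", method = "direct",
                                              outcome_col = 1, lookback = 1:15, horizons = 1:12)

data_forecasts <- predict(model_results, prediction_function = list(predict_fn), data = data_forecast)

data_forecasts <- forecastML::combine_forecasts(data_forecasts)                                # Step 7.
```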

![](./tools/forecastML_cheat_sheet_data.PNG)


![](./tools/forecastML_cheat_sheet_model.PNG)

## FAQ

* **Q:** Where does `forecastML` fit in with respect to popular `R` machine learning packages like [mlr3](https://mlr3.mlr-org.com/) and [caret](https://github.com/topepo/caret)?
* **A:** The idea is that `forecastML` takes care of the tedious parts of forecasting with ML methods: creating training and forecasting datasets with different
types of features--grouped, static, and dynamic--as well as simplifying validation dataset creation to assess model performance at specific points in time.
That said, the workflow for packages like `mlr3` and `caret` would mostly occur inside of the user-supplied
modeling function which is passed into `forecastML::train_model()`. Refer to the wrapper function customization
vignette for more details.
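
For instance, a `caret`-based workflow would live entirely inside these user-defined wrappers. The sketch below is illustrative only and assumes the `caret` and `glmnet` packages are installed; the method and resampling settings are not a recommendation.

``` r
library(caret)

model_function <- function(data) {
  # Any caret preprocessing/tuning/resampling workflow can go here.
  caret::train(x = data[, -1, drop = FALSE], y = data[, 1],
               method = "glmnet",
               trControl = caret::trainControl(method = "cv", number = 5))
}

prediction_function <- function(model, data_features) {
  data.frame("y_pred" = predict(model, newdata = data_features))
}

# These wrappers are then passed into the forecastML pipeline exactly as in the examples below:
# forecastML::train_model(data_train, windows, model_name = "caret glmnet",
#                         model_function = model_function)
```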

* **Q:** How do I get the model training and forecasting datasets as well as the trained models out of the
`forecastML` pipeline?
* **A:** After running `forecastML::create_lagged_df()` with either `type = "train"` or `type = "forecast"`,
the `data.frame`s can be accessed with `my_lagged_df$horizon_h` where "h" is an integer marking the
horizon-specific dataset (e.g., the value(s) passed in `horizons = ...`). The trained models from
`forecastML::train_model()` can be accessed with `my_trained_model$horizon_h$window_w$model` where "w" is
the validation window number from `forecastML::create_windows()`.
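
A short sketch of these access patterns, reusing the object names from the Lightning Example above (which used `horizons = 1:12`):

``` r
# Horizon-specific training data from create_lagged_df(..., type = "train").
head(data_train$horizon_6)

# Horizon-specific forecasting data from create_lagged_df(..., type = "forecast").
head(data_forecast$horizon_6)

# The trained model for the 6-step-ahead horizon in validation window 1.
model_results$horizon_6$window_1$model
```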

## Examples - Numeric Outcomes with R and Python

### Direct forecast in R

Below is an example of how to create 12 horizon-specific ML models to forecast the number of `DriversKilled`
12 time periods into the future using the `Seatbelts` dataset. Notice in the last plot that there are multiple forecasts;
these are from the slightly different LASSO models trained in the nested cross-validation. An example of selecting optimal
hyperparameters and retraining to create a single forecast model (i.e., `create_windows(..., window_length = 0)`) can be found
in the overview vignette.

``` r
library(glmnet)
library(forecastML)

# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")

# Example - Training data for 12 horizon-specific models w/ common lags per feature. The data do
# not have any missing rows or temporal gaps in data collection; if there were gaps,
# we would need to use fill_gaps() first.
horizons <- 1:12 # 12 models that forecast 1, 1:2, 1:3, ..., and 1:12 time steps ahead.
lookback <- 1:15 # A lookback of 1 to 15 dataset rows (1:15 * 'date frequency' if dates are given).

#------------------------------------------------------------------------------
# Create a dataset of lagged features for modeling.
data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train",
outcome_col = 1, lookback = lookback,
horizon = horizons)

#------------------------------------------------------------------------------
# Create validation datasets for outer-loop nested cross-validation.
windows <- forecastML::create_windows(data_train, window_length = 12)

#------------------------------------------------------------------------------
# User-define model - LASSO
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments
# which can also be passed in '...' in train_model(). The function returns a model object suitable for
# the user-defined predict function. The returned model may also be a list that holds meta-data such
# as hyperparameter settings.

model_function <- function(data, my_outcome_col) { # my_outcome_col = 1 could be defined here.

  x <- data[, -(my_outcome_col), drop = FALSE]
  y <- data[, my_outcome_col, drop = FALSE]
  x <- as.matrix(x, ncol = ncol(x))
  y <- as.matrix(y, ncol = ncol(y))

  model <- glmnet::cv.glmnet(x, y)
  return(model)  # This model is the first argument in the user-defined predict() function below.
}

#------------------------------------------------------------------------------
# Train a model across forecast horizons and validation datasets.
# my_outcome_col = 1 is passed in ... but could have been defined in the user-defined model function.
model_results <- forecastML::train_model(data_train,
                                         windows = windows,
                                         model_name = "LASSO",
                                         model_function = model_function,
                                         my_outcome_col = 1, # ...
                                         use_future = FALSE)

#------------------------------------------------------------------------------
# User-defined prediction function - LASSO
# The predict() wrapper function takes 2 positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of model features. If predicting on validation data, expect the input data to be
# passed in the same format as returned by create_lagged_df(type = 'train') but with the outcome column
# removed. If forecasting, expect the input data to be in the same format as returned by
# create_lagged_df(type = 'forecast') but with the 'index' and 'horizon' columns removed. The function
# can return a 1- or 3-column data.frame with either (a) point
# forecasts or (b) point forecasts plus lower and upper forecast bounds (column order and names do not matter).

prediction_function <- function(model, data_features) {

  x <- as.matrix(data_features, ncol = ncol(data_features))
  data_pred <- data.frame("y_pred" = predict(model, x, s = "lambda.min"), # 1 column is required.
                          "y_pred_lower" = predict(model, x, s = "lambda.min") - 50, # optional.
                          "y_pred_upper" = predict(model, x, s = "lambda.min") + 50) # optional.
  return(data_pred)
}

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(prediction_function), data = data_train)

#------------------------------------------------------------------------------
# Plot forecasts for each validation dataset.
plot(data_valid, horizons = c(1, 6, 12))

#------------------------------------------------------------------------------
# Forecast.

# Forward-looking forecast data.frame.
data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast",
outcome_col = 1, lookback = lookback, horizons = horizons)

# Forecasts.
data_forecasts <- predict(model_results, prediction_function = list(prediction_function), data = data_forecast)

# We'll plot a background dataset of actuals as well.
plot(data_forecasts,
data_actual = data_seatbelts[-(1:150), ],
actual_indices = as.numeric(row.names(data_seatbelts[-(1:150), ])),
horizons = c(1, 6, 12), windows = c(5, 10, 15))
```
![](./tools/validation_data_forecasts.png)
![](./tools/forecasts.png)

***

### Direct forecast in R & Python

Now we'll look at an example similar to above. The main difference is that our user-defined modeling
and prediction functions are now written in `Python`. Thanks to the [reticulate](https://github.com/rstudio/reticulate)
`R` package, entire ML workflows already written in `Python` can be imported into `forecastML` with the
simple addition of 2 lines of `R` code.

* The `reticulate::source_python()` function will run a .py file and import any objects into your `R` environment. As we'll
see below, we'll only be importing library calls and functions to keep our `R` environment clean.

``` r
library(forecastML)
library(reticulate) # Move Python objects in and out of R. See the reticulate package for setup info.

reticulate::source_python("modeling_script.py") # Run a Python file and import objects into R.
```


* Below is a simple, slightly different `forecastML` setup for the seatbelt forecasting problem from the
previous example.

``` r
data("data_seatbelts", package = "forecastML")

horizons <- c(1, 12) # 2 models that forecast 1 and 1:12 time steps ahead.

# A lookback across select time steps in the past. Feature lags 1 through 9 will be silently dropped from the 12-step-ahead model.
lookback <- c(1, 3, 6, 9, 12, 15)

date_frequency <- "1 month" # Time step frequency.

# The date indices, which don't come with the stock dataset, should not be included in the modeling data.frame.
dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = date_frequency)

# Create a dataset of features for modeling.
data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
lookback = lookback, horizon = horizons,
dates = dates, frequency = date_frequency)

# Create 2 custom validation datasets for outer-loop nested cross-validation. The purpose of
# the multiple validation windows is to assess expected forecast accuracy for specific
# time periods while supporting an investigation of the hyperparameter stability for
# models trained on different time periods. Validation windows can overlap.
window_start <- c(as.Date("1983-01-01"), as.Date("1984-01-01"))
window_stop <- c(as.Date("1983-12-01"), as.Date("1984-12-01"))

windows <- forecastML::create_windows(data_train, window_start = window_start, window_stop = window_stop)
```


#### modeling_script.py

* Let's look at the content of our `Python` modeling file that we source()'d above. The `Python` wrapper function inputs
and returns for `py_model_function()` and `py_prediction_function()` are the same as their `R` counterparts. Just
be sure to expect and return `pandas` `DataFrame`s as conversion from `numpy` arrays has not been tested.

``` python

import pandas as pd
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

# User-defined model.
# A user-defined wrapper function for model training that takes the following
# arguments: (1) a horizon-specific pandas DataFrame made with create_lagged_df(..., type = "train")
# (e.g., my_lagged_df$horizon_h)
def py_model_function(data):

    X = data.iloc[:, 1:]
    y = data.iloc[:, 0]

    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    model_lasso = linear_model.Lasso(alpha = 0.1)

    model_lasso.fit(X = X, y = y)

    return({'model': model_lasso, 'scaler': scaler})

# User-defined prediction function.
# The predict() wrapper function takes 2 positional arguments. First,
# the returned model from the user-defined modeling function (py_model_function() above).
# Second, a pandas DataFrame of model features. For numeric outcomes, the function
# can return a 1- or 3-column pandas DataFrame with either (a) point
# forecasts or (b) point forecasts plus lower and upper forecast bounds (column order and names do not matter).
def py_prediction_function(model_list, data_x):

    data_x = model_list['scaler'].transform(data_x)

    data_pred = pd.DataFrame({'y_pred': model_list['model'].predict(data_x)})

    return(data_pred)
```


* Train and predict on historical validation data with the imported `Python` wrapper functions.

``` r
# Train a model across forecast horizons and validation datasets.
model_results <- forecastML::train_model(data_train,
                                         windows = windows,
                                         model_name = "LASSO",
                                         model_function = py_model_function,
                                         use_future = FALSE)

# Predict on the validation datasets.
data_valid <- predict(model_results, prediction_function = list(py_prediction_function), data = data_train)

# Plot forecasts for each validation dataset.
plot(data_valid, horizons = c(1, 12))
```
![](./tools/validation_data_forecasts_python.png)


* Forecast with the same imported `Python` wrapper functions. The final wrapper functions may eventually have
fixed hyperparameters or complicated model ensembles based on repeated model training and investigation.

``` r
# Forward-looking forecast data.frame.
data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
lookback = lookback, horizon = horizons,
dates = dates, frequency = date_frequency)

# Forecasts.
data_forecasts <- predict(model_results, prediction_function = list(py_prediction_function),
data = data_forecast)

# We'll plot a background dataset of actuals as well.
plot(data_forecasts, data_actual = data_seatbelts[-(1:150), ],
actual_indices = dates[-(1:150)], horizons = c(1, 12))
```
![](./tools/forecasts_python.png)

***

### Multi-output forecast in R

* This is the same seatbelt dataset example except now, instead of 1 model for each
forecast horizon, we'll build 1 multi-output neural network model that forecasts 12
steps into the future.

* Given that this is a small dataset, the multi-output approach would require a decent
amount of tuning to produce accurate results. If longer-term forecasts are of interest, an
alternative that reduces the number of parameters is to forecast, say, only horizons 6 through 12;
the output neurons do not have to start at a horizon of 1 or even be contiguous (see the short sketch below).
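
For example, restricting the multi-output model to horizons 6 through 12 only changes the `horizons` argument of `create_lagged_df()`. This sketch assumes `dates` and `date_frequency` are defined as in the full example below.

``` r
# A single multi-output model that forecasts only horizons 6 through 12.
data_train_6_12 <- forecastML::create_lagged_df(data_seatbelts, type = "train", method = "multi_output",
                                                outcome_col = 1, lookback = 1:15, horizons = 6:12,
                                                dates = dates, frequency = date_frequency,
                                                dynamic_features = "law")
```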

``` r
library(forecastML)
library(keras) # Using the TensorFlow 2.0 backend.

data("data_seatbelts", package = "forecastML")

data_seatbelts[] <- lapply(data_seatbelts, function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
})

date_frequency <- "1 month"
dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = date_frequency)

data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", method = "multi_output",
outcome_col = 1, lookback = 1:15, horizons = 1:12,
dates = dates, frequency = date_frequency,
dynamic_features = "law")

# 'window_length = 0' creates 1 historical training dataset with no external validation datasets.
# Set it to, say, 24 to see the model and forecast stability when trained across different slices
# of historical data.
windows <- forecastML::create_windows(data_train, window_length = 0)

#------------------------------------------------------------------------------
# 'data_y' consists of 1 column for each forecast horizon--here, 12.
model_fun <- function(data, horizons) { # 'horizons' is passed in train_model().

  data_x <- apply(as.matrix(data[, -(1:length(horizons))]), 2, function(x){ifelse(is.na(x), 0, x)})
  data_y <- apply(as.matrix(data[, 1:length(horizons)]), 2, function(x){ifelse(is.na(x), 0, x)})

  layers_x_input <- keras::layer_input(shape = ncol(data_x))

  layers_x_output <- layers_x_input %>%
    keras::layer_dense(ncol(data_x), activation = "relu") %>%
    keras::layer_dense(ncol(data_x), activation = "relu") %>%
    keras::layer_dense(length(horizons))

  model <- keras::keras_model(inputs = layers_x_input, outputs = layers_x_output) %>%
    keras::compile(optimizer = 'adam', loss = 'mean_absolute_error')

  early_stopping <- callback_early_stopping(monitor = 'val_loss', patience = 2)

  tensorflow::tf$random$set_seed(224)

  model_results <- model %>%
    keras::fit(x = list(as.matrix(data_x)), y = list(as.matrix(data_y)),
               validation_split = 0.2, callbacks = c(early_stopping), verbose = FALSE)

  return(list("model" = model, "model_results" = model_results))
}
#------------------------------------------------------------------------------
# The predict() wrapper function will return a data.frame with a number of columns
# equaling the number of forecast horizons.
prediction_fun <- function(model, data_features) {

  data_features[] <- lapply(data_features, function(x){ifelse(is.na(x), 0, x)})
  data_features <- list(as.matrix(data_features, ncol = ncol(data_features)))

  data_pred <- data.frame(predict(model$model, data_features))
  names(data_pred) <- paste0("y_pred_", 1:ncol(data_pred))

  return(data_pred)
}
#------------------------------------------------------------------------------

model_results <- forecastML::train_model(data_train, windows, model_name = "Multi-Output NN",
model_function = model_fun,
horizons = 1:12)

data_valid <- predict(model_results, prediction_function = list(prediction_fun), data = data_train)

# We'll plot select forecast horizons to reduce visual clutter.
plot(data_valid, facet = ~ model, horizons = c(1, 3, 6, 12))
```
![](./tools/multi_outcome_train_plot.png)

* Forecast combinations from `combine_forecasts()` aren't necessary as we've trained only 1 model.

``` r
data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast", method = "multi_output",
outcome_col = 1, lookback = 1:15, horizons = 1:12,
dates = dates, frequency = date_frequency,
dynamic_features = "law")

data_forecasts <- predict(model_results, prediction_function = list(prediction_fun), data = data_forecast)

plot(data_forecasts, facet = NULL, data_actual = data_seatbelts[-(1:100), ], actual_indices = dates[-(1:100)])
```
![](./tools/multi_outcome_forecast_plot.png)

## Examples - Factor Outcomes with R and Python

### R

* This example is similar to the numeric outcome examples with the exception that the outcome has been
factorized to illustrate how factors or sequences are forecasted.

``` r
data("data_seatbelts", package = "forecastML")

# Create an artificial factor outcome for illustration's sake.
data_seatbelts$DriversKilled <- cut(data_seatbelts$DriversKilled, 3)

horizons <- c(1, 12) # 2 models that forecast 1 and 1:12 time steps ahead.

# A lookback across select time steps in the past. Feature lag 1 will be silently dropped from the 12-step-ahead model.
lookback <- c(1, 12, 18)

date_frequency <- "1 month" # Time step frequency.

# The date indices, which don't come with the stock dataset, should not be included in the modeling data.frame.
dates <- seq(as.Date("1969-01-01"), as.Date("1984-12-01"), by = date_frequency)

# Create a dataset of features for modeling.
data_train <- forecastML::create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
lookback = lookback, horizon = horizons,
dates = dates, frequency = date_frequency)

# We won't use nested cross-validation; rather, we'll train a model over the entire training dataset.
windows <- forecastML::create_windows(data_train, window_length = 0)

# This is the model-training dataset.
plot(windows, data_train)
```

![](./tools/sequence_windows.png)

* Model training and historical fit.

``` r
model_function <- function(data, my_outcome_col) { # my_outcome_col = 1 could be defined here.

  outcome_names <- names(data)[1]
  model_formula <- formula(paste0(outcome_names, "~ ."))

  set.seed(224)
  model <- randomForest::randomForest(formula = model_formula, data = data, ntree = 3)
  return(model)  # This model is the first argument in the user-defined predict() function below.
}

#------------------------------------------------------------------------------
# Train a model across forecast horizons and validation datasets.
# my_outcome_col = 1 is passed in ... but could have been defined in the user-defined model function.
model_results <- forecastML::train_model(data_train,
                                         windows = windows,
                                         model_name = "RF",
                                         model_function = model_function,
                                         my_outcome_col = 1, # ...
                                         use_future = FALSE)

#------------------------------------------------------------------------------
# User-defined prediction function.
#
# The predict() wrapper function takes 2 positional arguments. First,
# the returned model from the user-defined modeling function (model_function() above).
# Second, a data.frame of model features. If predicting on validation data, expect the input data to be
# passed in the same format as returned by create_lagged_df(type = 'train') but with the outcome column
# removed. If forecasting, expect the input data to be in the same format as returned by
# create_lagged_df(type = 'forecast') but with the 'index' and 'horizon' columns removed.
#
# For factor outcomes, the function can return either (a) a 1-column data.frame with factor level
# predictions or (b) an L-column data.frame of predicted class probabilities where 'L' equals the
# number of levels in the outcome; the order of the return()'d columns should match the order of the
# outcome factor levels from left to right which is the default behavior of most predict() functions.

# Predict/forecast a single factor level.
prediction_function_level <- function(model, data_features) {

  data_pred <- data.frame("y_pred" = predict(model, data_features, type = "response"))

  return(data_pred)
}

# Predict/forecast outcome class probabilities.
prediction_function_prob <- function(model, data_features) {

  data_pred <- data.frame("y_pred" = predict(model, data_features, type = "prob"))

  return(data_pred)
}

# Predict on the validation datasets.
data_valid_level <- predict(model_results,
                            prediction_function = list(prediction_function_level),
                            data = data_train)
data_valid_prob <- predict(model_results,
                           prediction_function = list(prediction_function_prob),
                           data = data_train)

```

* Predict historical factor levels.

* With `window_length = 0` these are essentially plots of model fit.

``` r
plot(data_valid_level, horizons = c(1, 12))
```

![](./tools/sequence_valid_level.png)

* Predict historical class probabilities.

``` r
plot(data_valid_prob, horizons = c(1, 12))
```

![](./tools/sequence_valid_prob.png)

* Forecast

``` r
# Forward-looking forecast data.frame.
data_forecast <- forecastML::create_lagged_df(data_seatbelts, type = "forecast",
outcome_col = 1, lookback = lookback, horizons = horizons)

# Forecasts.
data_forecasts_level <- predict(model_results,
prediction_function = list(prediction_function_level),
data = data_forecast)

data_forecasts_prob <- predict(model_results,
prediction_function = list(prediction_function_prob),
data = data_forecast)
```

* Forecast factor levels

``` r
plot(data_forecasts_level)
```

![](./tools/sequence_forecast_level.png)

* Forecast class probabilities

``` r
plot(data_forecasts_prob)
```

![](./tools/sequence_forecast_prob.png)