{"id":13400561,"url":"https://github.com/nredell/forecastML","last_synced_at":"2025-03-14T06:31:45.799Z","repository":{"id":56936365,"uuid":"170029881","full_name":"nredell/forecastML","owner":"nredell","description":"An R package with Python support for multi-step-ahead forecasting with machine learning and deep learning algorithms","archived":false,"fork":false,"pushed_at":"2020-06-11T03:48:43.000Z","size":30345,"stargazers_count":131,"open_issues_count":11,"forks_count":23,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-23T09:18:46.950Z","etag":null,"topics":["deep-learning","direct-forecasting","forecast","forecasting","machine-learning","multi-step-ahead-forecasting","neural-network","package","python","r","r-package","time-series"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nredell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-10T21:33:53.000Z","updated_at":"2024-08-13T20:52:17.000Z","dependencies_parsed_at":"2022-08-21T01:10:28.946Z","dependency_job_id":null,"html_url":"https://github.com/nredell/forecastML","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nredell%2FforecastML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nredell%2FforecastML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nredell%2FforecastML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nredell%2FforecastML/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nredell","download_url":"https://codeload.github.com/nredell/forecastML/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221440198,"owners_count":16821600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","direct-forecasting","forecast","forecasting","machine-learning","multi-step-ahead-forecasting","neural-network","package","python","r","r-package","time-series"],"created_at":"2024-07-30T19:00:53.334Z","updated_at":"2025-03-14T06:31:45.784Z","avatar_url":"https://github.com/nredell.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"tools/direct_forecast_illustration.PNG\" width=\"400px\"\u003e\u003c/img\u003e\n\u003c/p\u003e\n\n[![CRAN](https://www.r-pkg.org/badges/version/forecastML)](https://cran.r-project.org/package=forecastML)\n[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)\n[![Travis Build\nStatus](https://travis-ci.org/nredell/forecastML.svg?branch=master)](https://travis-ci.org/nredell/forecastML) \n[![codecov](https://codecov.io/github/nredell/forecastML/branch/master/graphs/badge.svg)](https://codecov.io/github/nredell/forecastML)\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nredell/forecastML/master?urlpath=https%3A%2F%2Fgithub.com%2Fnredell%2FforecastML%2Ftree%2Fmaster%2Fnotebooks%2F)\n\n# package::forecastML \u003cimg 
src=\"./man/figures/forecastML_logo.png\" alt=\"forecastML logo\" align=\"right\" height=\"138.5\" style=\"display: inline-block;\"\u003e\n\nThe purpose of `forecastML` is to provide a series of functions and visualizations that simplify the process of \n**multi-step-ahead forecasting with standard machine learning algorithms**. It's a wrapper package aimed at providing maximum flexibility in model-building--**choose any machine learning algorithm from any `R` or `Python` package**--while helping the user quickly assess the (a) accuracy, (b) stability, and (c) generalizability of grouped (i.e., \nmultiple related time series) and ungrouped forecasts produced from potentially high-dimensional modeling datasets.\n\nThis package is inspired by Bergmeir, Hyndman, and Koo's 2018 paper \n[A note on the validity of cross-validation for evaluating autoregressive time series prediction](https://doi.org/10.1016/j.csda.2017.11.003), \nwhich supports--under certain conditions--forecasting with high-dimensional ML models **without having to use methods that are time series specific**. 
\n\nThe following quote from Bergmeir et al.'s article nicely sums up the aim of this package:\n\n\u003e \"When purely (non-linear, nonparametric) autoregressive methods are applied to forecasting problems, as is often the case\n\u003e (e.g., when using Machine Learning methods), the aforementioned problems of CV are largely\n\u003e irrelevant, and CV can and should be used without modification, as in the independent case.\"\n\n## Featured Notebooks\n\n* **[Forecasting with big data - Spark and H2O](https://github.com/nredell/forecastML/blob/master/notebooks/Forecasting%20with%20big%20data%20-%20Spark%20and%20H2O.ipynb)**\n\n* **[Forecasting with Python - scikit-learn in parallel](https://github.com/nredell/forecastML/blob/master/notebooks/python_sklearn_and_r_in_parallel/Forecasting%20with%20Python%20-%20scikit%20learn%20in%20parallel.ipynb)**\n\n* **[Forecast reconciliation across planning horizons - coherent weekly ML and monthly ARIMA forecasts](https://github.com/nredell/forecastML/blob/master/notebooks/forecast_reconciliation/Forecast%20reconciliation%20across%20planning%20horizons%20-%20coherent%20weekly%20ML%20and%20monthly%20ARIMA%20forecasts.ipynb)**\n\nUser-contributed notebooks welcome!\n\n\n## Lightning Example\n\n* Requires `packageVersion(\"forecastML\")` \u003e= v0.9.1\n\n``` r\nlibrary(glmnet)\nlibrary(forecastML)\n\ndata(\"data_seatbelts\", package = \"forecastML\")\n\ndata_train \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"train\", method = \"direct\",\n                                           outcome_col = 1, lookback = 1:15, horizons = 1:12)\n\nwindows \u003c- forecastML::create_windows(data_train, window_length = 0)\n\nmodel_fn \u003c- function(data) {\n  x \u003c- as.matrix(data[, -1, drop = FALSE])\n  y \u003c- as.matrix(data[, 1, drop = FALSE])\n  model \u003c- glmnet::cv.glmnet(x, y)\n}\n\nmodel_results \u003c- forecastML::train_model(data_train, windows, model_name = \"LASSO\", model_function = model_fn)\n\npredict_fn 
\u003c- function(model, data) {\n  data_pred \u003c- as.data.frame(predict(model, as.matrix(data)))\n}\n\ndata_fit \u003c- predict(model_results, prediction_function = list(predict_fn), data = data_train)\n\nresiduals \u003c- residuals(data_fit)\n\ndata_forecast \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"forecast\", method = \"direct\",\n                                              outcome_col = 1, lookback = 1:15, horizons = 1:12)\n\ndata_forecasts \u003c- predict(model_results, prediction_function = list(predict_fn), data = data_forecast)\n\ndata_forecasts \u003c- forecastML::combine_forecasts(data_forecasts)\n\nset.seed(224)\ndata_forecasts \u003c- forecastML::calculate_intervals(data_forecasts, residuals, \n                                                  levels = seq(.5, .95, .05), times = 200)\n\nplot(data_forecasts, data_seatbelts[-(1:160), ], (1:nrow(data_seatbelts))[-(1:160)], interval_alpha = seq(.1, .2, length.out = 10))\n```\n![](./tools/lightning_example.png)\n\n## README Contents\n\n* **[Install](#install)**\n* **[Approach to forecasting](#approach-to-forecasting)**\n* **[Vignettes](#vignettes)**\n* **[Cheat sheets](#cheat-sheets)**\n* **[FAQ](#faq)**\n* **Examples**\n    + **[Forecasting numeric outcomes](#examples---numeric-outcomes-with-r-and-python)**\n        + **[Direct forecasting](#direct-forecast-in-r)**\n        + **[Multi-output forecasting](#multi-output-forecast-in-r)**\n    + **[Forecasting factor outcomes (forecasting sequences)](#examples---factor-outcomes-with-r-and-python)**\n\n\n## Install\n\n* CRAN\n\n``` r\ninstall.packages(\"forecastML\")\nlibrary(forecastML)\n```\n\n* Development\n\n``` r\nremotes::install_github(\"nredell/forecastML\")\nlibrary(forecastML)\n```\n\n\n## Approach to Forecasting\n\n### Direct forecasting\n\nThe direct forecasting approach used in `forecastML` involves the following steps:\n\n**1.** Build a series of horizon-specific short-, medium-, and long-term forecast models.\n\n**2.** 
Assess model generalization performance across a variety of heldout datasets through time.\n\n**3.** Select those models that consistently performed the best at each forecast horizon and \ncombine them to produce a single ensemble forecast.\n\n* Below is a plot of 5 forecast models used to produce a single 12-step-ahead forecast where each color \nrepresents a distinct horizon-specific ML model. From left to right these models are:\n\n* **1**: A feed-forward neural network (purple); **2**: An ensemble of ML models; \n**3**: A boosted tree model; **4**: A LASSO regression model; **5**: A LASSO regression model (yellow).\n\n![](./tools/forecastML_plot.png)\n\n* Below is a similar combination of horizon-specific models with a factor outcome and forecasting factor \nprobabilities 12 steps ahead.\n\n![](./tools/forecastML_factor_plot.png)\n\n\n### Multi-output forecasting\n\nThe multi-output forecasting approach used in `forecastML` involves the following steps:\n\n**1.** Build a single multi-output model that simultaneously forecasts over both short- and long-term forecast horizons.\n\n**2.** Assess model generalization performance across a variety of heldout datasets through time.\n\n**3.** Select the hyperparameters that minimize forecast error over all the relevant forecast horizons and re-train.\n\n\n## Vignettes\n\nThe main functions covered in each vignette are shown below as `function()`.\n\n* Detailed **[forecastML overview vignette](https://nredell.github.io/forecastML/doc/package_overview.html)**. \n`create_lagged_df()`, `create_windows()`, `train_model()`, `return_error()`, `return_hyper()`, `combine_forecasts()`\n\n* **[Creating custom feature lags for model training](https://nredell.github.io/forecastML/doc/lagged_features.html)**. `create_lagged_df(lookback_control = ...)`\n\n* **[Direct Forecasting with multiple or grouped time series](https://nredell.github.io/forecastML/doc/grouped_forecast.html)**. 
\n`fill_gaps()`, \n`create_lagged_df(dates = ..., dynamic_features = ..., groups = ..., static_features = ...)`, `create_windows()`, `train_model()`, `combine_forecasts()`\n\n* **[Direct Forecasting with multiple or grouped time series - Sequences](https://nredell.github.io/forecastML/doc/grouped_forecast_sequences.html)**. \n`fill_gaps()`, \n`create_lagged_df(dates = ..., dynamic_features = ..., groups = ..., static_features = ...)`, `create_windows()`, `train_model()`, `combine_forecasts()`\n\n* **[Customizing the user-defined wrapper functions](https://nredell.github.io/forecastML/doc/custom_functions.html)**. \n`train()` and `predict()`\n\n* **[Forecast combinations](https://nredell.github.io/forecastML/doc/combine_forecasts)**. `combine_forecasts()`\n\n\n## Cheat Sheets\n\n![](./tools/forecastML_cheat_sheet.PNG)\n\n1. **`fill_gaps`:** Optional if no temporal gaps/missing rows in data collection. Fill gaps in data collection and \nprepare a dataset of evenly-spaced time series for modeling with lagged features. Returns a 'data.frame' with \nmissing rows added in so that you can either (a) impute, remove, or ignore `NA`s prior to the `forecastML` pipeline \nor (b) impute, remove, or ignore them in the user-defined modeling function--depending on the `NA` handling \ncapabilities of the user-specified model.\n\n2. **`create_lagged_df`:** Create model training and forecasting datasets with lagged, grouped, dynamic, and static features.\n\n3. **`create_windows`:** Create time-contiguous validation datasets for model evaluation.\n\n4. **`train_model`:** Train the user-defined model across forecast horizons and validation datasets.\n\n5. **`return_error`:** Compute forecast error across forecast horizons and validation datasets.\n\n6. **`return_hyper`:** Return user-defined model hyperparameters across validation datasets.\n\n7. 
**`combine_forecasts`:** Combine multiple horizon-specific forecast models to produce one forecast.\n\n![](./tools/forecastML_cheat_sheet_data.PNG)\n\n\u003cbr\u003e\n\n![](./tools/forecastML_cheat_sheet_model.PNG)\n\n\n## FAQ\n\n* **Q:** Where does `forecastML` fit in with respect to popular `R` machine learning packages like [mlr3](https://mlr3.mlr-org.com/) and [caret](https://github.com/topepo/caret)?\n* **A:** The idea is that `forecastML` takes care of the tedious parts of forecasting with ML methods: creating training and forecasting datasets with different \ntypes of features--grouped, static, and dynamic--as well as simplifying validation dataset creation to assess model performance at specific points in time. \nThat said, the workflow for packages like `mlr3` and `caret` would mostly occur inside of the user-supplied \nmodeling function which is passed into `forecastML::train_model()`. Refer to the wrapper function customization \nvignette for more details.\n\n* **Q:** How do I get the model training and forecasting datasets as well as the trained models out of the \n`forecastML` pipeline?\n* **A:** After running `forecastML::create_lagged_df()` with either `type = \"train\"` or `type = \"forecast\"`, \nthe `data.frame`s can be accessed with `my_lagged_df$horizon_h` where \"h\" is an integer marking the \nhorizon-specific dataset (e.g., the value(s) passed in `horizons = ...`). The trained models from \n`forecastML::train_model()` can be accessed with `my_trained_model$horizon_h$window_w$model` where \"w\" is \nthe validation window number from `forecastML::create_windows()`.\n\n\n## Examples - Numeric Outcomes with R and Python\n\n### Direct forecast in R\n\nBelow is an example of how to create 12 horizon-specific ML models to forecast the number of `DriversKilled` \n12 time periods into the future using the `Seatbelts` dataset. 
Notice in the last plot that there are multiple forecasts; \nthese are from the slightly different LASSO models trained in the nested cross-validation. An example of selecting optimal \nhyperparameters and retraining to create a single forecast model (i.e., `create_windows(..., window_length = 0)`) can be found \nin the overview vignette.\n\n``` r\nlibrary(glmnet)\nlibrary(forecastML)\n\n# Sampled Seatbelts data from the R package datasets.\ndata(\"data_seatbelts\", package = \"forecastML\")\n\n# Example - Training data for 12 horizon-specific models w/ common lags per feature. The data do \n# not have any missing rows or temporal gaps in data collection; if there were gaps, \n# we would need to use fill_gaps() first.\nhorizons \u003c- 1:12  # 12 models that forecast 1, 1:2, 1:3, ..., and 1:12 time steps ahead.\nlookback \u003c- 1:15  # A lookback of 1 to 15 dataset rows (1:15 * 'date frequency' if dates are given).\n\n#------------------------------------------------------------------------------\n# Create a dataset of lagged features for modeling.\ndata_train \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"train\",\n                                           outcome_col = 1, lookback = lookback,\n                                           horizons = horizons)\n\n#------------------------------------------------------------------------------\n# Create validation datasets for outer-loop nested cross-validation.\nwindows \u003c- forecastML::create_windows(data_train, window_length = 12)\n\n#------------------------------------------------------------------------------\n# User-defined model - LASSO\n# A user-defined wrapper function for model training that takes the following\n# arguments: (1) a horizon-specific data.frame made with create_lagged_df(..., type = \"train\")\n# (e.g., my_lagged_df$horizon_h) and, optionally, (2) any number of additional named arguments\n# which can also be passed in '...' in train_model(). 
The function returns a model object suitable for \n# the user-defined predict function. The returned model may also be a list that holds meta-data such \n# as hyperparameter settings.\n\nmodel_function \u003c- function(data, my_outcome_col) {  # my_outcome_col = 1 could be defined here.\n\n  x \u003c- data[, -(my_outcome_col), drop = FALSE]\n  y \u003c- data[, my_outcome_col, drop = FALSE]\n  x \u003c- as.matrix(x, ncol = ncol(x))\n  y \u003c- as.matrix(y, ncol = ncol(y))\n\n  model \u003c- glmnet::cv.glmnet(x, y)\n  return(model)  # This model is the first argument in the user-defined predict() function below.\n}\n\n#------------------------------------------------------------------------------\n# Train a model across forecast horizons and validation datasets.\n# my_outcome_col = 1 is passed in ... but could have been defined in the user-defined model function.\nmodel_results \u003c- forecastML::train_model(data_train,\n                                         windows = windows,\n                                         model_name = \"LASSO\", \n                                         model_function = model_function,\n                                         my_outcome_col = 1,  # ...\n                                         use_future = FALSE)\n\n#------------------------------------------------------------------------------\n# User-defined prediction function - LASSO\n# The predict() wrapper function takes 2 positional arguments. First,\n# the returned model from the user-defined modeling function (model_function() above).\n# Second, a data.frame of model features. If predicting on validation data, expect the input data to be \n# passed in the same format as returned by create_lagged_df(type = 'train') but with the outcome column \n# removed. If forecasting, expect the input data to be in the same format as returned by \n# create_lagged_df(type = 'forecast') but with the 'index' and 'horizon' columns removed. 
The function \n# can return a 1- or 3-column data.frame with either (a) point\n# forecasts or (b) point forecasts plus lower and upper forecast bounds (column order and names do not matter).\n\nprediction_function \u003c- function(model, data_features) {\n\n  x \u003c- as.matrix(data_features, ncol = ncol(data_features))\n  data_pred \u003c- data.frame(\"y_pred\" = predict(model, x, s = \"lambda.min\"),  # 1 column is required.\n                          \"y_pred_lower\" = predict(model, x, s = \"lambda.min\") - 50,  # optional.\n                          \"y_pred_upper\" = predict(model, x, s = \"lambda.min\") + 50)  # optional.\n  return(data_pred)\n}\n\n# Predict on the validation datasets.\ndata_valid \u003c- predict(model_results, prediction_function = list(prediction_function), data = data_train)\n\n#------------------------------------------------------------------------------\n# Plot forecasts for each validation dataset.\nplot(data_valid, horizons = c(1, 6, 12))\n\n#------------------------------------------------------------------------------\n# Forecast.\n\n# Forward-looking forecast data.frame.\ndata_forecast \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"forecast\",\n                                              outcome_col = 1, lookback = lookback, horizons = horizons)\n\n# Forecasts.\ndata_forecasts \u003c- predict(model_results, prediction_function = list(prediction_function), data = data_forecast)\n\n# We'll plot a background dataset of actuals as well.\nplot(data_forecasts,\n     data_actual = data_seatbelts[-(1:150), ], \n     actual_indices = as.numeric(row.names(data_seatbelts[-(1:150), ])), \n     horizons = c(1, 6, 12), windows = c(5, 10, 15))\n```\n![](./tools/validation_data_forecasts.png)\n![](./tools/forecasts.png)\n\n***\n\n### Direct forecast in R \u0026 Python\n\nNow we'll look at an example similar to above. The main difference is that our user-defined modeling \nand prediction functions are now written in `Python`. 
Thanks to the [reticulate](https://github.com/rstudio/reticulate) \n`R` package, entire ML workflows already written in `Python` can be imported into `forecastML` with the \nsimple addition of 2 lines of `R` code.\n\n* The `reticulate::source_python()` function will run a .py file and import any objects into your `R` environment. As we'll \nsee below, we'll only be importing library calls and functions to keep our `R` environment clean.\n\n``` r\nlibrary(forecastML)\nlibrary(reticulate)  # Move Python objects in and out of R. See the reticulate package for setup info.\n\nreticulate::source_python(\"modeling_script.py\")  # Run a Python file and import objects into R.\n```\n\n\u003cbr\u003e\n\n* Below is a simple, slightly different `forecastML` setup for the seatbelt forecasting problem from the \nprevious example.\n\n``` r\ndata(\"data_seatbelts\", package = \"forecastML\")\n\nhorizons \u003c- c(1, 12)  # 2 models that forecast 1 and 1:12 time steps ahead.\n\n# A lookback across select time steps in the past. Feature lags 1 through 9 will be silently dropped from the 12-step-ahead model.\nlookback \u003c- c(1, 3, 6, 9, 12, 15)\n\ndate_frequency \u003c- \"1 month\"  # Time step frequency.\n\n# The date indices, which don't come with the stock dataset, should not be included in the modeling data.frame.\ndates \u003c- seq(as.Date(\"1969-01-01\"), as.Date(\"1984-12-01\"), by = date_frequency)\n\n# Create a dataset of features for modeling.\ndata_train \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"train\", outcome_col = 1,\n                                           lookback = lookback, horizon = horizons,\n                                           dates = dates, frequency = date_frequency)\n\n# Create 2 custom validation datasets for outer-loop nested cross-validation. 
The purpose of\n# the multiple validation windows is to assess expected forecast accuracy for specific\n# time periods while supporting an investigation of the hyperparameter stability for\n# models trained on different time periods. Validation windows can overlap.\nwindow_start \u003c- c(as.Date(\"1983-01-01\"), as.Date(\"1984-01-01\"))\nwindow_stop \u003c- c(as.Date(\"1983-12-01\"), as.Date(\"1984-12-01\"))\n\nwindows \u003c- forecastML::create_windows(data_train, window_start = window_start, window_stop = window_stop)\n```\n\n\u003cbr\u003e\n\n#### modeling_script.py\n\n* Let's look at the content of our `Python` modeling file that we source()'d above. The `Python` wrapper function inputs \nand returns for `py_model_function()` and `py_prediction_function()` are the same as their `R` counterparts. Just \nbe sure to expect and return `pandas` `DataFrame`s as conversion from `numpy` arrays has not been tested.\n\n``` python\n\nimport pandas as pd\nfrom sklearn import linear_model\nfrom sklearn.preprocessing import StandardScaler\n\n# User-defined model.\n# A user-defined wrapper function for model training that takes the following\n# arguments: (1) a horizon-specific pandas DataFrame made with create_lagged_df(..., type = \"train\")\n# (e.g., my_lagged_df$horizon_h)\ndef py_model_function(data):\n  \n  X = data.iloc[:, 1:]\n  y = data.iloc[:, 0]\n  \n  scaler = StandardScaler()\n  X = scaler.fit_transform(X)\n  \n  model_lasso = linear_model.Lasso(alpha = 0.1)\n  \n  model_lasso.fit(X = X, y = y)\n  \n  return({'model': model_lasso, 'scaler': scaler})\n\n# User-defined prediction function.\n# The predict() wrapper function takes 2 positional arguments. First,\n# the returned model from the user-defined modeling function (py_model_function() above).\n# Second, a pandas DataFrame of model features. 
For numeric outcomes, the function \n# can return a 1- or 3-column pandas DataFrame with either (a) point\n# forecasts or (b) point forecasts plus lower and upper forecast bounds (column order and names do not matter).\ndef py_prediction_function(model_list, data_x):\n  \n  data_x = model_list['scaler'].transform(data_x)\n  \n  data_pred = pd.DataFrame({'y_pred': model_list['model'].predict(data_x)})\n  \n  return(data_pred)\n```\n\n\u003cbr\u003e\n\n* Train and predict on historical validation data with the imported `Python` wrapper functions.\n\n``` r\n# Train a model across forecast horizons and validation datasets.\nmodel_results \u003c- forecastML::train_model(data_train,\n                                         windows = windows,\n                                         model_name = \"LASSO\",\n                                         model_function = py_model_function,\n                                         use_future = FALSE)\n\n# Predict on the validation datasets.\ndata_valid \u003c- predict(model_results, prediction_function = list(py_prediction_function), data = data_train)\n\n# Plot forecasts for each validation dataset.\nplot(data_valid, horizons = c(1, 12))\n```\n![](./tools/validation_data_forecasts_python.png)\n\n\u003cbr\u003e\n\n* Forecast with the same imported `Python` wrapper functions. 
The final wrapper functions may eventually have \nfixed hyperparameters or complicated model ensembles based on repeated model training and investigation.\n\n``` r\n# Forward-looking forecast data.frame.\ndata_forecast \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"forecast\", outcome_col = 1,\n                                              lookback = lookback, horizon = horizons,\n                                              dates = dates, frequency = date_frequency)\n\n# Forecasts.\ndata_forecasts \u003c- predict(model_results, prediction_function = list(py_prediction_function),\n                          data = data_forecast)\n\n# We'll plot a background dataset of actuals as well.\nplot(data_forecasts, data_actual = data_seatbelts[-(1:150), ], \n     actual_indices = dates[-(1:150)], horizons = c(1, 12))\n```\n![](./tools/forecasts_python.png)\n\n***\n\n### Multi-output forecast in R\n\n* This is the same seatbelt dataset example except now, instead of 1 model for each \nforecast horizon, we'll build 1 multi-output neural network model that forecasts 12 \nsteps into the future.\n\n* Given that this is a small dataset, the multi-output approach would require a decent \namount of tuning to produce accurate results. 
An alternative would be to forecast, say, \nhorizons 6 through 12 if longer term forecasts were of interest to reduce the number of \nparameters; the output neurons do not have to start at a horizon of 1 or even be contiguous.\n\n``` r\nlibrary(forecastML)\nlibrary(keras)  # Using the TensorFlow 2.0 backend.\n\ndata(\"data_seatbelts\", package = \"forecastML\")\n\ndata_seatbelts[] \u003c- lapply(data_seatbelts, function(x) {\n  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)\n})\n\ndate_frequency \u003c- \"1 month\"\ndates \u003c- seq(as.Date(\"1969-01-01\"), as.Date(\"1984-12-01\"), by = date_frequency)\n\ndata_train \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"train\", method = \"multi_output\",\n                                           outcome_col = 1, lookback = 1:15, horizons = 1:12,\n                                           dates = dates, frequency = date_frequency,\n                                           dynamic_features = \"law\")\n\n# 'window_length = 0' creates 1 historical training dataset with no external validation datasets. 
\n# Set it to, say, 24 to see the model and forecast stability when trained across different slices \n# of historical data.\nwindows \u003c- forecastML::create_windows(data_train, window_length = 0)\n\n#------------------------------------------------------------------------------\n# 'data_y' consists of 1 column for each forecast horizon--here, 12.\nmodel_fun \u003c- function(data, horizons) {  # 'horizons' is passed in train_model().\n\n  data_x \u003c- apply(as.matrix(data[, -(1:length(horizons))]), 2, function(x){ifelse(is.na(x), 0, x)})\n  data_y \u003c- apply(as.matrix(data[, 1:length(horizons)]), 2, function(x){ifelse(is.na(x), 0, x)})\n\n  layers_x_input \u003c- keras::layer_input(shape = ncol(data_x))\n\n  layers_x_output \u003c- layers_x_input %\u003e%\n    keras::layer_dense(ncol(data_x), activation = \"relu\") %\u003e%\n    keras::layer_dense(ncol(data_x), activation = \"relu\") %\u003e%\n    keras::layer_dense(length(horizons))\n\n  model \u003c- keras::keras_model(inputs = layers_x_input, outputs = layers_x_output) %\u003e%\n    keras::compile(optimizer = 'adam', loss = 'mean_absolute_error')\n\n  early_stopping \u003c- callback_early_stopping(monitor = 'val_loss', patience = 2)\n\n  tensorflow::tf$random$set_seed(224)\n\n  model_results \u003c- model %\u003e%\n    keras::fit(x = list(as.matrix(data_x)), y = list(as.matrix(data_y)),\n               validation_split = 0.2, callbacks = c(early_stopping), verbose = FALSE)\n\n  return(list(\"model\" = model, \"model_results\" = model_results))\n}\n#------------------------------------------------------------------------------\n# The predict() wrapper function will return a data.frame with a number of columns \n# equaling the number of forecast horizons.\nprediction_fun \u003c- function(model, data_features) {\n\n  data_features[] \u003c- lapply(data_features, function(x){ifelse(is.na(x), 0, x)})\n  data_features \u003c- list(as.matrix(data_features, ncol = ncol(data_features)))\n\n  data_pred \u003c- 
data.frame(predict(model$model, data_features))\n  names(data_pred) \u003c- paste0(\"y_pred_\", 1:ncol(data_pred))\n\n  return(data_pred)\n}\n#------------------------------------------------------------------------------\n\nmodel_results \u003c- forecastML::train_model(data_train, windows, model_name = \"Multi-Output NN\",\n                                         model_function = model_fun,\n                                         horizons = 1:12)\n\ndata_valid \u003c- predict(model_results, prediction_function = list(prediction_fun), data = data_train)\n\n# We'll plot select forecast horizons to reduce visual clutter.\nplot(data_valid, facet = ~ model, horizons = c(1, 3, 6, 12))\n```\n![](./tools/multi_outcome_train_plot.png)\n\n* Forecast combinations from `combine_forecasts()` aren't necessary as we've trained only 1 model.\n\n``` r\ndata_forecast \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"forecast\", method = \"multi_output\",\n                                              outcome_col = 1, lookback = 1:15, horizons = 1:12,\n                                              dates = dates, frequency = date_frequency,\n                                              dynamic_features = \"law\")\n\ndata_forecasts \u003c- predict(model_results, prediction_function = list(prediction_fun), data = data_forecast)\n\nplot(data_forecasts, facet = NULL, data_actual = data_seatbelts[-(1:100), ], actual_indices = dates[-(1:100)])\n```\n![](./tools/multi_outcome_forecast_plot.png)\n\n\n## Examples - Factor Outcomes with R and Python\n\n### R\n\n* This example is similar to the numeric outcome examples with the exception that the outcome has been \nfactorized to illustrate how factors or sequences are forecasted.\n\n``` r\ndata(\"data_seatbelts\", package = \"forecastML\")\n\n# Create an artificial factor outcome for illustration's sake.\ndata_seatbelts$DriversKilled \u003c- cut(data_seatbelts$DriversKilled, 3)\n\nhorizons \u003c- c(1, 12)  # 2 models that 
forecast 1 and 1:12 time steps ahead.\n\n# A lookback across select time steps in the past. Feature lag 1 will be silently dropped from the 12-step-ahead model.\nlookback \u003c- c(1, 12, 18)\n\ndate_frequency \u003c- \"1 month\"  # Time step frequency.\n\n# The date indices, which don't come with the stock dataset, should not be included in the modeling data.frame.\ndates \u003c- seq(as.Date(\"1969-01-01\"), as.Date(\"1984-12-01\"), by = date_frequency)\n\n# Create a dataset of features for modeling.\ndata_train \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"train\", outcome_col = 1,\n                                           lookback = lookback, horizon = horizons,\n                                           dates = dates, frequency = date_frequency)\n\n# We won't use nested cross-validation; rather, we'll train a model over the entire training dataset.\nwindows \u003c- forecastML::create_windows(data_train, window_length = 0)\n\n# This is the model-training dataset.\nplot(windows, data_train)\n```\n\n![](./tools/sequence_windows.png)\n\n* Model training and historical fit.\n\n``` r\nmodel_function \u003c- function(data, my_outcome_col) {  # my_outcome_col = 1 could be defined here.\n  \n  outcome_names \u003c- names(data)[1]\n  model_formula \u003c- formula(paste0(outcome_names,  \"~ .\"))\n  \n  set.seed(224)\n  model \u003c- randomForest::randomForest(formula = model_formula, data = data, ntree = 3)\n  return(model)  # This model is the first argument in the user-defined predict() function below.\n}\n\n#------------------------------------------------------------------------------\n# Train a model across forecast horizons and validation datasets.\n# my_outcome_col = 1 is passed in ... 
but could have been defined in the user-defined model function.\nmodel_results \u003c- forecastML::train_model(data_train,\n                                         windows = windows,\n                                         model_name = \"RF\", \n                                         model_function = model_function,\n                                         my_outcome_col = 1,  # ...\n                                         use_future = FALSE)\n\n#------------------------------------------------------------------------------\n# User-defined prediction function.\n#\n# The predict() wrapper function takes 2 positional arguments. First,\n# the returned model from the user-defined modeling function (model_function() above).\n# Second, a data.frame of model features. If predicting on validation data, expect the input data to be \n# passed in the same format as returned by create_lagged_df(type = 'train') but with the outcome column \n# removed. If forecasting, expect the input data to be in the same format as returned by \n# create_lagged_df(type = 'forecast') but with the 'index' and 'horizon' columns removed.\n# \n# For factor outcomes, the function can return either (a) a 1-column data.frame with factor level \n# predictions or (b) an L-column data.frame of predicted class probabilities where 'L' equals the \n# number of levels in the outcome; the order of the return()'d columns should match the order of the \n# outcome factor levels from left to right which is the default behavior of most predict() functions.\n\n# Predict/forecast a single factor level.\nprediction_function_level \u003c- function(model, data_features) {\n  \n  data_pred \u003c- data.frame(\"y_pred\" = predict(model, data_features, type = \"response\"))\n  \n  return(data_pred)\n}\n\n# Predict/forecast outcome class probabilities.\nprediction_function_prob \u003c- function(model, data_features) {\n  \n  data_pred \u003c- data.frame(\"y_pred\" = predict(model, data_features, type = \"prob\"))\n  
\n  return(data_pred)\n}\n\n# Predict on the validation datasets.\ndata_valid_level \u003c- predict(model_results, \n                            prediction_function = list(prediction_function_level), \n                            data = data_train)\ndata_valid_prob \u003c- predict(model_results, \n                           prediction_function = list(prediction_function_prob), \n                           data = data_train)\n\n```\n\n* Predict historical factor levels.\n\n* With `window_length = 0` these are essentially plots of model fit.\n\n``` r\nplot(data_valid_level, horizons = c(1, 12))\n```\n\n![](./tools/sequence_valid_level.png)\n\n* Predict historical class probabilities.\n\n``` r\nplot(data_valid_prob, horizons = c(1, 12))\n```\n\n![](./tools/sequence_valid_prob.png)\n\n* Forecast\n\n``` r\n# Forward-looking forecast data.frame.\ndata_forecast \u003c- forecastML::create_lagged_df(data_seatbelts, type = \"forecast\",\n                                              outcome_col = 1, lookback = lookback, horizons = horizons)\n\n# Forecasts.\ndata_forecasts_level \u003c- predict(model_results,\n                                prediction_function = list(prediction_function_level),\n                                data = data_forecast)\n\ndata_forecasts_prob \u003c- predict(model_results,\n                                prediction_function = list(prediction_function_prob),\n                                data = data_forecast)\n```\n\n* Forecast factor levels\n\n``` r\nplot(data_forecasts_level)\n```\n\n![](./tools/sequence_forecast_level.png)\n\n* Forecast class probabilities\n\n``` r\nplot(data_forecasts_prob)\n```\n\n![](./tools/sequence_forecast_prob.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnredell%2FforecastML","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnredell%2FforecastML","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnredell%2FforecastML/lists"}