---
output: github_document
---
# seer

---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

The `seer` package provides implementations of a novel framework for forecast model selection using time series features. We call this framework **FFORMS** (**F**eature-based **FOR**ecast **M**odel **S**election). For more details see our [paper](https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf).

## Installation

You can install the stable version from CRAN:

```{r, eval = FALSE}
install.packages("seer")
```

You can install the development version from GitHub using

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("thiyangt/seer")
library(seer)
```

## Usage

The FFORMS framework consists of two main phases: i) the offline phase, in which a classification model is developed, and ii) the online phase, in which the classification model developed in the offline phase is used to identify the "best" forecast model for a new time series. This document explains the main functions using a simple dataset based on the M3 competition data. To load the data,

```{r, warning=FALSE, message=FALSE}
library(Mcomp)
data(M3)
yearly_m3 <- subset(M3, "yearly")  # yearly series from the M3 competition
m3y <- M3[1:2]                     # first two M3 series, used for illustration below
```

#### FFORMS: offline phase

**1. Augmenting the observed sample with simulated time series.**

We augment our reference set of time series by simulating new time series. To produce the simulated series, we fit several standard automatic forecasting models, such as ETS and automatic ARIMA, and then simulate multiple time series from the fitted model within each model class. `sim_arimabased` can be used to simulate time series based on (S)ARIMA models.

```{r, message=FALSE}
library(seer)
simulated_arima <- lapply(m3y, sim_arimabased, Future=TRUE, Nsim=2, extralength=6, Combine=FALSE)
simulated_arima

```

Similarly, `sim_etsbased` can be used to simulate time series based on ETS models.

```{r, message=FALSE, eval=FALSE}
simulated_ets <- lapply(m3y, sim_etsbased, Future=TRUE, Nsim=2, extralength=6, Combine=FALSE)
simulated_ets
```

**2. Calculate features based on the training period of the time series.**

Our proposed framework operates on features of the time series.
The `cal_features` function can be used to calculate the relevant features for a given list of time series.

```{r features, warning=FALSE, message=FALSE}
library(tsfeatures)
M3yearly_features <- seer::cal_features(yearly_m3, database="M3", h=6, highfreq = FALSE)
head(M3yearly_features)
```

**Calculate features from the time series simulated in step 1**

```{r features_simulated}
features_simulated_arima <- lapply(simulated_arima, function(temp){
  lapply(temp, cal_features, h=6, database="other", highfreq=FALSE)
})
fea_sim <- lapply(features_simulated_arima, function(temp){do.call(rbind, temp)})
do.call(rbind, fea_sim)
```

**3. Calculate forecast accuracy measure(s)**

The `fcast_accuracy` function can be used to calculate a forecast error measure (MASE in the following example) for each candidate model. This step is the most computationally intensive and time-consuming, as each candidate model has to be estimated on each series. In the following example ARIMA (arima), ETS (ets), random walk (rw), random walk with drift (rwd), the standard theta method (theta) and neural network time series forecasts (nn) are used as the candidate models. In addition to these, the following models can also be used when handling seasonal time series:

* snaive: seasonal naive method
* stlar: STL decomposition is applied to the time series; the seasonal naive method is used to forecast the seasonal component and an AR model is used to forecast the seasonally adjusted data.
* mstlets: STL decomposition is applied to the time series; the seasonal naive method is used to forecast the seasonal component and an ETS model is used to forecast the seasonally adjusted data.
* mstlarima: STL decomposition is applied to the time series; the seasonal naive method is used to forecast the seasonal component and an ARIMA model is used to forecast the seasonally adjusted data.
* tbats: TBATS models

```{r accuracy, message=FALSE}
tslist <- list(M3[[1]], M3[[2]])
accuracy_info <- fcast_accuracy(tslist=tslist, models= c("arima","ets","rw","rwd", "theta", "nn"), database ="M3", cal_MASE, h=6, length_out = 1, fcast_save = TRUE)
accuracy_info
```
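
The chunk above covers the non-seasonal candidate models. As a minimal sketch of how the seasonal models listed above could be included (not from the original README; the quarterly subset, the horizon `h=8` and the model list are assumptions for illustration), the same `fcast_accuracy` call can be given seasonal model names:

```{r accuracy_seasonal, eval=FALSE}
# hedged sketch: forecast accuracy for seasonal (quarterly) series, adding
# seasonal candidate models such as snaive, stlar and tbats.
# The subset, horizon and model list are illustrative assumptions.
quarterly_m3 <- subset(M3, "quarterly")[1:2]
accuracy_seasonal <- fcast_accuracy(tslist = quarterly_m3,
                                    models = c("arima", "ets", "rwd", "snaive", "stlar", "tbats"),
                                    database = "M3", cal_MASE, h = 8,
                                    length_out = 1, fcast_save = TRUE)
accuracy_seasonal
```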

**4. Construct a data frame of inputs (features) and outputs (labels) to train a random forest**

`prepare_trainingset` can be used to create a data frame of input features and output labels.

```{r}
# steps 3 and 4 applied to yearly series of M1 competition
data(M1)
yearly_m1 <- subset(M1, "yearly")
accuracy_m1 <- fcast_accuracy(tslist=yearly_m1, models= c("arima","ets","rw","rwd", "theta", "nn"), database ="M1", cal_MASE, h=6, length_out = 1, fcast_save = TRUE)
features_m1 <- cal_features(yearly_m1, database="M1", h=6, highfreq = FALSE)

# prepare training set
prep_tset <- prepare_trainingset(accuracy_set = accuracy_m1, feature_set = features_m1)

# provides the training set to build a random forest classifier
head(prep_tset$trainingset)

# provides additional information about the fitted models
head(prep_tset$modelinfo)
```

#### FFORMS: online phase

**5. Train a random forest and predict class labels for new series (FFORMS: online phase)**

`build_rf` in the `seer` package enables training a random forest model and predicting class labels (the "best" forecast model) for new time series. In the following example we use only the yearly series of the M1 and M3 competitions to illustrate the code. A random forest classifier is built on the yearly series of the M1 data, and class labels are predicted for the yearly series of the M3 competition. Users can further add the features and class-label information calculated from the simulated time series, as sketched after the following chunk.

```{r}
rf <- build_rf(training_set = prep_tset$trainingset, testset=M3yearly_features, rf_type="rcp", ntree=100, seed=1, import=FALSE, mtry = 8)

# to get the predicted class labels
predictedlabels_m3 <- rf$predictions
table(predictedlabels_m3)

# to obtain the random forest for future use
randomforest <- rf$randomforest

```
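
As a minimal sketch of the augmentation mentioned above (an assumption, not from the original README): if the simulated-series features and class labels have already been arranged into a data frame with the same columns as `prep_tset$trainingset` (here the hypothetical `sim_trainingset`), the rows can simply be bound together before training.

```{r, eval=FALSE}
# hedged sketch: augment the observed-series training set with rows derived
# from the simulated series. `sim_trainingset` is a hypothetical data frame
# with the same columns as prep_tset$trainingset.
augmented_trainingset <- rbind(prep_tset$trainingset, sim_trainingset)
rf_augmented <- build_rf(training_set = augmented_trainingset,
                         testset = M3yearly_features, rf_type = "rcp",
                         ntree = 100, seed = 1, import = FALSE, mtry = 8)
```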

**6. Generate point forecasts and 95% prediction intervals**

The `rf_forecast` function can be used to generate point forecasts and 95% prediction intervals based on the predicted class labels obtained in step 5.

```{r}
forecasts <- rf_forecast(predictions=predictedlabels_m3[1:2], tslist=yearly_m3[1:2], database="M3", function_name="cal_MASE", h=6, accuracy=TRUE)

# to obtain point forecasts
forecasts$mean

# to obtain lower boundary of 95% prediction intervals
forecasts$lower

# to obtain upper boundary of 95% prediction intervals
forecasts$upper

# to obtain MASE
forecasts$accuracy
```
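
For example, assuming `forecasts$accuracy` returns one MASE value per series (an assumption for illustration), an overall average can be computed as:

```{r, eval=FALSE}
# average MASE across the forecasted series (illustrative)
mean(forecasts$accuracy)
```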

#### Notes

**Calculation of features for daily series**

```{r}
# install.packages("https://github.com/carlanetto/M4comp2018/releases/download/0.2.0/M4comp2018_0.2.0.tar.gz",
# repos=NULL)
library(M4comp2018)
data(M4)
# extract the first two daily time series
M4_daily <- Filter(function(l) l$period == "Daily", M4)[1:2]
# convert the daily series into msts objects
M4_daily_msts <- lapply(M4_daily, function(temp){
  temp$x <- convert_msts(temp$x, "daily")
  return(temp)
})
# calculate features
seer::cal_features(M4_daily_msts, seasonal=TRUE, h=14, m=7, lagmax=8L, database="M4", highfreq=TRUE)
```

**Calculation of features for hourly series**

```{r}
data(M4)
# extract the first two hourly time series
M4_hourly <- Filter(function(l) l$period == "Hourly", M4)[1:2]
# convert the hourly series into msts objects
hourlym4_msts <- lapply(M4_hourly, function(temp){
  temp$x <- convert_msts(temp$x, "hourly")
  return(temp)
})
# calculate features
cal_features(hourlym4_msts, seasonal=TRUE, m=24, lagmax=25L,
             database="M4", highfreq=TRUE)
```

## Forecast combinations based on the FFORMS algorithm

```{r}
# extract features of the first two time series, just for illustration
yearly_m1_features <- features_m1[1:2,]
votes.matrix <- predict(rf$randomforest, yearly_m1_features, type="vote")
tslist <- yearly_m1[1:2]
# To identify models and weights for forecast combination
models_and_weights_for_combinations <- fforms_ensemble(votes.matrix, threshold=0.6)
# Compute combination forecast
fforms_combinationforecast(models_and_weights_for_combinations, tslist, "M1", 6)
```
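
As a further hedged sketch (an assumption, not from the original README), the same combination workflow can be applied to the M3 test series used earlier, reusing the random forest trained in step 5:

```{r, eval=FALSE}
# hedged sketch: combination forecasts for the first two yearly M3 series,
# reusing the random forest from step 5. The series choice is illustrative.
votes_m3 <- predict(rf$randomforest, M3yearly_features[1:2, ], type = "vote")
weights_m3 <- fforms_ensemble(votes_m3, threshold = 0.6)
fforms_combinationforecast(weights_m3, yearly_m3[1:2], "M3", 6)
```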