https://github.com/razielar/feature_engineering_ts_forecasting


# Feature Engineering for Time Series Forecasting

The course repository is available [here](https://github.com/trainindata/feature-engineering-for-time-series-forecasting).

1. [Tabularizing Time Series](#one)
2. [Multi-step Forecasting](#two)
3. [Time series Decomposition](#three)
4. [Missing Data Imputation](#four)
5. [Outliers](#five)
6. [Lag features](#six)
7. [Window features](#seven)
8. [Trend features](#eight)
9. [Seasonality Features](#nine)
10. [Date & Time features](#ten)
11. [Categorical features](#eleven)

## 1) Tabularizing Time Series

### Feature engineering



### Tabularizing Time Series



## 2) Multi-step Forecasting

### ML workflow



## 3) Time series Decomposition

### Multiplicative time series example

In the Air Passengers dataset, the size of the seasonal fluctuations grows with the level of the series; that is, the **variance gets larger** as the **trend increases**.
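
A minimal sketch of why a log transform helps with a multiplicative series, assuming the model $y_t = \text{trend}_t \times \text{seasonal}_t$. The data below is synthetic and hypothetical (it is not the Air Passengers values), and `yearly_swing` is a helper introduced only for this illustration.

```python
import numpy as np

# Synthetic monthly series (hypothetical data, not the Air Passengers values).
t = np.arange(120)                                 # 10 years of monthly steps
trend = 100.0 + 2.0 * t                            # upward linear trend
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)  # multiplicative seasonal factor
y = trend * seasonal                               # multiplicative model


def yearly_swing(x):
    """Max minus min within each consecutive block of 12 months."""
    blocks = x.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)


# On the raw scale the seasonal swing grows with the trend.
swing_raw = yearly_swing(y)

# On the log scale the model becomes additive: log y = log trend + log seasonal,
# so the seasonal swing is the same every year.
swing_log = yearly_swing(np.log(y) - np.log(trend))
```

Here `swing_raw` increases from the first year to the last, while `swing_log` stays constant, which is the property an additive decomposition relies on.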



### MSTL Model



**MSTL Frequency**

| Data frequency | Daily period | Weekly period | Yearly period |
|----------|----------|----------|----------|
| Daily | | 7 | 365.25 |
| Hourly | 24 | 168 (24 × 7) | 8766 (24 × 365.25) |
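
The periods in the table are counts of observations per seasonal cycle. As a small sketch, they can be derived from the sampling frequency as follows; the constant and dictionary names are hypothetical, chosen for this example only.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
DAYS_PER_YEAR = 365.25  # average year length, accounting for leap years

# Number of observations per seasonal cycle, by sampling frequency.
periods = {
    "daily": {
        "week": DAYS_PER_WEEK,
        "year": DAYS_PER_YEAR,
    },
    "hourly": {
        "day": HOURS_PER_DAY,
        "week": HOURS_PER_DAY * DAYS_PER_WEEK,
        "year": HOURS_PER_DAY * DAYS_PER_YEAR,
    },
}
```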

## 4) Missing Data Imputation

* **1)** Forward filling (last observation carried forward): preferable to backward filling, which copies future values into the past.
* **2)** Linear interpolation: preferable to spline interpolation because it is simpler.
* **3)** Spline interpolation: can badly distort the time series if applied without EDA.
* **4)** `Seasonal decomposition and linear interpolation`
    * **4.1)** Linear interpolation (a first pass, so the series has no gaps when it is decomposed).
    * **4.2)** Use `STL` or `MSTL` to obtain the seasonality. **NOTE**: STL & MSTL assume an `additive model`, so remember to transform the data first if it is multiplicative.
    * **4.3)** De-seasonalise the original time series.
    * **4.4)** Linear interpolation on the de-seasonalised data.
    * **4.5)** Add the seasonal component back to the imputed de-seasonalised data.
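
Steps 4.1 to 4.5 can be sketched as below. This is an illustration under stated assumptions, not the course's implementation: to stay dependency-light it swaps `STL`/`MSTL` for a crude seasonal estimate (a centred rolling-mean trend plus per-position averages), and the function name `seasonal_linear_impute` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def seasonal_linear_impute(y: pd.Series, period: int) -> pd.Series:
    """Impute NaNs following steps 4.1 to 4.5 (additive model assumed)."""
    # 4.1) First-pass linear interpolation so the seasonal estimate has no gaps.
    filled = y.interpolate(method="linear", limit_direction="both")

    # 4.2) Crude seasonal estimate (a stand-in for STL/MSTL): remove a centred
    #      rolling-mean trend, then average by position within the cycle.
    trend = filled.rolling(window=period, center=True, min_periods=1).mean()
    position = np.arange(len(y)) % period
    seasonal = (filled - trend).groupby(position).transform("mean")

    # 4.3) De-seasonalise the ORIGINAL series (the gaps are still NaN here).
    deseasonalised = y - seasonal

    # 4.4) Linear interpolation on the de-seasonalised data.
    deseasonalised = deseasonalised.interpolate(
        method="linear", limit_direction="both"
    )

    # 4.5) Add the seasonal component back.
    return deseasonalised + seasonal


# Hypothetical example: linear trend plus a 12-step seasonal cycle, with a gap.
t = np.arange(96)
truth = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 12))
observed = truth.copy()
observed.iloc[30:34] = np.nan

imputed = seasonal_linear_impute(observed, period=12)
plain_linear = observed.interpolate(method="linear")  # method 2), for comparison
```

On this synthetic series the gap falls on the downward part of the seasonal cycle, so plain linear interpolation cuts straight across it, while the seasonal version follows the curve more closely.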

## 5) Outliers

### Outliers in time series data

The classification of outliers in time series data follows the [Blazquez-Garcia *et al.* paper](https://arxiv.org/pdf/2002.04236.pdf).



### Estimation methods to identify outliers

* **1)** Rolling mean (mean & std)
* **2)** Rolling median (median & MAD: Median Absolute Deviation)
* **3)** LOWESS residuals
* **4)** STL residuals
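
As an illustration of method 2), here is a hedged sketch of a rolling median and MAD detector. The function name, the window of 25, the threshold of 3.5, and the 1.4826 scaling (which makes the MAD comparable to a standard deviation for normally distributed data) are assumptions chosen for this example, not values taken from the course.

```python
import numpy as np
import pandas as pd


def rolling_mad_outliers(y: pd.Series, window: int = 25, threshold: float = 3.5) -> pd.Series:
    """Flag points that sit far from the rolling median, in robust MAD units."""
    roll = y.rolling(window=window, center=True, min_periods=1)
    median = roll.median()
    # MAD computed inside each window: median of |x - median(x)|.
    mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    # 1.4826 rescales the MAD to estimate the standard deviation under normality.
    score = (y - median).abs() / (1.4826 * mad)
    return score > threshold


# Hypothetical example: noisy constant level with one injected spike.
rng = np.random.default_rng(0)
series = pd.Series(10 + rng.normal(0, 1, 200))
series.iloc[100] += 15

flags = rolling_mad_outliers(series)
```

The median and MAD are used instead of the mean and standard deviation because the spike itself barely shifts them, so it cannot mask its own detection.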

## 6) Lag features



### Autocorrelation function (ACF)

### Partial autocorrelation function (PACF)

Measures how correlated $y_t$ is with itself at lag $k$, $y_{t-k}$, **after removing** the **effect of intermediate lags**, by subtracting their linear impact as estimated by a linear regression.
In practice, we use `ywmle` (Yule-Walker maximum likelihood estimation) instead of linear regression.
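
To make the linear regression view concrete, here is a minimal sketch that estimates the PACF by ordinary least squares: regress $y_t$ on $y_{t-1}, \dots, y_{t-k}$ and take the coefficient on $y_{t-k}$. It is the regression approach described above, not the `ywmle` estimator; the function name `pacf_ols` and the simulated AR(1) data are hypothetical.

```python
import numpy as np


def pacf_ols(y, max_lag):
    """PACF at lags 0..max_lag via one least-squares regression per lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    values = [1.0]  # lag 0 is 1 by definition
    for k in range(1, max_lag + 1):
        # Columns are y_{t-1}, ..., y_{t-k}; rows correspond to t = k..n-1.
        X = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        target = y[k:]
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        # The coefficient on y_{t-k} is the partial autocorrelation at lag k.
        values.append(coef[-1])
    return np.array(values)


# Hypothetical example: AR(1) process y_t = 0.7 * y_{t-1} + noise.
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
ar1 = np.zeros(5000)
for i in range(1, 5000):
    ar1[i] = 0.7 * ar1[i - 1] + noise[i]

pacf_values = pacf_ols(ar1, max_lag=3)
```

For an AR(1) process the PACF should be close to the autoregressive coefficient at lag 1 and close to zero afterwards, which is why the PACF is commonly used to choose how many lag features to keep.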

### Cross correlation function (CCF)

Measures how correlated $y_t$ is with another variable at some lag: $x_{t-k}$.
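
A minimal sketch of the cross correlation using `pandas`: shift $x$ by $k$ steps and take the Pearson correlation with $y$. The helper name `cross_correlation` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_correlation(y: pd.Series, x: pd.Series, max_lag: int) -> pd.Series:
    """Correlation between y_t and x_{t-k} for k = 0..max_lag."""
    # x.shift(k) aligns x_{t-k} with y_t; Series.corr drops the resulting NaNs.
    return pd.Series({k: y.corr(x.shift(k)) for k in range(max_lag + 1)})


# Hypothetical example: y follows x with a delay of 3 steps, plus small noise.
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=500))
y = x.shift(3) + 0.1 * rng.normal(size=500)

ccf_values = cross_correlation(y, x, max_lag=6)
best_lag = ccf_values.idxmax()
```

The lag with the largest absolute correlation is a natural candidate for a lag feature built from the other variable.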

## 7) Window features

### Rolling Window features

### Expanding Window features

### Weighted Expanding/Rolling Window features

## 8) Trend features

### Piecewise linear regression



## 9) Seasonality Features

### Features to capture seasonality and cyclical patterns

| **Seasonality** | **Cyclical patterns** |
|----------------------------------------------------------------------|------------------------|
| 1. Lag features: `tree-based` & `linear` models | 1. Lag features |
| 2. Calendar features (*aka* datetime features): `tree-based` models | |
| 3. Seasonal dummies (*e.g.* is_january, etc.): `linear` models | |
| 4. Fourier features: `linear` models & high frequency data (*e.g.* hourly data with daily, monthly, and yearly seasonality) | |

**Seasonality**: A pattern or effect that repeats with **a fixed frequency** (frequency = 1/period) over time.

**Cyclical patterns**: A pattern or effect that repeats **without a fixed frequency** over time.

### Fourier series

Any periodic function $s(t)$ can be written as a **Fourier series** expansion:



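$$ \text{seasonality} = s(t) \approx A_0 + \sum_{n=1}^N \left[ A_n\sin(2\pi n f t) + B_n \cos(2\pi n f t) \right] $$

where $f = 1/\text{period}$ is the seasonal frequency and $N$ is the number of Fourier terms.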
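
The expansion above translates directly into features: one `sin` and one `cos` column per term $n$, whose coefficients $A_n$ and $B_n$ a linear model can then learn. A minimal sketch, where the function name `fourier_features`, the column names, and the synthetic signal are hypothetical:

```python
import numpy as np
import pandas as pd


def fourier_features(t, period: float, n_terms: int) -> pd.DataFrame:
    """sin/cos columns for n = 1..n_terms with frequency f = 1 / period."""
    t = np.asarray(t, dtype=float)
    features = {}
    for n in range(1, n_terms + 1):
        angle = 2 * np.pi * n * t / period
        features[f"sin_{n}"] = np.sin(angle)
        features[f"cos_{n}"] = np.cos(angle)
    return pd.DataFrame(features)


# Hypothetical example: hourly steps with a daily (24-step) seasonality.
t = np.arange(240)
signal = 3 + 2 * np.sin(2 * np.pi * t / 24) - np.cos(2 * np.pi * 2 * t / 24)

X = fourier_features(t, period=24, n_terms=2)
# Fit A_0 (intercept) plus the A_n and B_n coefficients by least squares.
design = np.column_stack([np.ones(len(t)), X.to_numpy()])
coefficients, *_ = np.linalg.lstsq(design, signal, rcond=None)
```

The fitted coefficients are ordered as intercept, `sin_1`, `cos_1`, `sin_2`, `cos_2`. Increasing `n_terms` lets the features capture sharper seasonal shapes at the cost of more columns.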

## 10) Date & Time features

### Periodic or Cyclical features

Periodic features repeat their values at regular intervals. They reach a maximum value and start over again, so the raw integer encoding hides the fact that, *e.g.*, December (12) is closer to January (1) than to July (7).

We can solve this with **trigonometric functions**: `sin` & `cos` transformations, as seen below:

$$ \text{feature}_{\sin} = \sin\left( \text{feature} \cdot \frac{2\pi}{\max(\text{feature})} \right) $$

$$ \text{feature}_{\cos} = \cos\left( \text{feature} \cdot \frac{2\pi}{\max(\text{feature})} \right) $$
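
A minimal sketch of the two transformations applied to the month of the year. The function name `cyclical_encode` is hypothetical; the check at the end measures Euclidean distance in the encoded `(sin, cos)` plane.

```python
import numpy as np


def cyclical_encode(values, max_value: float):
    """Return the (sin, cos) encoding of a periodic feature."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / max_value
    return np.sin(angle), np.cos(angle)


months = np.arange(1, 13)  # January = 1, ..., December = 12
encoded = np.column_stack(cyclical_encode(months, max_value=12))

# Distances in the encoded space: December to January vs. December to July.
dec_to_jan = np.linalg.norm(encoded[11] - encoded[0])
dec_to_jul = np.linalg.norm(encoded[11] - encoded[6])
```

Both columns are needed together: `sin` alone maps two different months to the same value, and the `cos` column resolves that ambiguity.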



## 11) Categorical features

