https://github.com/razielar/feature_engineering_ts_forecasting
Feature Engineering for Time series Forecasting
- Host: GitHub
- URL: https://github.com/razielar/feature_engineering_ts_forecasting
- Owner: razielar
- Created: 2024-02-10T11:41:55.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-05T16:44:39.000Z (about 1 year ago)
- Last Synced: 2025-01-11T14:48:38.503Z (4 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 23.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Feature Engineering for Time Series Forecasting
The repository for the course is available [here](https://github.com/trainindata/feature-engineering-for-time-series-forecasting).
1. [Tabularizing Time Series](#one)
2. [Multi-step Forecasting](#two)
3. [Time series Decomposition](#three)
4. [Missing Data Imputation](#four)
5. [Outliers](#five)
6. [Lag features](#six)
7. [Window features](#seven)
8. [Trend features](#eight)
9. [Seasonality Features](#nine)
10. [Date & Time features](#ten)
11. [Categorical features](#eleven)

## 1) Tabularizing Time Series
### Feature engineering
![]()
### Tabularizing Time Series
![]()
### ML workflow
![]()
## 3) Time series Decomposition
### Multiplicative time series example
In the Air Passengers dataset, the size of the seasonal fluctuations grows with the level of the series; that is, the **variance gets larger** as the **trend increases**.
![]()
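The multiplicative behaviour described above can be illustrated with a small sketch. It uses a synthetic series (not the Air Passengers data) built as `trend * seasonal`, and shows that the seasonal amplitude grows with the trend in the original units but is constant after a log transform. All names and constants here are illustrative assumptions.

```python
import numpy as np

# Synthetic multiplicative series (illustrative, not the Air Passengers data):
# y = trend * seasonal, so the seasonal swing scales with the trend.
t = np.arange(120)                                # 10 years of monthly points
trend = 100 + 2.0 * t
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
y = trend * seasonal

# Seasonal deviation around the (known) trend, one row per year.
raw_dev = (y - trend).reshape(-1, 12)                  # original units
log_dev = (np.log(y) - np.log(trend)).reshape(-1, 12)  # log scale

# Yearly amplitude = max minus min of the deviation within each year.
raw_amp = raw_dev.max(axis=1) - raw_dev.min(axis=1)
log_amp = log_dev.max(axis=1) - log_dev.min(axis=1)

print(raw_amp[[0, -1]])   # amplitude grows with the trend
print(log_amp[[0, -1]])   # amplitude is the same every year
```

On the log scale the series is additive, `log(y) = log(trend) + log(seasonal)`, which is why a log transform is a common first step before an additive decomposition.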
### MSTL Model
![]()
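**MSTL Frequency** (seasonal period, in number of observations per cycle)
| Data | Day | Week | Year |
|----------|----------|----------|----------|
| Daily | | 7 | 365.25 |
| Hourly | 24 | 168 (24 × 7) | 8766 (24 × 365.25) |

## 4) Missing Data Imputation
* **1)** Forward filling (last observation carried forward): better than backward filling, because backward filling uses future values.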
* **2)** Linear interpolation: preferred over spline interpolation because it is simpler.
* **3)** Spline interpolation: can badly distort the time series if applied without exploratory data analysis (EDA).
* **4)** `Seasonal decomposition and linear interpolation`
    * **4.1)** Linear interpolation.
    * **4.2)** Use `STL` or `MSTL` to obtain seasonality. **NOTE**: STL & MSTL assume an `additive model`; if the series is multiplicative, remember to transform the data first (*e.g.* with a log transform).
    * **4.3)** De-seasonalise the original time series.
    * **4.4)** Linear interpolation on the de-seasonalised data.
    * **4.5)** Add the seasonal component back to the imputed de-seasonalised data.

## 5) Outliers
### Outliers in time series data
The classification of outliers in time series data comes from the paper by [Blázquez-García *et al.*](https://arxiv.org/pdf/2002.04236.pdf).
![]()
### Estimation methods to identify outliers
* **1)** Rolling mean (mean & std)
* **2)** Rolling median (median & MAD: Median Absolute Deviation)
* **3)** LOWESS residuals
* **4)** STL residuals

![]()
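One of the estimation methods listed above, the rolling median with MAD, can be sketched with pandas as follows. The window length, the threshold of 3.5, and the synthetic data are illustrative assumptions rather than values from the course; the factor 1.4826 is the usual constant that scales the MAD to a standard deviation under normality.

```python
import numpy as np
import pandas as pd

# Synthetic series with two injected spikes (illustrative data).
rng = np.random.default_rng(1)
y = pd.Series(10 + rng.normal(scale=0.5, size=200))
y.iloc[[50, 120]] += 8.0

window = 21          # illustrative window length
threshold = 3.5      # illustrative cut-off, in robust standard deviations

roll = y.rolling(window, center=True, min_periods=window // 2)
median = roll.median()
# MAD: median of the absolute deviations from the window median.
mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
robust_sigma = 1.4826 * mad

# Flag points that sit far from the local median.
is_outlier = (y - median).abs() > threshold * robust_sigma
outlier_idx = list(y.index[is_outlier])
print(outlier_idx)
```

The rolling mean variant is the same idea with `roll.mean()` and `roll.std()`, but it is more easily distorted by the outliers it is trying to find.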
### Autocorrelation function (ACF)
### Partial autocorrelation function (PACF)
Measures how correlated $y_t$ is with its own lag $y_{t-k}$ **after removing** the **effect of the intermediate lags** $y_{t-1}, \dots, y_{t-k+1}$, whose linear impact is subtracted using a linear regression.
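The regression view of the PACF can be sketched directly with NumPy: for each lag $k$, regress $y_t$ on lags $1$ to $k$ and keep the coefficient of the $k$-th lag. The function name `pacf_by_regression` and the simulated AR(1) series are hypothetical and only meant to illustrate the definition.

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """PACF at lag k = coefficient of y_{t-k} when regressing y_t on lags 1..k."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                              # centre, so no intercept is needed
    out = [1.0]                                   # lag 0 is 1 by convention
    for k in range(1, max_lag + 1):
        # Columns are y_{t-1}, ..., y_{t-k}; rows are t = k, ..., n-1.
        X = np.column_stack([y[k - j : len(y) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coef[-1])                      # coefficient on the k-th lag
    return np.array(out)

# Simulated AR(1) process with coefficient 0.7 (illustrative).
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + e[t]

pacf_values = pacf_by_regression(y, max_lag=3)
print(pacf_values.round(2))
```

For an AR(1) process the PACF should be close to the AR coefficient at lag 1 and close to zero afterwards, which is what makes it useful for choosing how many lags to include.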
In practice, we use `ywmle` (Yule-Walker maximum likelihood estimation) instead of linear regression.

### Cross correlation function (CCF)
Measures how correlated $y_t$ is with another variable at some lag: $x_{t-k}$.
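A minimal sketch of this idea, assuming a plain Pearson correlation on the overlapping samples (library implementations may normalise slightly differently). The helper `cross_corr` and the synthetic data, where `y` follows `x` with a delay of 3 steps, are hypothetical.

```python
import numpy as np

def cross_corr(y, x, k):
    """Pearson correlation between y_t and x_{t-k} for a lag k >= 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if k == 0:
        return np.corrcoef(y, x)[0, 1]
    # Align y_t (t >= k) with x_{t-k}.
    return np.corrcoef(y[k:], x[:-k])[0, 1]

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# y_t = x_{t-3} + noise (illustrative relationship).
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)
y[:3] = rng.normal(size=3)                        # replace the wrapped-around values

ccf = [cross_corr(y, x, k) for k in range(8)]
best_lag = int(np.argmax(ccf))
print(best_lag)
```

A clear peak at some lag $k$ suggests that $x_{t-k}$ is a useful lag feature of the exogenous variable.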
### Rolling Window features
### Expanding Window features
### Weighted Expanding/Rolling Window features
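The three window types named in these headings can be sketched with pandas as below. The toy series, the window size of 3, and the weights are illustrative assumptions. Each feature is computed on `y.shift(1)` so that the value at time $t$ only uses observations up to $t-1$, which avoids leaking the target into its own features.

```python
import numpy as np
import pandas as pd

y = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])

weights = np.array([1.0, 2.0, 3.0])               # oldest -> newest, illustrative

def weighted_mean(window):
    """Weighted average of one window; later observations weigh more."""
    return np.dot(window, weights) / weights.sum()

past = y.shift(1)                                 # only information available before t
features = pd.DataFrame({
    "y": y,
    "roll_mean_3": past.rolling(3).mean(),
    "expanding_mean": past.expanding().mean(),
    "weighted_roll_mean_3": past.rolling(3).apply(weighted_mean, raw=True),
})
print(features)
```

Rolling windows use a fixed number of recent observations, expanding windows use all observations seen so far, and the weighted version lets recent values count more.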
### Piecewise linear regression
![]()
### Features to capture seasonality and cyclical patterns
| **Seasonality** | **Cyclical patterns** |
|----------------------------------------------------------------------|------------------------|
| 1. Lag features: `tree-based` & `linear` models | 1. Lag features |
| 2. Calendar features (*aka* datetime features): `tree-based` models | |
| 3. Seasonal dummies (*e.g.* is_january, etc.): `linear` models | |
| 4. Fourier features: `linear` models & high-frequency data (*e.g.* hourly data with daily, monthly, and yearly seasonality) | |

**Seasonality**: A pattern or effect that repeats with **a fixed frequency** (frequency = 1/period) over time.

**Cyclical patterns**: A pattern or effect that repeats **without a fixed frequency** over time.

### Fourier series
Any periodic function $s(t)$ can be written as a **Fourier series** expansion:
$$ \text{seasonality} = s(t) \approx A_0 + \sum_{n=1}^N \left[ A_n\sin(2\pi n f t) + B_n \cos(2\pi n f t) \right] $$
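The expansion above translates directly into features: one sine and one cosine column per harmonic $n$, with $f = 1/\text{period}$. The sketch below builds them with NumPy and recovers the coefficients $A_0$, $A_n$, $B_n$ of a known signal by least squares. The helper `fourier_features` and the test signal are hypothetical and only illustrate the formula.

```python
import numpy as np

def fourier_features(t, period, n_terms):
    """Columns sin(2*pi*n*f*t), cos(2*pi*n*f*t) for n = 1..n_terms, with f = 1/period."""
    t = np.asarray(t, dtype=float)
    f = 1.0 / period
    cols = []
    for n in range(1, n_terms + 1):
        cols.append(np.sin(2 * np.pi * n * f * t))
        cols.append(np.cos(2 * np.pi * n * f * t))
    return np.column_stack(cols)

# Two weeks of hourly time steps and a known daily pattern (illustrative).
t = np.arange(24 * 14)
s = 5.0 + 2.0 * np.sin(2 * np.pi * t / 24) - 1.0 * np.cos(2 * np.pi * 2 * t / 24)

# Design matrix: intercept (A_0) followed by [sin_1, cos_1, sin_2, cos_2, sin_3, cos_3].
X = np.column_stack([np.ones(len(t)), fourier_features(t, period=24, n_terms=3)])
coef, *_ = np.linalg.lstsq(X, s, rcond=None)
print(coef.round(3))   # order: A_0, A_1, B_1, A_2, B_2, A_3, B_3
```

Because the model is linear in the coefficients, these columns fit naturally into linear models, and the number of harmonics `n_terms` controls how flexible the seasonal shape can be.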
### Periodic or Cyclical features
Periodic features repeat their values at regular intervals. They reach a maximum value and then start over again, so a plain integer encoding hides the fact that, *e.g.*, December (12) is closer to January (1) than to July (7).
We can capture this with **trigonometric functions**: `sin` & `cos` transformations, as seen below:
$$ \sin(\text{feature}) = \sin\left( \text{feature} \cdot \frac{2\pi}{\max(\text{feature})} \right) $$
$$ \cos(\text{feature}) = \cos\left( \text{feature} \cdot \frac{2\pi}{\max(\text{feature})} \right) $$
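The two transformations can be sketched for the month feature as follows. The column names `month_sin` and `month_cos` and the helper `distance` are hypothetical; the point is only to check that, in the encoded space, December ends up closer to January than to July.

```python
import numpy as np
import pandas as pd

months = pd.DataFrame({"month": np.arange(1, 13)})

# Map each month onto the unit circle using the formulas above.
max_value = months["month"].max()                 # 12
angle = months["month"] * 2 * np.pi / max_value
months["month_sin"] = np.sin(angle)
months["month_cos"] = np.cos(angle)

enc = months.set_index("month")[["month_sin", "month_cos"]]

def distance(a, b):
    """Euclidean distance between two months in (sin, cos) space."""
    return float(np.linalg.norm(enc.loc[a].to_numpy() - enc.loc[b].to_numpy()))

dec_jan = distance(12, 1)
dec_jul = distance(12, 7)
print(round(dec_jan, 3), round(dec_jul, 3))
```

Dividing by the maximum works for features that start at 1, such as the month. For features that start at 0 (for example, hours 0 to 23), dividing by the number of distinct values (24) instead keeps the first and last values from collapsing onto the same point.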
![]()
![]()