
https://github.com/razamehar/weather-time-series-analysis-using-statistical-methods-and-deep-learning-models

This project conducts a thorough analysis of weather time series data using diverse statistical and deep learning models. Each model was rigorously applied to the same weather time series data to assess and compare their forecasting accuracy. Detailed results and analyses are provided to delineate the strengths and weaknesses of each approach.

bidirectional-lstm centered-approach differencing gru learning-rate-scheduling lstm moving-average naive-forecasting neural-network python3 qqplots seasonality smoothing time-series-analysis weather-forecast


# Weather Time Series Analysis using Statistical Methods and Deep Learning Models

## Project Overview
This project explored various statistical methods and deep learning models for multivariate time series analysis. On the statistical side, Naive Forecasting, Moving Average Forecasting, Differenced Moving Average Forecasting, and Differenced Moving Average Forecasting with Smoothing were examined. On the deep learning side, Simple Neural Networks, Deep Neural Networks, Single-Layer LSTMs, Single-Layer Regularized LSTMs, Bi-Directional Regularized LSTMs, Regularized Stacked GRUs, and Convolutional Layers with Stacked GRUs and Fully Connected Layers were analyzed. The models were compared on the same data to find the most accurate and reliable approach to weather prediction. This involved establishing a baseline, selecting the best model using a learning rate scheduler, and comparing its performance against the baseline.

**Naive Forecasting:** A simple forecasting method where the prediction is the last observed value, assuming no change or trend.

**Moving Average Forecasting:** A method that predicts future values by averaging a set of recent past values, smoothing out short-term fluctuations.

**Differenced Moving Average Forecasting:** Extends moving average forecasting by first differencing the data (subtracting an earlier observation from each value, for example the value one season earlier) to remove trends or seasonality.

**Differenced Moving Average Forecasting with Smoothing:** Further refines differenced moving average forecasting by applying additional smoothing techniques to the differenced data to reduce noise.

**Simple Neural Networks:** Basic neural networks with a single hidden layer, used for pattern recognition in data with limited complexity.

**Deep Neural Networks:** Advanced neural networks with multiple hidden layers, capable of learning complex representations from large datasets.

**Single-Layer LSTMs:** Long Short-Term Memory (LSTM) networks with one layer, designed to handle sequential data by retaining information over time.

**Bi-Directional LSTMs:** LSTMs that process data in both forward and backward directions to enhance performance on sequential data.

**GRUs:** Gated Recurrent Units (GRUs) that efficiently capture dependencies in sequential data.

**CNNs:** Convolutional Neural Networks (CNNs) learn hierarchical local features through convolutional layers. They are most commonly used for image and video processing; in this project, a convolutional layer is combined with stacked GRUs and fully connected layers.

## Statistical Analysis of Variables

### Univariate Analysis
In this phase, each variable is analyzed individually to understand its distribution and check for normality, using histograms and quantile-quantile (Q-Q) plots.



Figure: Histograms

Figure: Quantile-Quantile Plots
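
As a rough illustration of what a Q-Q plot compares, the sketch below pairs sorted sample values with theoretical normal quantiles. The data are synthetic and the helper name `qq_points` is hypothetical; the project's own plotting code is not shown in this README.

```python
import numpy as np
from statistics import NormalDist

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = np.sort(np.asarray(sample, dtype=float))
    n = observed.size
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, observed

rng = np.random.default_rng(0)
normal_like = rng.normal(loc=10.0, scale=3.0, size=500)  # synthetic, roughly Gaussian
skewed = rng.exponential(scale=3.0, size=500)            # synthetic, right-skewed

for name, sample in [("normal_like", normal_like), ("skewed", skewed)]:
    theo, obs = qq_points(sample)
    # A correlation close to 1 means the points lie near a straight line.
    r = np.corrcoef(theo, obs)[0, 1]
    print(f"{name}: Q-Q correlation = {r:.3f}")
```

Points that bend away from a straight line, as with the skewed sample, suggest the variable is not normally distributed.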

### Correlation Analysis
Correlation analysis uses Pearson correlation coefficients to explore relationships between variables. A heatmap of the correlation matrix highlights how strongly each variable correlates with the temperature variable 'T (degC)', showing which variables move together.



Figure: Heatmap of Correlation Coefficients
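
A minimal sketch of the correlation step, assuming synthetic data: only 'T (degC)' is a column name taken from the text, and the other two names are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Synthetic stand-ins for three columns of the weather table.
temperature = rng.normal(10.0, 8.0, n)
dew_point = 0.8 * temperature + rng.normal(0.0, 2.0, n)  # built to correlate with temperature
wind_speed = rng.normal(2.0, 1.5, n)                     # built to be independent of it

names = ["T (degC)", "dew_point (hypothetical)", "wind_speed (hypothetical)"]
data = np.column_stack([temperature, dew_point, wind_speed])

# rowvar=False treats each column as one variable.
corr = np.corrcoef(data, rowvar=False)

# Rank the other variables by absolute Pearson correlation with the target.
target = names.index("T (degC)")
order = np.argsort(-np.abs(corr[target]))
for j in order:
    if j != target:
        print(f"{names[j]}: r = {corr[target, j]:+.2f}")
```

The matrix `corr` is what a heatmap would visualize; the row for 'T (degC)' gives the correlations with temperature.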


## Data Visualization
Time series plots depict temperature variations over time, revealing both long-term trends and short-term fluctuations within seasonal cycles. Annual temperature trend analysis showcases maximum, average, and minimum temperatures annually, aiding in the interpretation of climate data and identification of seasonal patterns.



Figure: Seasonality

Figure: Seasonality without Noise

Figure: Seasonality (First Season Cycle)

Figure: Temperature over the Years

## Statistical Forecast Methods

### Fixed Partitioning for Statistical Methods based Forecasting
A fixed, chronological partition divides the temperature data into training and testing sets. Data from 2012 to 2014 are allocated for training so the models learn from historical data, while data from subsequent years are reserved for validation and testing, so forecasts are evaluated only on periods the model has not seen.
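
The split can be sketched as a date-based mask. This is an illustration under assumptions: the series is synthetic and daily, and the cut-off `2015-01-01` is inferred from "2012 to 2014 for training" rather than taken from the project code.

```python
import numpy as np

# Synthetic daily timestamps; the real data would come from the Jena climate CSV.
dates = np.arange("2012-01-01", "2017-01-01", dtype="datetime64[D]")
days = np.arange(dates.size)
series = 10.0 + 10.0 * np.sin(2 * np.pi * days / 365.25)  # toy seasonal temperature

# Hypothetical cut-off: everything before 2015 is training data.
split_date = np.datetime64("2015-01-01")
train_mask = dates < split_date

time_train, x_train = dates[train_mask], series[train_mask]
time_valid, x_valid = dates[~train_mask], series[~train_mask]

print(len(x_train), len(x_valid))
```

Splitting by date rather than by random sampling keeps every held-out observation later than every training observation.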

### Naive Forecast
Predictions are based solely on the last observed temperature, serving as a baseline for accuracy assessment.




Figure: Naive Forecast
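
A minimal sketch of the naive baseline on a synthetic series. The split index and the data are assumptions made for illustration.

```python
import numpy as np

# Synthetic daily temperatures: a yearly cycle plus noise.
days = np.arange(4 * 365)
rng = np.random.default_rng(0)
series = 10.0 + 10.0 * np.sin(2 * np.pi * days / 365) + rng.normal(0.0, 2.0, days.size)

split_time = 3 * 365  # hypothetical split index
x_valid = series[split_time:]

# The forecast for step t is the value observed at step t - 1.
naive_forecast = series[split_time - 1:-1]

mae = np.mean(np.abs(naive_forecast - x_valid))
print(f"naive MAE: {mae:.2f}")
```

Any other method should beat this number to be worth its extra complexity.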

### Moving Average Forecasting
Average temperatures over defined window sizes are computed to smooth short-term fluctuations and highlight long-term trends.




Figure: Moving Average Forecast
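
One way to sketch a trailing moving average forecast is shown below. The function name, window size, and data are assumptions, not the project's actual code.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Forecast each step as the mean of the `window_size` values before it.

    Entry i of the result is the mean of series[i : i + window_size] and is
    the forecast for series[i + window_size].
    """
    series = np.asarray(series, dtype=float)
    # Prefix sums let every window mean be computed without a Python loop.
    cumsum = np.cumsum(np.insert(series, 0, 0.0))
    window_means = (cumsum[window_size:] - cumsum[:-window_size]) / window_size
    # The last window has no later value to forecast, so it is dropped.
    return window_means[:-1]

# Synthetic daily temperatures, for illustration only.
days = np.arange(4 * 365)
rng = np.random.default_rng(0)
series = 10.0 + 10.0 * np.sin(2 * np.pi * days / 365) + rng.normal(0.0, 2.0, days.size)

split_time, window_size = 3 * 365, 30  # hypothetical settings
x_valid = series[split_time:]

# Skip the forecasts that fall inside the training period.
forecast = moving_average_forecast(series, window_size)[split_time - window_size:]
mae = np.mean(np.abs(forecast - x_valid))
print(f"moving average MAE: {mae:.2f}")
```

A larger window smooths more noise but reacts more slowly to seasonal change, so the plain moving average can lag behind the series.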

### Differenced Moving Average Forecast
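This method first differences the series to remove trend and seasonality, then applies a moving average to the differenced values. Because the moving average no longer has to track the seasonal cycle, it can forecast the remaining variation more accurately.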




Figure: Differenced Moving Average
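
To see why differencing helps, consider a noise-free toy series. The sketch below assumes a season length of 365 daily steps, which is an illustrative choice rather than a value stated in this README.

```python
import numpy as np

period = 365  # assumed season length in days
days = np.arange(3 * period)

# Noise-free toy series: linear trend plus a yearly cycle.
trend = 0.01 * days
seasonal = 10.0 * np.sin(2 * np.pi * days / period)
series = trend + seasonal

# Seasonal differencing: diff_series[i] = series[i + period] - series[i].
diff_series = series[period:] - series[:-period]

# The cycle cancels (up to floating-point error) and the trend
# collapses to a constant step of 0.01 * period.
print(diff_series.min(), diff_series.max())
```

On real data the differenced series is not constant, but it is much flatter than the original, which is what makes a moving average a reasonable forecaster for it.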

### Differenced Moving Average Forecast with Trend & Seasonality Added
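The moving average of the differenced series forecasts only the change relative to the previous season. To obtain a temperature forecast, trend and seasonality are added back by adding the value observed one season earlier.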




Figure: Differenced Moving Average with Trend & Seasonality Added
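
The full procedure can be sketched as follows. The helper names, the 365-step period, and the 30-step window are assumptions chosen for illustration.

```python
import numpy as np

def moving_average_forecast(series, window_size):
    """Entry i is the mean of series[i : i + window_size]; it forecasts series[i + window_size]."""
    cumsum = np.cumsum(np.insert(np.asarray(series, dtype=float), 0, 0.0))
    return ((cumsum[window_size:] - cumsum[:-window_size]) / window_size)[:-1]

def seasonal_diff_forecast(series, split_time, period, window_size):
    """Forecast series[split_time:] from a moving average of seasonal differences."""
    series = np.asarray(series, dtype=float)
    diff_series = series[period:] - series[:-period]
    # diff_series[j] belongs to time j + period, so the start index is shifted by period.
    diff_ma = moving_average_forecast(diff_series, window_size)[split_time - period - window_size:]
    # Restore trend and seasonality by adding the value observed one period earlier.
    return series[split_time - period:-period] + diff_ma

# Synthetic daily temperatures with trend, yearly cycle, and noise.
period, window_size, split_time = 365, 30, 3 * 365  # hypothetical settings
days = np.arange(4 * period)
rng = np.random.default_rng(0)
clean = 10.0 + 0.002 * days + 10.0 * np.sin(2 * np.pi * days / period)
series = clean + rng.normal(0.0, 2.0, days.size)

forecast = seasonal_diff_forecast(series, split_time, period, window_size)
mae = np.mean(np.abs(forecast - series[split_time:]))
print(f"differenced moving average MAE: {mae:.2f}")
```

Note that the value added back, `series[t - period]`, is a single noisy observation, so its noise passes straight into the forecast.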

### Differenced Moving Average Forecast with Smoothing
A centered window is used to smooth the data at each step. For instance, to smooth the data point at t = 365 with a window size of 11, we compute the average of the values from t = 360 to t = 370, that is, the point itself and five neighbors on each side.




Figure: Differenced Moving Average with Smoothing
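
A centered window of size 11 around t = 365 covers t = 360 through t = 370. The sketch below shows that indexing with a hypothetical helper on synthetic data; it is not the project's implementation.

```python
import numpy as np

def centered_moving_average(series, window_size):
    """Average each point with (window_size - 1) // 2 neighbours on each side.

    Points without a full window at either end are dropped, so
    output[i] is centred on series[i + (window_size - 1) // 2].
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd for a symmetric window")
    kernel = np.ones(window_size) / window_size
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

rng = np.random.default_rng(0)
values = rng.normal(10.0, 2.0, 400)  # synthetic noisy observations

window_size = 11
half = (window_size - 1) // 2
smoothed = centered_moving_average(values, window_size)

# The smoothed value for t = 365 sits at index 365 - half.
t = 365
print(smoothed[t - half], values[t - half:t + half + 1].mean())
```

A centered window uses values after t, so it can only smooth points that are already in the past, such as the previous season's values that are added back to the differenced forecast.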

## Deep Learning Models
Several deep learning models are explored for temperature forecasting: a Basic Neural Network, a Deep Neural Network, an LSTM, a Regularized LSTM, a Bi-Directional LSTM, Stacked GRUs, and a Convolutional layer with Stacked GRUs and Fully Connected Layers. The recurrent and convolutional models are designed to exploit the sequential structure of the data.

### Fixed Partitioning for Neural Network based Forecasting
Temperature data are split into training, validation, and testing sets ensuring chronological order and accounting for seasonality, essential for effective model training and evaluation.

### Data Preprocessing
Data normalization using MinMaxScaler ensures consistent scaling, particularly beneficial for non-normally distributed data and when training neural networks with features of different scales.
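
The formula behind min-max scaling can be sketched in plain NumPy. This is a stand-in for the scaler named above, not a call to it; the helper names and the toy values are hypothetical, and fitting on the training rows only is an assumption about the workflow.

```python
import numpy as np

def fit_min_max(train):
    """Learn the per-feature minimum and range from the training rows only."""
    train = np.asarray(train, dtype=float)
    data_min = train.min(axis=0)
    data_range = train.max(axis=0) - data_min
    data_range[data_range == 0.0] = 1.0  # avoid division by zero for constant features
    return data_min, data_range

def transform(data, data_min, data_range):
    """Map training minima to 0 and training maxima to 1."""
    return (np.asarray(data, dtype=float) - data_min) / data_range

def inverse_transform(scaled, data_min, data_range):
    """Map scaled values back to the original units."""
    return np.asarray(scaled, dtype=float) * data_range + data_min

# Toy rows: [temperature in degC, pressure in mbar] (made-up values).
train = np.array([[-5.0, 990.0], [10.0, 1000.0], [25.0, 1010.0]])
test = np.array([[30.0, 985.0]])

data_min, data_range = fit_min_max(train)
print(transform(train, data_min, data_range))
print(transform(test, data_min, data_range))  # may fall outside [0, 1]
```

Using statistics from the training set alone avoids leaking information from the validation and test periods into training.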

### Sequence Generation
Sequences are generated from input array data using TensorFlow's timeseries_dataset_from_array, facilitating training, validation, and testing of models with specified sequence lengths.
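
To show what sequence generation produces, the sketch below builds the windows with NumPy instead of the TensorFlow utility. It pairs each window with the value that follows it; the helper name and the next-step target are assumptions, since the README does not state the forecast horizon.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(features, targets, sequence_length):
    """Pair each window of `sequence_length` rows with the target that follows it."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # sliding_window_view puts the window axis last; move it next to the sample axis.
    windows = sliding_window_view(features, sequence_length, axis=0)
    windows = np.moveaxis(windows, -1, 1)  # (num_windows, sequence_length, num_features)
    # Drop the final window: no target follows it.
    return windows[:-1], targets[sequence_length:]

# Toy data: 10 time steps, 2 features; the first feature doubles as the target.
features = np.arange(20, dtype=float).reshape(10, 2)
targets = features[:, 0]

X, y = make_windows(features, targets, sequence_length=3)
print(X.shape, y.shape)
print(X[0], y[0])
```

The TensorFlow utility additionally handles batching and optional shuffling; this sketch only illustrates how windows and targets line up.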

### Model Finalization
The two most promising models, determined by their low loss and Mean Absolute Error (MAE), were selected for further refinement. They underwent fine-tuning using a learning rate schedule to identify the optimal learning rate and were then retrained on the dataset. Among these models, the one exhibiting the best performance with the new learning rate was chosen as the final selection.



Figure: Training Loss versus Learning Rate for Model 1

Figure: Training Loss versus Learning Rate for Model 2
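
A common way to produce a loss-versus-learning-rate curve is to increase the learning rate exponentially during a short training run and record the loss at each step. The sketch below assumes that approach; the starting rate, the growth factor, and the loss curve itself are invented so the example runs without training a model.

```python
import numpy as np

def lr_sweep(epoch, start_lr=1e-8, epochs_per_decade=20):
    """Learning rate that grows tenfold every `epochs_per_decade` epochs."""
    return start_lr * 10 ** (epoch / epochs_per_decade)

epochs = np.arange(140)
lrs = np.array([lr_sweep(e) for e in epochs])

# Made-up loss curve: flat for tiny rates, improving, then diverging for large rates.
losses = 5.0 * np.exp(-lrs * 2e4) + 0.5 + (lrs * 30.0) ** 2

# Pick the rate where the recorded loss is lowest.
best_lr = lrs[np.argmin(losses)]
print(f"lowest loss at learning rate {best_lr:.1e}")
```

In practice the curve is noisy, so a rate somewhat below the minimum, where the loss is still falling steadily, is often a safer choice before retraining.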

### Evaluation
Model performance is evaluated using the Mean Absolute Error (MAE) metric on the test dataset, comparing predictions against actual values to quantify forecasting accuracy.
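
MAE is the average absolute difference between predictions and actual values. The small example below uses made-up numbers to show the comparison against a baseline.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up temperatures in degC.
actual = [12.0, 14.5, 13.0, 9.5]
baseline_pred = [11.0, 12.0, 14.5, 13.0]
model_pred = [12.5, 14.0, 12.0, 10.5]

print(f"baseline MAE: {mean_absolute_error(actual, baseline_pred):.3f}")  # 2.125
print(f"model MAE:    {mean_absolute_error(actual, model_pred):.3f}")     # 0.750
```

If the model was trained on scaled data, its MAE is in scaled units; predictions need to be mapped back to degrees before comparing with a baseline computed on raw temperatures.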

## Weather Forecast
The trained models are used to predict future temperature values from the patterns and dependencies learned from the data.

## Potential Improvements
- Modify the number of units.
- Change the dropout ratio.
- Test different learning rates.
- Experiment with batch sizes.
- Add more dense layers.
- Alter the sequence length.

## Data Sources
https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip

## License
This project is licensed under the Raza Mehar License. See the LICENSE.md file for details.

## Contact
For any questions or clarifications, please contact Raza Mehar at raza.mehar@gmail.com.