https://github.com/glencrawford/australia_rain_tomorrow_binary_classification_prediction

Binary classification model to predict whether or not it will rain tomorrow with a Tensorflow/Keras and scikit-learn neural network.

binary-classification classification keras machine-learning neural-network python scikit-learn tensorflow



Host: GitHub
URL: https://github.com/glencrawford/australia_rain_tomorrow_binary_classification_prediction
Owner: GlenCrawford
Created: 2020-01-27T09:21:21.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2020-02-04T11:19:37.000Z (over 6 years ago)
Last Synced: 2025-06-21T21:11:44.558Z (about 1 year ago)
Topics: binary-classification, classification, keras, machine-learning, neural-network, python, scikit-learn, tensorflow
Language: Python
Homepage:
Size: 3.66 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Binary classification machine learning model to predict whether it will rain tomorrow in Australia.

This is a neural network that uses binary classification to predict whether, given meteorological observations of a given day at a given weather station in Australia, it will rain there the next day. The model is trained and tested on a dataset containing about 10 years of daily weather observations from numerous Australian weather stations.

There are two separate implementations in this project: one using TensorFlow 2 and Keras, and another using scikit-learn.

The model currently has an accuracy of approximately 87%. Because it rains on far fewer than half of the days, the dataset has many more rows where the target "RainTomorrow" column is "No" than "Yes". This means that a naive guess is already right about 70% of the time, so accuracy has to be judged against that baseline rather than against 50%. My goal was therefore to get the model accuracy to somewhere around 90%.
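
To make the baseline concrete, here is a minimal sketch of how the accuracy of always predicting the most common label can be computed. The labels below are illustrative only (seven "No" and three "Yes", chosen to mirror the roughly 70% figure above); they are not taken from the real dataset, and `majority_baseline_accuracy` is a hypothetical helper, not part of this project.

```python
from collections import Counter

# Illustrative labels only (NOT real data): 7 "No" rows and 3 "Yes" rows.
labels = ["No"] * 7 + ["Yes"] * 3


def majority_baseline_accuracy(labels):
    """Return the most common label and the accuracy of always predicting it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


baseline_label, baseline_accuracy = majority_baseline_accuracy(labels)
print(baseline_label, baseline_accuracy)
```

Any model is only useful to the extent that it beats this number, which is why 87% is less impressive than it would be on a balanced dataset.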

Here is the structure of the dataset used for training and testing, showing the header and two data rows:

| Date       | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | WindSpeed9am | WindSpeed3pm | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RISK_MM | RainTomorrow |
|:----------:|:--------:|:-------:|:-------:|:--------:|:-----------:|:--------:|:-----------:|:-------------:|:----------:|:----------:|:------------:|:------------:|:-----------:|:-----------:|:-----------:|:-----------:|:--------:|:--------:|:-------:|:-------:|:---------:|:-------:|:------------:|
| 2010-10-20 | Sydney   | 12.9    | 20.3    | 0.2      | 3           | 10.9     | ENE         | 37            | W          | E          | 11           | 26           | 70          | 57          | 1028.8      | 1025.6      | 3        | 1        | 16.9    | 19.8    | No        | 0       | No           |
| 2017-06-25 | Brisbane | 11      | 24.2    | 0        | 2.2         | 9.8      | ENE         | 20            | SSW        | NNE        | 2            | 7            | 68          | 53          | 1020.5      | 1017.3      | 6        | 3        | 15.9    | 22.6    | No        | 0       | Yes          |

The data was sourced from this [Kaggle dataset](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package) compiled by Joe Young and Adam Young, which was in turn sourced from [http://www.bom.gov.au/climate/data](http://www.bom.gov.au/climate/data) and [http://www.bom.gov.au/climate/dwo/](http://www.bom.gov.au/climate/dwo/). This data is available under a Creative Commons (CC) Attribution 3.0 licence. For details on the meaning of each observation, see [this page](http://www.bom.gov.au/climate/dwo/IDCJDW0000.shtml). Copyright Commonwealth of Australia, Bureau of Meteorology.

## Requirements

* Python (developed with version 3.7.4).

* See dependencies.txt for packages and versions (and below to install).

## Data preprocessing

Data preprocessing combines Pandas (to drop rows containing NaN values and map Yes/No strings to 1/0 integers), scikit-learn (to scale numeric features by replacing each value with its z-score), and TensorFlow (to one-hot encode categorical features). The model's input layer is thus a combination of pre-normalized numeric features and one-hot encoded categorical features.
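
The three transformations can be illustrated without any of those libraries. The following is a plain-Python sketch of the underlying arithmetic, not the project's actual code: the helper names (`map_yes_no`, `z_scores`, `one_hot`) are hypothetical, and the inputs are the "MinTemp", "Location" and "RainTomorrow" values from the two sample rows shown earlier. It assumes the population standard deviation, which is what scikit-learn's `StandardScaler` uses; whether the project uses that exact class is an assumption.

```python
from statistics import mean, pstdev


def map_yes_no(value):
    """Map a "Yes"/"No" string to a 1/0 integer."""
    return 1 if value == "Yes" else 0


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical (sigma > 0)
    return [(v - mu) / sigma for v in values]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category order."""
    return [1 if value == category else 0 for category in categories]


# Values from the two sample rows of the dataset table.
min_temps = [12.9, 11.0]
locations = ["Sydney", "Brisbane"]
rain_tomorrow = ["No", "Yes"]

categories = sorted(set(locations))  # ["Brisbane", "Sydney"]
scaled = z_scores(min_temps)
encoded = [one_hot(location, categories) for location in locations]
labels = [map_yes_no(value) for value in rain_tomorrow]

print(scaled, encoded, labels)
```

In practice the mean and standard deviation should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into the scaling.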

The following columns were skipped and not used as features for the model; all the rest were used:

* __Date:__ Not relevant.

* __RainToday:__ This is just a boolean representation of the numeric column "Rainfall". I experimented with adding this feature to the model, but it had no effect on accuracy.

* __RISK_MM:__ This is the amount of rain for the following day. This was used to create the label/target column "RainTomorrow". This would be used if the model was doing regression, rather than classification.

* __RainTomorrow:__ Used as the training label/target.
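
As a rough sketch of the column selection described in the list above, the dataset header can be split into features and label as follows. The split of features into categorical and numeric groups is an assumption based on which columns hold text values in the sample rows; the variable names are hypothetical and not taken from the project's code.

```python
# Column names copied from the dataset header.
columns = [
    "Date", "Location", "MinTemp", "MaxTemp", "Rainfall", "Evaporation",
    "Sunshine", "WindGustDir", "WindGustSpeed", "WindDir9am", "WindDir3pm",
    "WindSpeed9am", "WindSpeed3pm", "Humidity9am", "Humidity3pm",
    "Pressure9am", "Pressure3pm", "Cloud9am", "Cloud3pm", "Temp9am",
    "Temp3pm", "RainToday", "RISK_MM", "RainTomorrow",
]

# Columns that are not used as model features.
skipped_columns = {"Date", "RainToday", "RISK_MM", "RainTomorrow"}
label_column = "RainTomorrow"

feature_columns = [c for c in columns if c not in skipped_columns]

# Assumption: text-valued feature columns are the categorical ones.
assumed_categorical = {"Location", "WindGustDir", "WindDir9am", "WindDir3pm"}
categorical_features = [c for c in feature_columns if c in assumed_categorical]
numeric_features = [c for c in feature_columns if c not in assumed_categorical]

print(len(feature_columns), len(numeric_features), len(categorical_features))
```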

The output of the model is a single sigmoid-activation neuron, which predicts the target variable "RainTomorrow".
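
For readers unfamiliar with sigmoid outputs, the sketch below shows how a single pre-activation value is turned into a probability and then into a Yes/No prediction. It is a plain-Python illustration rather than the project's code; the function names and the 0.5 decision threshold are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_rain_tomorrow(z, threshold=0.5):
    """Turn the output neuron's pre-activation z into a label and probability.

    The 0.5 threshold is an assumption, not a value taken from the project.
    """
    probability = sigmoid(z)
    label = "Yes" if probability >= threshold else "No"
    return label, probability


print(predict_rain_tomorrow(2.0))
print(predict_rain_tomorrow(-2.0))
```

Because the dataset is imbalanced, a threshold other than 0.5 may give a better trade-off between missed rainy days and false alarms.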

## Setup

* Clone the Git repository.

* Install the dependencies:

```bash
pip install -r dependencies.txt
```

## Run

```bash
python -W ignore tensor_flow.py
```

or

```bash
python -W ignore scikit_learn.py
```

Note that there is a [current bug in TensorFlow](https://github.com/tensorflow/tensorflow/issues/30609) where deprecation warnings are printed when feature columns are used, even though the new feature column API is indeed being used. It has been fixed, and the fix will be included in a future release of TensorFlow. In the meantime, you will just have to live with the warning output.

## Monitoring/logging

After training, run:

```
$ tensorboard --logdir logs/fit
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.1.0 at http://localhost:6006/ (Press CTRL+C to quit)
```

Then open the above URL in your browser to view the model in TensorBoard.
