https://github.com/pradipece/weather_forecast_data_analysis

Using decision trees and random forests to solve a real-world data analysis problem ("sklearn_decision_trees_random_forests").

data-analysis data-science data-visualization git github python python3

## Weather_forecast_data_analysis
Using decision trees and random forests to solve a real-world data analysis problem ("sklearn_decision_trees_random_forests").

### Problem Statement

This project takes a coding-focused approach to using decision trees and random forests to solve a real-world problem from [Kaggle](https://kaggle.com/datasets):

> **QUESTION**: The [dataset](https://kaggle.com/jsphyg/weather-dataset-rattle-package) contains about 10 years of daily weather observations from numerous Australian weather stations. Here's a small sample from the dataset:
>
> ![](https://i.imgur.com/5QNJvir.png)
>
> As a data scientist at the Bureau of Meteorology, you are tasked with creating a fully automated system that can use today's weather data for a given location to predict whether it will rain at that location tomorrow.
>
>
> ![](https://i.imgur.com/KWfcpcO.png)

### Overview

Perform the following steps to prepare the dataset for training:

1. Create a train/test/validation split
2. Identify input and target columns
3. Identify numeric and categorical columns
4. Impute (fill) missing numeric values
5. Scale numeric values to the $[0, 1]$ range
6. Encode categorical columns to one-hot vectors

### Training and Visualizing Decision Trees

A decision tree, in general parlance, represents a hierarchical series of binary decisions.

A decision tree in machine learning works the same way, except that the computer works out the structure and hierarchy of decisions from the training data, choosing each split according to a criterion such as Gini impurity.
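
As a rough sketch of what training and inspecting such a tree can look like with scikit-learn, the example below fits a shallow `DecisionTreeClassifier` and prints its learned rules with `export_text`. The data is synthetic, and the feature names and the rule used to generate the labels are assumptions made for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic, already-scaled inputs (hypothetical features, not the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
feature_names = ["Humidity3pm", "Pressure3pm"]

# Made-up labelling rule: "rain" when humidity is high and pressure is low.
y = ((X[:, 0] > 0.6) & (X[:, 1] < 0.5)).astype(int)

# max_depth limits the number of nested decisions; "gini" is the default criterion.
tree = DecisionTreeClassifier(max_depth=2, criterion="gini", random_state=42)
tree.fit(X, y)

# Text rendering of the learned hierarchy of binary decisions.
tree_text = export_text(tree, feature_names=feature_names)
print(tree_text)

train_accuracy = tree.score(X, y)
print(f"Training accuracy: {train_accuracy:.3f}")
```

Each indented line in the printed output is one binary decision on a feature threshold, and the leaves show the predicted class.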

### Summary

The following topics were covered in this tutorial:

- Downloading a real-world dataset
- Preparing a dataset for training
- Training and interpreting decision trees
- Training and interpreting random forests
- Overfitting, hyperparameter tuning & regularization
- Making predictions on single inputs

The following terms were introduced:

* Decision tree
* Random forest
* Overfitting
* Hyperparameter
* Hyperparameter tuning
* Regularization
* Ensembling
* Generalization
* Bootstrapping
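
Several of these ideas can be seen together in a small sketch: an unconstrained decision tree that memorizes its training data (overfitting), a random forest that ensembles many trees trained on bootstrap samples, and a prediction on a single input. The data and labelling rule are synthetic, and the hyperparameter values (`n_estimators=100`, `max_depth=6`) are arbitrary assumptions, not tuned settings from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise, so a perfect fit cannot generalize perfectly.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(600, 4))
noise = rng.normal(0, 0.2, size=600)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0.8).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# An unconstrained tree keeps splitting until every training row is classified.
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A forest averages many trees, each fitted on a bootstrap sample of the rows;
# max_depth acts as a simple form of regularization.
forest = RandomForestClassifier(
    n_estimators=100, max_depth=6, random_state=42
).fit(X_train, y_train)

for name, model in [("deep tree", deep_tree), ("random forest", forest)]:
    print(
        f"{name}: train={model.score(X_train, y_train):.3f} "
        f"val={model.score(X_val, y_val):.3f}"
    )

# Predicting for a single input: the model expects a 2D array with one row.
single_input = np.array([[0.9, 0.8, 0.1, 0.5]])
prediction = forest.predict(single_input)[0]
probabilities = forest.predict_proba(single_input)[0]
print(prediction, probabilities)
```

A large gap between training and validation accuracy is the usual sign of overfitting; hyperparameter tuning means adjusting values such as `max_depth` and `n_estimators` to narrow that gap on the validation set.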

### References
Check out the following resources to learn more:

- https://scikit-learn.org/stable/modules/tree.html
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- https://www.kaggle.com/willkoehrsen/start-here-a-gentle-introduction
- https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering
- https://www.kaggle.com/willkoehrsen/intro-to-model-tuning-grid-and-random-search
- https://www.kaggle.com/c/home-credit-default-risk/discussion/64821