https://github.com/pradipece/weather_forecast_data_analysis
Using decision trees and random forest algorithms to solve real-world data analysis. "sklearn_decision_trees_random_forests"
data-analysis data-science data-visualization git github python python3
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/pradipece/weather_forecast_data_analysis
- Owner: pradipece
- License: mit
- Created: 2024-11-24T18:19:20.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-30T15:35:23.000Z (2 months ago)
- Last Synced: 2024-12-07T22:12:17.610Z (2 months ago)
- Topics: data-analysis, data-science, data-visualization, git, github, python, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 1.52 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Weather_forecast_data_analysis
Using decision trees and random forest algorithms to solve a real-world data analysis problem. "sklearn_decision_trees_random_forests"

### Problem Statement
This project takes a coding-focused approach to using `decision trees and random forests` to solve a real-world problem from [Kaggle](https://kaggle.com/datasets):
> **QUESTION**: The [dataset](https://kaggle.com/jsphyg/weather-dataset-rattle-package) contains about 10 years of daily weather observations from numerous Australian weather stations. Here's a small sample from the dataset:
>
> ![](https://i.imgur.com/5QNJvir.png)
>
> As a data scientist at the Bureau of Meteorology, you are tasked with creating a fully automated system that can use today's weather data for a given location to predict whether it will rain at that location tomorrow.
>
>
> ![](https://i.imgur.com/KWfcpcO.png)

### Overview
Perform the following steps to prepare the dataset for training:
1. Create a train/test/validation split
2. Identify input and target columns
3. Identify numeric and categorical columns
4. Impute (fill) missing numeric values
5. Scale numeric values to the $(0, 1)$ range
6. Encode categorical columns to one-hot vectors

### Training and Visualizing Decision Trees
In general parlance, a decision tree represents a hierarchical series of binary decisions.
A decision tree in machine learning works the same way, except that we let the computer figure out the optimal structure and hierarchy of decisions from the data, guided by a splitting criterion.
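
As a rough illustration of letting the computer learn that hierarchy, the sketch below fits scikit-learn's `DecisionTreeClassifier` and prints the learned decisions with `export_text`. The data is synthetic and the "humidity above 70" rule with 10% label noise is a made-up assumption, not something taken from the notebook or the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic weather-like features (assumption: random values, not real observations).
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "Humidity3pm": rng.uniform(0, 100, size=n),
    "Pressure3pm": rng.normal(1015, 7, size=n),
    "Rainfall": rng.exponential(2.0, size=n),
})

# Hypothetical rule: rain tomorrow when humidity is high, with 10% of labels flipped.
rain = X["Humidity3pm"] > 70
flip = rng.random(n) < 0.10
y = pd.Series(np.where(rain ^ flip, "Yes", "No"), name="RainTomorrow")

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# An unconstrained tree keeps splitting until it fits the training data exactly.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Capping the depth is one simple form of regularization.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", full_tree), ("max_depth=2", small_tree)]:
    train_acc = model.score(X_train, y_train)
    val_acc = model.score(X_val, y_val)
    print(f"{name}: train accuracy {train_acc:.3f}, validation accuracy {val_acc:.3f}")

# Text rendering of the learned hierarchy of binary decisions.
tree_text = export_text(small_tree, feature_names=list(X.columns))
print(tree_text)
```

Comparing training and validation accuracy for the two trees is a quick way to spot overfitting: a large gap for the unconstrained tree suggests it has memorized noise in the training labels.
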
### Summary
The following topics were covered in this tutorial:
- Downloading a real-world dataset
- Preparing a dataset for training
- Training and interpreting decision trees
- Training and interpreting random forests
- Overfitting, hyperparameter tuning & regularization
- Making predictions on single inputs

The following terms were introduced:
* Decision tree
* Random forest
* Overfitting
* Hyperparameter
* Hyperparameter tuning
* Regularization
* Ensembling
* Generalization
* Bootstrapping

### References
Check out the following resources to learn more:

- https://scikit-learn.org/stable/modules/tree.html
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- https://www.kaggle.com/willkoehrsen/start-here-a-gentle-introduction
- https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering
- https://www.kaggle.com/willkoehrsen/intro-to-model-tuning-grid-and-random-search
- https://www.kaggle.com/c/home-credit-default-risk/discussion/64821