https://github.com/xprithvi/random-forest-regressor

This Jupyter notebook serves as a machine learning template to quickly make predictions and analyse feature importance in a dataset.

data-science feature-extraction machine-learning random-forest random-forest-regression scikit-learn

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/xprithvi/random-forest-regressor
Owner: xPrithvi
License: mit
Created: 2023-09-20T11:32:20.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-20T19:06:37.000Z (almost 2 years ago)
Last Synced: 2025-01-20T19:24:53.854Z (6 months ago)
Topics: data-science, feature-extraction, machine-learning, random-forest, random-forest-regression, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 2.35 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Random Forest Regressor
This Jupyter notebook serves as part of a data science pipeline by providing a quick and easy framework for
feature engineering, model training and feature importance analysis during data exploration. In this particular notebook,
scikit-learn's `RandomForestRegressor` was trained on data about [housing in Perth](https://www.kaggle.com/datasets/syuzai/perth-house-prices) to
predict house prices from floor space, suburb, number of bedrooms, etc. Feature importance was first analysed using the
built-in method, which ranks features by how much they reduce node impurity. SHAP was also used to provide a more robust and
in-depth analysis via Shapley values.
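
The two importance measures described above can be sketched as follows. This is a minimal illustration and not the notebook's own code: the data is synthetic, the column names (`floor_area`, `bedrooms`, `noise`) are hypothetical stand-ins for the Perth features, and the SHAP step runs only if the optional `shap` package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the housing data (hypothetical column names).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "floor_area": rng.uniform(50, 400, 500),
    "bedrooms": rng.integers(1, 6, 500),
    "noise": rng.normal(size=500),
})
y = 3000 * X["floor_area"] + 10000 * X["bedrooms"] + rng.normal(0, 5000, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per feature, normalised to sum to 1.
impurity_importance = pd.Series(model.feature_importances_, index=X.columns)

# Shapley values, computed only if the optional shap package is available.
try:
    import shap

    sample = X.iloc[:100]  # a subset keeps the computation quick
    shap_values = shap.TreeExplainer(model).shap_values(sample)
    # Mean absolute Shapley value per feature as a global importance score.
    shap_importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
except ImportError:
    shap_importance = None

print(impurity_importance.sort_values(ascending=False))
```

Impurity-based scores are computed from the training data and tend to favour high-cardinality features, which is one reason to cross-check them against Shapley values.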

## Features

- Model saving and loading.
- Hyperparameter tuning via Bayesian optimization.
- Feature importance analysis using tree node impurity and Shapley values.
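
A minimal sketch of the model saving and loading feature, assuming `joblib` is used for persistence (the README does not say which mechanism the notebook uses). The data and the file name `forest.joblib` are illustrative only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic regression problem, used only to have a fitted model.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "forest.joblib")  # hypothetical file name
    joblib.dump(model, path)      # save the fitted model to disk
    restored = joblib.load(path)  # load it back into memory

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

Files saved this way should be loaded with the same scikit-learn version that produced them, and only from trusted sources, since they are pickle-based.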

## Future Improvements

- Custom user input to the model (involves writing a custom data encoder instead of using `pandas.get_dummies()`).
- Reducing the disk size of saved models.
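
One possible direction for both improvements, sketched under assumptions: a fitted `OneHotEncoder` inside a scikit-learn pipeline stands in for the custom encoder, and joblib's `compress` option stands in for the disk-size reduction. Neither is confirmed as the author's plan, and the column names and values below are hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training frame; names and prices are illustrative only.
train = pd.DataFrame({
    "suburb": ["A", "B", "A", "C", "B", "C"],
    "bedrooms": [2, 3, 4, 3, 2, 5],
    "price": [400_000, 520_000, 650_000, 480_000, 390_000, 700_000],
})
X, y = train[["suburb", "bedrooms"]], train["price"]

# The fitted encoder remembers the training categories, so a single new row
# maps to the same columns; unseen categories are encoded as all zeros.
encoder = ColumnTransformer(
    [("suburb", OneHotEncoder(handle_unknown="ignore"), ["suburb"])],
    remainder="passthrough",
)
pipeline = make_pipeline(
    encoder, RandomForestRegressor(n_estimators=50, random_state=0)
)
pipeline.fit(X, y)

user_row = pd.DataFrame({"suburb": ["Z"], "bedrooms": [3]})  # "Z" was never seen
prediction = pipeline.predict(user_row)

# Compare on-disk size with and without joblib compression.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "model.joblib")
    small_path = os.path.join(tmp, "model_compressed.joblib")
    joblib.dump(pipeline, raw_path)
    joblib.dump(pipeline, small_path, compress=3)
    raw_size = os.path.getsize(raw_path)
    small_size = os.path.getsize(small_path)

print(prediction, raw_size, small_size)
```

`pandas.get_dummies()` derives its columns from whatever frame it is given, so a single user-supplied row would not produce the columns the model was trained on; an encoder fitted once on the training data avoids that mismatch.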
