https://github.com/xprithvi/random-forest-regressor

This Jupyter notebook serves as a machine learning template to quickly make predictions and analyse feature importance in a dataset.

data-science feature-extraction machine-learning random-forest random-forest-regression scikit-learn

# Random Forest Regressor
This Jupyter notebook serves as part of the data science pipeline by providing a quick and easy framework for
feature engineering, model training and feature importance analysis during data exploration. In this particular notebook,
scikit-learn's RandomForestRegressor was trained on [housing data from Perth](https://www.kaggle.com/datasets/syuzai/perth-house-prices) to
predict house prices from floor space, suburb, number of bedrooms, etc. Feature importance analysis was performed using
scikit-learn's built-in importance scores, which are based on node impurity. SHAP was also used to provide a more robust and in-depth analysis
via Shapley values.
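
The training and impurity-based importance steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data, the column names (`floor_area`, `bedrooms`, `suburb`) and the hyperparameters are assumptions chosen for the example, and the SHAP analysis is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the Perth housing dataset.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "floor_area": rng.uniform(50, 300, n),
    "bedrooms": rng.integers(1, 6, n),
    "suburb": rng.choice(["A", "B", "C"], n),
})
# Synthetic price driven mostly by floor area, plus noise (an assumption).
price = 3000 * df["floor_area"] + 20000 * df["bedrooms"] + rng.normal(0, 10000, n)

# One-hot encode the categorical column so the forest receives numeric input.
X = pd.get_dummies(df, columns=["suburb"])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: one score per encoded column, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based scores are computed on the training data and can favour high-cardinality features, which is one reason a second method such as Shapley values is a useful cross-check.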

## Features

- Model saving and loading.
- Hyperparameter tuning via Bayesian optimization.
- Feature importance analysis using tree node impurity and Shapley values.
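
As a rough illustration of model saving and loading, the sketch below persists a fitted forest with `joblib` and checks that the restored model gives the same predictions. The use of `joblib`, the file name `rf_model.joblib` and the toy data are assumptions for this example. The notebook may persist models differently.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy regression data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))
y = X[:, 0] * 10 + rng.normal(0, 0.1, 100)

model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Save to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

`joblib.dump` also accepts a `compress` argument, which can shrink the file at the cost of slower saves and loads.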

## Future Improvements

- Custom user input to the model (involves writing a custom data encoder instead of using pandas.get_dummies()).
- Reducing the disk size of saved models.
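
One possible direction for the custom encoder is to remember the column layout produced at training time and align each new input to it. The sketch below is an assumption about how this could work, not part of the notebook. The helper `encode_input`, the column names and the sample records are hypothetical.

```python
import pandas as pd

# Hypothetical training frame with one categorical column.
train = pd.DataFrame({
    "floor_area": [120.0, 80.0, 200.0],
    "suburb": ["A", "B", "C"],
})
categorical = ["suburb"]
X_train = pd.get_dummies(train, columns=categorical)
train_columns = list(X_train.columns)  # layout the model was trained on


def encode_input(record, categorical, columns):
    """Encode one user-supplied record into the training column layout."""
    row = pd.get_dummies(pd.DataFrame([record]), columns=categorical)
    # Add missing dummy columns as 0 and drop columns unseen in training.
    return row.reindex(columns=columns, fill_value=0).astype(float)


encoded = encode_input(
    {"floor_area": 150.0, "suburb": "B"}, categorical, train_columns
)
print(encoded)
```

With this approach, a category that never appeared in training is encoded as all zeros across the dummy columns. Whether that is acceptable depends on the dataset.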