https://github.com/xprithvi/random-forest-regressor
This Jupyter notebook serves as a machine learning template to quickly make predictions and analyse feature importance in a dataset.
data-science feature-extraction machine-learning random-forest random-forest-regression scikit-learn
- Host: GitHub
- URL: https://github.com/xprithvi/random-forest-regressor
- Owner: xPrithvi
- License: mit
- Created: 2023-09-20T11:32:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-20T19:06:37.000Z (over 1 year ago)
- Last Synced: 2024-11-19T16:59:37.162Z (3 months ago)
- Topics: data-science, feature-extraction, machine-learning, random-forest, random-forest-regression, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 2.35 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Random Forest Regressor
This Jupyter notebook is one stage of a data science pipeline: it provides a quick framework for
feature engineering, model training and feature importance analysis during data exploration. In this particular notebook,
scikit-learn's `RandomForestRegressor` was trained on information regarding [housing in Perth](https://www.kaggle.com/datasets/syuzai/perth-house-prices) to
predict house prices from floor space, suburb, number of bedrooms, and other features. Feature importance was first computed with the
built-in method, which ranks features by node impurity. SHAP was also used to provide a more robust and in-depth analysis
via Shapley values.

## Features
- Model saving and loading.
- Hyperparameter tuning via Bayesian optimization.
- Feature importance analysis using tree node impurity and Shapley values.

## Future Improvements
- Custom user input to the model (involves writing a custom data encoder instead of using `pandas.get_dummies()`).
- Reducing the disk size of saved models.
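
The custom encoder mentioned in the first item could take roughly the following shape. This is a minimal sketch, not code from the notebook: the class `FixedOneHotEncoder`, the column names (`suburb`, `bedrooms`, `floor_area`) and the sample rows are all hypothetical and made up for illustration. The idea is that `pandas.get_dummies()` only creates columns for the categories present in the frame it is given, so a single user-supplied row would not produce the same columns the model was trained on. An encoder that remembers the categories seen at training time avoids that mismatch.

```python
from typing import Dict, List, Sequence


class FixedOneHotEncoder:
    """Hypothetical one-hot encoder that remembers the categories seen in fit()."""

    def __init__(self, categorical: Sequence[str], numeric: Sequence[str]):
        self.categorical = list(categorical)
        self.numeric = list(numeric)
        self.categories: Dict[str, List[str]] = {}

    def fit(self, rows: Sequence[dict]) -> "FixedOneHotEncoder":
        # Record every category value per column, in a stable (sorted) order.
        for col in self.categorical:
            self.categories[col] = sorted({row[col] for row in rows})
        return self

    @property
    def feature_names(self) -> List[str]:
        # Mirrors the "<column>_<value>" naming that pandas.get_dummies() uses.
        names = list(self.numeric)
        for col in self.categorical:
            names.extend(f"{col}_{value}" for value in self.categories[col])
        return names

    def transform_one(self, row: dict) -> List[float]:
        # Numeric features pass through; each categorical feature expands
        # into one 0/1 slot per category known from training.
        vector = [float(row[col]) for col in self.numeric]
        for col in self.categorical:
            vector.extend(
                1.0 if row[col] == value else 0.0
                for value in self.categories[col]
            )
        return vector


# Made-up training rows (assumed column names, not the real dataset schema).
training_rows = [
    {"suburb": "Subiaco", "bedrooms": 3, "floor_area": 120.0},
    {"suburb": "Fremantle", "bedrooms": 4, "floor_area": 180.0},
    {"suburb": "Subiaco", "bedrooms": 2, "floor_area": 85.0},
]

encoder = FixedOneHotEncoder(
    categorical=["suburb"], numeric=["bedrooms", "floor_area"]
).fit(training_rows)

# A single user-supplied row is encoded with the training-time layout.
user_input = {"suburb": "Fremantle", "bedrooms": 3, "floor_area": 150.0}
encoded = encoder.transform_one(user_input)

# A suburb never seen in training maps to all zeros in the suburb slots.
unseen = encoder.transform_one(
    {"suburb": "Joondalup", "bedrooms": 1, "floor_area": 60.0}
)
```

The resulting vector has the same length and column order for every input, so it can be passed to a fitted model as a single-row 2D input such as `[encoded]`. Encoding an unseen category as all zeros is one possible design choice; raising an error instead may be preferable when silent fallbacks would hide bad input.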