https://github.com/cintia0528/data_science-supervised_machine_learning_classification_housing
The project's aim is to find the best ML algorithm, evaluated on how well it predicts whether homes should be classified as expensive or not expensive.
classification lazypredict machinelearning-python supervised-machine-learning xgboost-algorithm
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/cintia0528/data_science-supervised_machine_learning_classification_housing
- Owner: Cintia0528
- Created: 2023-11-27T14:30:06.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-01T14:49:28.000Z (about 2 years ago)
- Last Synced: 2025-03-31T05:35:17.948Z (8 months ago)
- Topics: classification, lazypredict, machinelearning-python, supervised-machine-learning, xgboost-algorithm
- Language: Jupyter Notebook
- Homepage:
- Size: 245 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Supervised Machine Learning - Classification
## Goal
To classify properties into "Expensive" / "Not Expensive" categories with the help of Supervised Machine Learning.
## Overview
We want to make better investment decisions, so we evaluate properties on 70+ features to determine whether they qualify as expensive or inexpensive.
## Context
We try out and fine-tune a variety of Machine Learning models to get the best prediction, guided by two questions:
1. Is our Machine Learning model predicting the price category of properties successfully?
2. Which types of errors is each model most prone to?
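
To make the second question concrete, here is a minimal sketch of how a model's errors can be split into false positives and false negatives with scikit-learn's `confusion_matrix`. The labels below are made up for illustration (they are not results from the notebook), and the helper name `error_breakdown` is hypothetical.

```python
from sklearn.metrics import confusion_matrix


def error_breakdown(y_true, y_pred):
    """Count each outcome type for a binary classifier.

    Assumes 1 = "Expensive" and 0 = "Not Expensive" (an illustrative encoding).
    """
    # With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "true_negative": int(tn),
        "false_positive": int(fp),  # predicted Expensive, actually Not Expensive
        "false_negative": int(fn),  # predicted Not Expensive, actually Expensive
        "true_positive": int(tp),
    }


# Made-up labels, not taken from the project data.
y_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]

breakdown = error_breakdown(y_true, y_pred)
print(breakdown)
```

Running the same breakdown for each candidate model shows whether it tends to overrate properties (false positives) or underrate them (false negatives), which matters when the two mistakes carry different investment costs.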
### Task:
* Import a dataset of over 1,500 properties
* Explore, analyze and clean over 70 features
* Try and fine-tune ML models for the best outcome
## Deliverables
The **Google Colab Notebook** for trying out different ML algorithms is found [here](https://github.com/Cintia0528/Project-6-Supervised-Machine-Learning---Classification/blob/1c9d84d012a3df27d01b413010a3be04dec79acf/5_b_Housing_Model_Selection_.ipynb).
Further Machine Learning experimentation with LazyPredict and VotingClassifier is found [here](https://github.com/Cintia0528/Project-6-Supervised-Machine-Learning--Classification-/blob/aa13379741c4444573b21000712825789fd7ef70/5_c_Housing_Model_Selection_LazyPredict%20(1).ipynb), with a supporting Medium article [here](https://medium.com/@ubp0528/another-ml-puzzle-decoding-the-factors-behind-expensive-homes-a6f096aa91e1).
## Skills & Tools
1. Data Reading & Cleaning
2. Data Splitting
3. Building a Preprocessor
4. Modelling (Decision Tree, KNN, Random Forest, XGBoost)
5. Fine Tuning
6. Error Analysis
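
Steps 2 to 5 above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the column names (`LotArea`, `Bedrooms`, `Heating`) and the parameter grid are hypothetical, and only a Decision Tree is tuned to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the housing data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "LotArea": rng.normal(10_000, 3_000, n),
    "Bedrooms": rng.integers(1, 6, n).astype(float),
    "Heating": rng.choice(["GasA", "GasW", "Wall"], n),
})
# Synthetic target: 1 = "Expensive", 0 = "Not Expensive".
y = ((X["LotArea"] + 1_000 * X["Bedrooms"]) > 13_000).astype(int)
# Knock out a few values so the imputer has something to do.
X.loc[rng.choice(n, 20, replace=False), "LotArea"] = np.nan

# Data splitting: hold out 20% for the final check, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Preprocessor: impute numeric columns, one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["LotArea", "Bedrooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Heating"]),
])

# Modelling: preprocessing and classifier chained in one pipeline.
model = Pipeline([
    ("prep", preprocessor),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Fine tuning: cross-validated grid search over a small, illustrative grid.
param_grid = {
    "clf__max_depth": [2, 4, 6],
    "clf__min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Keeping the preprocessor inside the pipeline means imputation and encoding are refit on each cross-validation fold, which avoids leaking information from the validation folds into training. Swapping `DecisionTreeClassifier` for KNN, Random Forest, or XGBoost only requires changing the `clf` step and its parameter grid.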
## Further Analysis
1. Refining model selection with LazyPredict
2. Pooling individual models' strength with Voting Classifier
Note: In the notebook, the LazyPredict + VotingClassifier combination reached approximately 95% accuracy, but when applied to a brand-new dataset via a Streamlit application, it had the highest accuracy, at over 97%.
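
As a rough illustration of pooling individual models' strengths, the sketch below combines three scikit-learn classifiers with `VotingClassifier`. It is not the notebook's configuration: the README does not say which estimators were shortlisted with LazyPredict, so the three models, their hyperparameters, and the synthetic data are all assumptions. XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the housing features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative ensemble members; KNN is scaled because it is distance-based.
voter = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",  # average predicted probabilities instead of majority vote
)
voter.fit(X_train, y_train)

accuracy = voter.score(X_test, y_test)
print("Ensemble accuracy on held-out data:", round(accuracy, 3))
```

Soft voting averages the class probabilities of the members, so a confident model can outweigh two uncertain ones; `voting="hard"` would use a simple majority of predicted labels instead.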