https://github.com/vanilladucky/housing-prediction
This is a data analytics and machine learning project that I undertook with a housing dataset from Kaggle to put my machine learning knowledge into practice.
data-science machine-learning python scikit-learn
- Host: GitHub
- URL: https://github.com/vanilladucky/housing-prediction
- Owner: vanilladucky
- Created: 2022-04-28T13:38:00.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-11T09:13:46.000Z (about 4 years ago)
- Last Synced: 2025-01-23T06:13:05.033Z (over 1 year ago)
- Topics: data-science, machine-learning, python, scikit-learn
- Language: Jupyter Notebook
- Homepage: https://share.streamlit.io/vanilladucky/housing-prediction/main/prediction_project.py
- Size: 21.9 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# A housing price prediction web application
## Summary
This is a data analytics and machine learning project that I undertook with a housing dataset from Kaggle to put my machine learning knowledge into practice.
## Work explained
The **data** folder contains the cleaned and external datasets.
The external data had both numerical and categorical values, along with numerous NaN values. I imputed the missing values with methods chosen to fit the context of each feature. Where a NaN was itself a logical value for a house, I kept it as a category, and I used label encoding for the categorical features.
All of the data cleaning, visualization, feature engineering and categorical mapping can be found in the **notebooks** folder.
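
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the toy data, the column names (`lot_area`, `garage_type`), the choice of median imputation and the `"None"` placeholder are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "lot_area": [8450.0, np.nan, 11250.0, 9550.0],
    "garage_type": ["Attached", np.nan, "Detached", "Attached"],
    "price": [208500, 181500, 223500, 140000],
})

# Numerical gap: fill with the column median (one possible imputation choice).
df["lot_area"] = df["lot_area"].fillna(df["lot_area"].median())

# Categorical gap that is meaningful on its own (assumed here to mean
# "the house has no garage"): keep it as an explicit category.
df["garage_type"] = df["garage_type"].fillna("None")

# Label encoding: map each category to an integer code.
encoder = LabelEncoder()
df["garage_type_encoded"] = encoder.fit_transform(df["garage_type"])

print(df)
```

Keeping a meaningful NaN as its own category means the model can still learn from the absence of a feature instead of having it replaced by a guessed value.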
Meanwhile, in the **model** notebook, I use these datasets to build several models of varying complexity. I chose the two algorithms that performed better than the others, tuned their hyperparameters, and finally stacked them with linear regression to form the final model, which yielded the lowest error.
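
A stacked model of this kind could look something like the sketch below. The two base algorithms are not named here, so the random forest and gradient boosting regressors are stand-ins chosen for the example, as are the synthetic data and the hyperparameter values. Only the use of linear regression as the final estimator comes from the description above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the cleaned housing dataset.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two base models (assumed choices) whose cross-validated predictions
# are combined by a linear regression meta-model.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# Evaluate on held-out data with root mean squared error.
preds = stack.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, preds)))
print(f"Stacked RMSE: {rmse:.2f}")
```

In practice each base model would be tuned first (for example with a grid or randomized search) and the tuned estimators passed into the stack.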
## Tech used
* Python
* Jupyter Notebook
* Scikit-Learn
* Matplotlib
* Streamlit
## Web App
https://share.streamlit.io/vanilladucky/housing-prediction/main/prediction_project.py