
https://github.com/akimuddinshaikh/machine-learning-project

A comparative study of regression models (Decision Tree, Random Forest, Ridge, Lasso, SVM) for predicting real estate prices in King County, NYC, and California using PCA & Pipeline techniques.

machine-learning pca-analysis python regression-models scikit-learn statsmodels


Machine Learning Models for House Price Prediction
A Comparative Study of Regression Models

Author: Akimuddin Aslam Shaikh
Institution: National College of Ireland, School of Computing

**Project Overview**
This project applies several machine learning regression models to predict house prices from real estate data for King County, NYC, and California. The study compares the models' performance to determine the most suitable approach for real estate price forecasting.

**Key Highlights**
✅ Implemented Decision Tree, Lasso, Linear, Random Forest, Ridge, and Support Vector Machine (SVM) Regression
✅ Used scikit-learn's Pipeline module and applied PCA (Principal Component Analysis) for dimensionality reduction
✅ Evaluated models using the R-squared (R²) score, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE)
✅ Used StatsModels for regression analysis
✅ Found that Random Forest regression with PCA performed best on the King County and California datasets, while the NYC dataset yielded poor results across all models
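
The workflow in the highlights above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's code: synthetic `make_regression` data stands in for the housing tables, and the `StandardScaler` step, `n_components=8`, and all model hyperparameters are assumptions chosen for the example. Each model is wrapped in a `Pipeline` with PCA and scored on a held-out split with R², MSE, and RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for a housing table: 500 rows, 12 numeric features.
X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The six model families named in the README; hyperparameters are illustrative guesses.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVM": SVR(kernel="rbf"),
}


def evaluate(name, model, n_components=8):
    """Fit scale -> PCA -> model on the training split and score it on the test split."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
        ("model", model),
    ])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {"model": name, "r2": r2_score(y_test, pred), "mse": mse, "rmse": np.sqrt(mse)}


results = [evaluate(name, model) for name, model in models.items()]

# Print models from best to worst R² on the held-out split.
for row in sorted(results, key=lambda r: r["r2"], reverse=True):
    print(f"{row['model']:<14} R2={row['r2']:.3f} MSE={row['mse']:.1f} RMSE={row['rmse']:.1f}")
```

Placing `StandardScaler` and `PCA` inside the `Pipeline` means both are fit only on the training split, so no test-set information leaks into the principal components. Because the synthetic target here is linear, the printed ranking will not necessarily match the project's findings.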

**Technologies & Libraries Used**
Python
Scikit-Learn (Regression models & PCA)
StatsModels (Regression analysis)
Matplotlib & Seaborn (Data visualization)
Pandas & NumPy (Data preprocessing)

**Regression Models & Performance**

| Model | R² Score | MSE | RMSE | Dataset |
|---|---|---|---|---|
| Random Forest + PCA | Best | Low | Low | King County, California |
| Decision Tree Regression | Moderate | Medium | Medium | King County, California |
| Linear Regression | Moderate | High | High | NYC |
| SVM Regression | Poor | High | High | NYC |

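
For reference, the three metrics in the table follow their standard textbook definitions (stated here for clarity, not taken from the project's code), where y_i is the true price of test house i, ŷ_i the model's prediction, ȳ the mean true price, and n the number of test houses:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

RMSE is in the same units as price, which makes it easier to read than MSE; R² is unitless and equals 1 for perfect predictions.
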
**Key Finding:** The Random Forest regression model with PCA performed best on the King County and California datasets.
**NYC Dataset:** None of the models achieved a positive R-squared score on the NYC data, highlighting the limitations of that dataset.
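
A short sketch, using made-up prices rather than the NYC data, of why R² can fall below zero: R² compares a model against always predicting the mean price, so any model whose squared errors exceed that baseline scores negative.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative prices only; not drawn from any of the project's datasets.
y_true = np.array([300_000, 450_000, 520_000, 610_000, 700_000], dtype=float)

# Baseline: predict the mean price for every house, which gives R^2 = 0 by construction.
baseline = np.full(y_true.shape, y_true.mean())

# A poor model whose errors are larger than the spread of prices around their mean.
bad = np.array([650_000, 300_000, 700_000, 350_000, 400_000], dtype=float)

r2_baseline = r2_score(y_true, baseline)
r2_bad = r2_score(y_true, bad)
print(f"mean baseline R^2: {r2_baseline:.3f}")
print(f"poor model R^2:    {r2_bad:.3f}")
```

A negative score, as reported for the NYC dataset, therefore means the models predicted worse than simply using the average price.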