https://github.com/sarathir-dev/house-price-prediction-model
A machine learning project that predicts house prices using ensemble models with preprocessing, feature engineering, and visualization, achieving about 71% relative accuracy (R² of 0.59) on real estate data.
house-price-prediction machine-learning prediction-model random-forest real-estate regression sklearn stacking xgboost
- Host: GitHub
- URL: https://github.com/sarathir-dev/house-price-prediction-model
- Owner: sarathir-dev
- Created: 2025-07-24T05:10:28.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-25T06:42:51.000Z (3 months ago)
- Last Synced: 2025-07-31T12:15:14.950Z (2 months ago)
- Topics: house-price-prediction, machine-learning, prediction-model, random-forest, real-estate, regression, sklearn, stacking, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 4.08 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# House Price Prediction using Machine Learning
A machine learning pipeline for predicting house prices based on features like area, bedrooms, location amenities, and furnishing status. Built with `pandas`, `scikit-learn`, `XGBoost`, and `matplotlib`, this project includes data preprocessing, feature engineering, model training, evaluation, and visualization.
---
## Project Overview
This project aims to accurately estimate house prices using regression techniques. The workflow covers:
- Data Cleaning & Feature Engineering
- Stacking Ensemble (Random Forest + XGBoost)
- Log transformation of target variable
- Standardization and Imputation
- Evaluation using RMSE, R² Score, and Relative Accuracy
- Model Export for Deployment

---
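The log transformation of the target listed above can be wired in so that predictions come back on the original price scale. The sketch below shows one way to do that, assuming scikit-learn's `TransformedTargetRegressor` with `np.log1p` and `np.expm1`; the notebook may apply the transform by hand instead. The synthetic `area` and `price` arrays and the `LinearRegression` stand-in are illustrative only and are not taken from the project.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): prices grow roughly exponentially with area.
rng = np.random.default_rng(0)
area = rng.uniform(1500, 10000, size=100)
price = np.exp(14.0 + 0.0002 * area + rng.normal(0, 0.1, size=100))

# The wrapper fits the regressor on log1p(price) and applies expm1 to its
# predictions, so callers only ever see prices on the original scale.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),  # stand-in for the project's ensemble
    func=np.log1p,
    inverse_func=np.expm1,
)
X = area.reshape(-1, 1)
model.fit(X, price)
predicted = model.predict(X)
```

Wrapping the transform this way keeps the inverse step from being forgotten when metrics such as RMSE are reported in rupees.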
## Dataset
The dataset consists of residential house listings with the following key features:
- `price`, `area`, `bedrooms`, `bathrooms`, `stories`
- Amenities: `mainroad`, `guestroom`, `basement`, `hotwaterheating`, `airconditioning`, `parking`
- Preferences: `prefarea`, `furnishingstatus`

Sample (cleaned data):
| price | area | bedrooms | bathrooms | mainroad | guestroom | ... |
|-----------|------|----------|-----------|----------|-----------|-----|
| 13300000 | 7420 | 4 | 2 | yes | no | ... |

---
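Because the amenity columns hold `yes`/`no` strings, they need a numeric encoding before modelling. The following is a minimal sketch of one common approach with `pandas`, not necessarily what the notebook does. Only the first row's `price` through `guestroom` values come from the sample above; the second row and the `furnishingstatus` category values are assumptions made for the example.

```python
import pandas as pd

# Row 0 reuses the sample row shown above; everything else is made up.
listings = pd.DataFrame(
    {
        "price": [13300000, 5000000],
        "area": [7420, 3000],
        "bedrooms": [4, 2],
        "bathrooms": [2, 1],
        "mainroad": ["yes", "no"],
        "guestroom": ["no", "no"],
        "furnishingstatus": ["furnished", "unfurnished"],  # assumed labels
    }
)

# Map yes/no amenity flags to 1/0.
binary_cols = ["mainroad", "guestroom"]
for col in binary_cols:
    listings[col] = listings[col].map({"yes": 1, "no": 0})

# One-hot encode the multi-valued furnishing status.
encoded = pd.get_dummies(listings, columns=["furnishingstatus"], dtype=int)
print(encoded.columns.tolist())
```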
## Model Architecture
The pipeline includes:
- **Preprocessing**: Imputation + Scaling using `StandardScaler`
- **Feature Engineering**: Total rooms, area per room, etc.
- **Model**: Stacking Regressor (Random Forest + XGBoost + RidgeCV)

---
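The architecture described above can be sketched with scikit-learn's `StackingRegressor` inside a `Pipeline`. Several choices here are assumptions rather than the notebook's settings: `RidgeCV` is used as the final estimator, imputation uses the median, and `GradientBoostingRegressor` stands in for XGBoost so the example needs only scikit-learn (`xgboost.XGBRegressor` could be used in its place). The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and target (assumption), with a few missing values.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.1, size=200)
X[rng.uniform(size=X.shape) < 0.05] = np.nan

# Base learners feed out-of-fold predictions to the RidgeCV meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        # Stand-in for XGBoost (assumption, see lead-in).
        ("boost", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=3,
)

# Imputation and scaling run before the stacked model, as in the list above.
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), stack)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Keeping the imputer and scaler inside the pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into the model.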
## Results
| Metric | Value |
|-----------------------|------------------|
| RMSE | ₹1,447,704.47 |
| R² Score | 0.5854 |
| Relative Accuracy | 71.09% |

> **Note**: Further tuning or using deep learning methods may improve performance.
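For reference, the three metrics in the table can be computed as in the sketch below. RMSE and R² follow their standard definitions. The README does not define Relative Accuracy, so the code assumes it means 100 × (1 − mean absolute percentage error); the notebook's definition may differ. The numbers are toy values, not the project's results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def relative_accuracy(y_true, y_pred):
    """Assumed definition: 100 * (1 - mean absolute percentage error)."""
    pct_errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100 * (1 - sum(pct_errors) / len(pct_errors))


# Toy values for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 360]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"R2: {r2_score(actual, predicted):.4f}")
print(f"Relative accuracy: {relative_accuracy(actual, predicted):.2f}%")
```

Under this assumed definition, a model can score a fairly high relative accuracy while R² stays moderate, because percentage errors and explained variance measure different things.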
---
## Visualizations
More visualizations are available in the notebook:
- Feature Distributions
- Correlation Heatmap
- Residual Plot
## 📄 License
This project is licensed under the **MIT License**.
Feel free to use, modify, and share it with attribution.

---
⭐ If you liked this repo, don’t forget to star it and share it with others!