https://github.com/szymon-budziak/real_estate_house_prices_prediction

Predicting real estate house prices using various machine learning algorithms, including data exploration, preprocessing, model training, and evaluation.

Host: GitHub
URL: https://github.com/szymon-budziak/real_estate_house_prices_prediction
Owner: Szymon-Budziak
License: mit
Created: 2024-08-05T17:00:59.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-08-05T17:32:13.000Z (9 months ago)
Last Synced: 2025-04-12T20:50:02.923Z (22 days ago)
Topics: data-analysis, data-preprocessing, data-science, eda, jupyter-notebook, machine-learning, matplotlib, numpy, optuna, pandas, predictive-modeling, price-prediction, python, random-forest, regression, scikit-learn, seaborn
Language: Jupyter Notebook
Homepage:
Size: 2.12 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Real Estate House Prices Prediction

---

This repository contains a Jupyter Notebook that walks through predicting real estate house prices with several machine learning techniques. The notebook covers data exploration, preprocessing, model training, evaluation, and selection, and ends with a final predictive model wrapped in a prediction pipeline.

## Table of Contents

- [Introduction](#introduction)
- [Dataset](#dataset)
- [Installation](#installation)
- [Usage](#usage)
- [Notebook Overview](#notebook-overview)
  - [1. Data Loading and Initial Exploration](#1-data-loading-and-initial-exploration)
  - [2. Exploratory Data Analysis (EDA)](#2-exploratory-data-analysis-eda)
  - [3. Data Cleaning and Preprocessing](#3-data-cleaning-and-preprocessing)
  - [4. Model Training and Evaluation](#4-model-training-and-evaluation)
  - [5. Hyperparameter Tuning](#5-hyperparameter-tuning)
  - [6. Model Comparison](#6-model-comparison)
  - [7. Final Model and Pipeline](#7-final-model-and-pipeline)
  - [8. Conclusion](#8-conclusion)
- [Contributing](#contributing)
- [License](#license)

## Introduction

This project aims to predict real estate house prices using a dataset containing features such as area and furnishing status. It trains several machine learning algorithms and compares them to find the best model for the task.

## Dataset

The dataset used in this project is a structured CSV file containing information about real estate properties. Key features include price, area, and furnishing status.

## Installation

The project uses Poetry to manage dependencies. To install the dependencies, run the following command:

```bash
poetry install
```

## Notebook Overview

### 1. Data Loading and Initial Exploration

- Load the dataset.
- Display basic statistics and the first few rows of the data.
- Check for unique values and missing data.
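
The steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV rows and the column names (`price`, `area`, `furnishingstatus`) are assumptions standing in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV; column names are assumptions.
csv_text = """price,area,furnishingstatus
13300000,7420,furnished
12250000,8960,furnished
9100000,6000,semi-furnished
4200000,3500,unfurnished
3150000,,unfurnished
"""

# With the real dataset, pass the CSV file path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first few rows
print(df.describe())  # basic statistics for the numeric columns

unique_counts = df.nunique()      # distinct non-missing values per column
missing_counts = df.isna().sum()  # missing values per column
print(unique_counts)
print(missing_counts)
```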

### 2. Exploratory Data Analysis (EDA)

- Visualize the distribution of the target variable (price) using histograms and KDE plots.
- Examine the relationship between price and other features using scatter plots and box plots.
- Generate a heatmap to understand feature correlations.
- Create pair plots to visualize relationships between multiple features.

### 3. Data Cleaning and Preprocessing

- Handle binary categorical variables and create dummy variables for other categorical data.
- Standardize or normalize numerical features as needed.
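
One way these preprocessing steps might look is sketched below. The `airconditioning` column is a hypothetical binary feature introduced for illustration, and the rows are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical rows; "airconditioning" is an assumed yes/no column.
df = pd.DataFrame(
    {
        "price": [13300000, 9100000, 4200000, 3150000],
        "area": [7420, 6000, 3500, 2800],
        "airconditioning": ["yes", "no", "yes", "no"],
        "furnishingstatus": [
            "furnished",
            "semi-furnished",
            "unfurnished",
            "furnished",
        ],
    }
)

# Binary yes/no column -> 1/0.
df["airconditioning"] = df["airconditioning"].map({"yes": 1, "no": 0})

# Multi-class column -> dummy variables; drop_first removes one redundant level.
df = pd.get_dummies(df, columns=["furnishingstatus"], drop_first=True, dtype=int)

# Standardize the numeric feature to zero mean and unit variance.
scaler = StandardScaler()
df["area"] = scaler.fit_transform(df[["area"]]).ravel()

print(df)
```

When the data is split into training and test sets, the scaler should be fitted on the training portion only and then applied to the test portion, so that test statistics do not leak into training.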

### 4. Model Training and Evaluation

- Train multiple regression models including Linear Regression, Lasso, Ridge, Polynomial Regression, and Random Forest.
- Evaluate models using metrics like R², MSE, and RMSE.
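
The training and evaluation loop could be sketched as below, assuming scikit-learn estimators with illustrative hyperparameters. The data is synthetic (`make_regression`), and polynomial regression is modeled here as a degree-2 feature expansion followed by linear regression, which may differ from the notebook's setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic regression data standing in for the preprocessed housing features.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    # Polynomial regression = polynomial feature expansion + linear model.
    "Polynomial Regression": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    # RMSE is the square root of MSE, in the same units as the target.
    results[name] = {"r2": r2_score(y_test, pred), "mse": mse, "rmse": np.sqrt(mse)}
    print(f"{name}: R2={results[name]['r2']:.3f} MSE={mse:.1f} RMSE={np.sqrt(mse):.1f}")
```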

### 5. Hyperparameter Tuning

- Use Optuna for hyperparameter tuning of the Random Forest model to improve performance.

### 6. Model Comparison

- Compare the performance of various models on the training and test sets.
- Visualize and interpret model performance metrics.
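
One possible way to tabulate the comparison is to compute the same metric on both splits and look at the gap between them. The sketch below uses synthetic data and only two models for brevity; both choices are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the preprocessed housing features.
X, y = make_regression(n_samples=300, n_features=5, noise=25.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=1),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    rows.append(
        {
            "model": name,
            "train_r2": train_r2,
            "test_r2": test_r2,
            "gap": train_r2 - test_r2,  # train score minus test score
        }
    )

# One row per model, best test score first.
comparison = (
    pd.DataFrame(rows).set_index("model").sort_values("test_r2", ascending=False)
)
print(comparison.round(3))
```

A large positive `gap` suggests the model fits the training data much better than unseen data, which is a common sign of overfitting. The same table can be drawn as a bar chart with `comparison.plot.bar()`.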

### 7. Final Model and Pipeline

- Save the best model using pickle.
- Create a prediction pipeline with the best model and necessary preprocessing steps.
- Demonstrate making predictions with the final model.
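
These steps might be combined as in the sketch below, which bundles preprocessing and the model into a single scikit-learn `Pipeline` before pickling it. The training rows, the column names, and the `model.pkl` file name are hypothetical, and the notebook's pipeline may be built differently.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training rows; column names are assumptions.
train = pd.DataFrame(
    {
        "price": [13300000, 12250000, 9100000, 4200000, 3150000, 2800000],
        "area": [7420, 8960, 6000, 3500, 2800, 2500],
        "furnishingstatus": [
            "furnished",
            "furnished",
            "semi-furnished",
            "unfurnished",
            "unfurnished",
            "semi-furnished",
        ],
    }
)
X, y = train.drop(columns="price"), train["price"]

# Preprocessing lives inside the pipeline, so raw rows can be passed to predict().
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["area"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["furnishingstatus"]),
    ]
)
pipeline = Pipeline(
    [
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]
)
pipeline.fit(X, y)

# Save and reload with pickle; a temporary directory keeps the sketch self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Predict the price of a new, unseen property.
new_house = pd.DataFrame({"area": [5000], "furnishingstatus": ["semi-furnished"]})
prediction = restored.predict(new_house)
print(prediction)
```

Pickle files can execute arbitrary code when loaded, so only unpickle models from sources you trust.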

### 8. Conclusion

- Summarize findings, model performance, and insights gained from the project.
