https://github.com/justfifi17/nyc-real-estate-sales-prediction

Analyzes real estate sales data in NYC by performing exploratory data analysis and building models to predict sale prices based on various features.
data-visualization exploratory-data-analysis gradient-boosting linear-regression random-forest xgboost-regression

Host: GitHub
URL: https://github.com/justfifi17/nyc-real-estate-sales-prediction
Owner: justfifi17
Created: 2024-10-16T04:50:49.000Z (23 days ago)
Default Branch: main
Last Pushed: 2024-10-16T05:30:05.000Z (23 days ago)
Last Synced: 2024-10-17T19:38:44.834Z (21 days ago)
Topics: data-visualization, exploratory-data-analysis, gradient-boosting, linear-regression, random-forest, xgboost-regression
Language: Jupyter Notebook
Homepage:
Size: 1.17 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# 🏙 NYC Real Estate Sales Prediction

This project uses **machine learning models** to predict sale prices for New York City real estate transactions from 2016-2017. The dataset includes property details such as borough, neighborhood, and building class, along with sale prices.

## Models Used

- Linear Regression
- Random Forest Regressor
- Gradient Boosting Regressor
- XGBoost Regressor

## 📊 Features

- **Data Preprocessing**: Handled missing values, removed unnecessary columns, and applied a log transformation to skewed variables.
- **Modeling**: Trained and compared models using **Linear Regression**, **Random Forest**, **Gradient Boosting**, **XGBoost**, and others.
- **Best Model**: Random Forest, with an **RMSE** of 0.45 and an **R²** of 0.76, was chosen for its strong performance and its feature importance insights.
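
The log transformation mentioned under Data Preprocessing can be illustrated with a minimal sketch. It assumes a `log(1 + x)` transform via the standard library; the README does not say which variant the project uses, and the prices below are made-up illustrative values, not rows from the dataset.

```python
import math

def log_transform(prices):
    """Compress right-skewed, non-negative prices with log(1 + x)."""
    return [math.log1p(p) for p in prices]

def inverse_log_transform(values):
    """Map log-scale values back to the original price scale."""
    return [math.expm1(v) for v in values]

# Hypothetical sale prices in dollars (illustrative only, not from the dataset).
sale_prices = [250_000, 480_000, 750_000, 1_200_000, 25_000_000]

log_prices = log_transform(sale_prices)
restored = inverse_log_transform(log_prices)

# The largest price is 100 times the smallest, yet on the log scale
# the two values differ by only about 4.6.
spread_raw = max(sale_prices) / min(sale_prices)
spread_log = max(log_prices) - min(log_prices)
print(spread_raw, round(spread_log, 2))
```

When a model is trained on a log-transformed target, its predictions need the inverse transform before they can be read as prices, and error metrics computed on the transformed target are in log units rather than dollars.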

## 🛠️ Technologies Used

- **Python**: Core language.
- **Pandas, NumPy**: Data manipulation.
- **Seaborn, Matplotlib**: Visualization.
- **Scikit-learn, XGBoost**: Machine learning algorithms.

## 🚀 How to Run

1. Clone this repository:
   ```bash
   git clone
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the model:
   ```bash
   python real_estate_sales_prediction.py
   ```

## 📈 Results

- Random Forest achieved the best results, with an R² of 0.76.
- The models were evaluated on RMSE and R².
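
For readers unfamiliar with these metrics, here is a dependency-free sketch of how RMSE and R² are computed, with MAE included since it is also listed under Tests. The function names and sample values are illustrative assumptions, not the project's code; in a scikit-learn workflow, `sklearn.metrics` offers `mean_squared_error`, `mean_absolute_error`, and `r2_score` for the same purpose.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions (illustrative values only).
y_true = [12.0, 13.0, 14.0, 15.0]
y_pred = [12.5, 12.5, 14.5, 14.5]

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Lower RMSE and MAE indicate smaller errors, while an R² closer to 1 means the model explains more of the variance in the target.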

## 🧪 Tests

- **Train-Test Split**: Split data into training (80%) and testing (20%) sets.
- **Cross-validation**: Performed 5-fold cross-validation to evaluate model performance.
- **Hyperparameter Tuning**: Used GridSearchCV to tune the hyperparameters for the Random Forest and XGBoost models.
- **Model Evaluation**: Assessed models based on RMSE, MAE, and R².
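
The 80/20 split and 5-fold cross-validation can be sketched with the standard library alone to show how rows are partitioned. This is an illustration under stated assumptions: the sample count, the seed, and the choice to cross-validate only the training portion are not taken from the README, and the helper names are hypothetical. In a scikit-learn workflow, `train_test_split` and `KFold` from `sklearn.model_selection` cover the same ground.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each fold validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Hypothetical dataset of 100 rows (assumed size, for illustration only).
train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=5))

print(len(train_idx), len(test_idx), len(folds))
```

Keeping the test indices out of every fold means hyperparameters are tuned without ever seeing the held-out 20%, so the final test score remains an honest estimate.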