Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/justfifi17/nyc-real-estate-sales-prediction

Analyzes real estate sales data in NYC by performing exploratory data analysis and building models to predict sale prices based on various features.
https://github.com/justfifi17/nyc-real-estate-sales-prediction

data-visualization exploratory-data-analysis gradient-boosting linear-regression random-forest xgboost-regression

Last synced: about 6 hours ago
JSON representation

Analyzes real estate sales data in NYC by performing exploratory data analysis and building models to predict sale prices based on various features.

Awesome Lists containing this project

README

        

# ๐Ÿ™ NYC Real Estate Sales Prediction

This project uses **machine learning models** to predict real estate sales prices in New York City from 2016-2017. The dataset includes property details like borough, neighborhood, building class, and sales prices.

## Models used

- Linear Regression
- Random Forest Regressor
- Gradient Boosting Regressor
- XGBoost Regressor

## ๐Ÿ“Š Features

- **Data Preprocessing**: Handled missing values, removed unnecessary columns, and transformed skewed data using log transformation.
- **Modeling**: Developed models using **Linear Regression**, **Random Forest**, **Gradient Boosting**, **XGBoost**, and others.
- **Best Model**: Random Forest with **RMSE** of 0.45 and **Rยฒ** of 0.76 was chosen for its strong performance and feature importance insights.

## ๐Ÿ› ๏ธ Technologies Used

- **Python**: Core language.
- **Pandas, NumPy**: Data manipulation.
- **Seaborn, Matplotlib**: Visualization.
- **Scikit-learn, XGBoost**: Machine learning algorithms.

## ๐Ÿš€ How to Run

1. Clone this repository:
```bash
git clone
2. Install dependencies:
```bash
pip install -r requirements.txt
3. Run the model:
```bash
python real_estate_sales_prediction.py

## ๐Ÿ“ˆ Results
- Random Forest achieved the best results with an Rยฒ of 0.76.
- Featured models were evaluated based on RMSE and Rยฒ.
## ๐Ÿงช Tests
- Train-Test Split: Split data into training (80%) and testing (20%) sets.
- Cross-validation: Performed 5-fold cross-validation to evaluate model performance.
- Hyperparameter Tuning: Used GridSearchCV to tune the hyperparameters for Random Forest and XGBoost models.
- Model Evaluation: Assessed models based on RMSE, MAE, and Rยฒ.