Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/justfifi17/nyc-real-estate-sales-prediction
Analyzes real estate sales data in NYC by performing exploratory data analysis and building models to predict sale prices based on various features.
data-visualization exploratory-data-analysis gradient-boosting linear-regression random-forest xgboost-regression
- Host: GitHub
- URL: https://github.com/justfifi17/nyc-real-estate-sales-prediction
- Owner: justfifi17
- Created: 2024-10-16T04:50:49.000Z (23 days ago)
- Default Branch: main
- Last Pushed: 2024-10-16T05:30:05.000Z (23 days ago)
- Last Synced: 2024-10-17T19:38:44.834Z (21 days ago)
- Topics: data-visualization, exploratory-data-analysis, gradient-boosting, linear-regression, random-forest, xgboost-regression
- Language: Jupyter Notebook
- Homepage:
- Size: 1.17 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# NYC Real Estate Sales Prediction
This project uses **machine learning models** to predict real estate sale prices in New York City, using sales data from 2016-2017. The dataset includes property details such as borough, neighborhood, building class, and sale price.
## Models used
- Linear Regression
- Random Forest Regressor
- Gradient Boosting Regressor
- XGBoost Regressor
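As a rough sketch of how the four regressors listed above might be set up side by side, the snippet below fits each one on synthetic data. The data, the `models` and `scores` names, and all hyperparameter values are illustrative assumptions, not taken from the project's notebook. XGBoost is a separate package, so it is included only when it is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Candidate regressors; hyperparameters here are placeholders, not tuned values.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# XGBoost lives outside scikit-learn; add it only if the package is available.
try:
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(n_estimators=50, random_state=0)
except ImportError:
    pass

# Synthetic stand-in data (NOT the NYC sales dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Fit every model and record its R² on the training data.
scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)

for name, score in scores.items():
    print(f"{name}: training R² = {score:.3f}")
```

Training-set scores like these only confirm that each model fits; comparing models fairly requires held-out data.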
## Features
- **Data Preprocessing**: Handled missing values, removed unnecessary columns, and transformed skewed data using log transformation.
- **Modeling**: Developed models using **Linear Regression**, **Random Forest**, **Gradient Boosting**, **XGBoost**, and others.
- **Best Model**: Random Forest with **RMSE** of 0.45 and **R²** of 0.76 was chosen for its strong performance and feature importance insights.
## Technologies Used
- **Python**: Core language.
- **Pandas, NumPy**: Data manipulation.
- **Seaborn, Matplotlib**: Visualization.
- **Scikit-learn, XGBoost**: Machine learning algorithms.
## How to Run
1. Clone this repository:
```bash
git clone
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the model:
```bash
python real_estate_sales_prediction.py
```
## Results
- Random Forest achieved the best results with an R² of 0.76.
- All models were evaluated based on RMSE and R².
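For readers unfamiliar with the two metrics named above, here is a minimal pure-Python sketch of how RMSE and R² are computed. The function names and sample values are made up for illustration and are not the project's data or code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical log-scale prices and predictions, for illustration only.
y_true = [13.0, 13.5, 14.0, 14.5]
y_pred = [13.2, 13.4, 14.1, 14.2]

print(f"RMSE = {rmse(y_true, y_pred):.4f}")
print(f"R²   = {r2_score(y_true, y_pred):.4f}")
```

RMSE is expressed in the units of the target, so its scale depends on whether prices were log-transformed before modeling. R² is unitless and measures the share of variance explained.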
## Tests
- Train-Test Split: Split data into training (80%) and testing (20%) sets.
- Cross-validation: Performed 5-fold cross-validation to evaluate model performance.
- Hyperparameter Tuning: Used GridSearchCV to tune the hyperparameters for Random Forest and XGBoost models.
- Model Evaluation: Assessed models based on RMSE, MAE, and R².
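The steps listed above could be sketched with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the data is synthetic, the parameter grid is a small illustrative one, and the choice of scoring metrics is a guess. The target is log-transformed with `np.log1p`, one common way to apply the log transformation the README mentions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in data (NOT the NYC sales dataset): three numeric
# features and a right-skewed, price-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
price = np.exp(13.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.2, size=300))

# Log-transform the skewed target (assumed form of the log transformation).
y = np.log1p(price)

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5-fold cross-validation on the training set, scored by R².
model = RandomForestRegressor(n_estimators=50, random_state=42)
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Grid search over a small, illustrative hyperparameter grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Evaluate the refit best estimator on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
test_rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
test_mae = float(mean_absolute_error(y_test, y_pred))
test_r2 = float(r2_score(y_test, y_pred))

print(f"CV R² (mean of 5 folds): {cv_r2.mean():.3f}")
print(f"Best params: {search.best_params_}")
print(f"Test RMSE={test_rmse:.3f}  MAE={test_mae:.3f}  R²={test_r2:.3f}")
```

Tuning on the training folds and reporting metrics only on the untouched 20% keeps the test score from being biased by the hyperparameter search.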