https://github.com/praveendecode/us_hpi_prediction
Explore the dynamics of US home prices over two decades using a Random Forest Regressor model that achieves a 99.87% R2 score, and uncover key factors influencing real estate trends.
exploratory-data-analysis feature-engineering feature-importance insights model-building model-selection predictive-modeling python usahpi
- Host: GitHub
- URL: https://github.com/praveendecode/us_hpi_prediction
- Owner: praveendecode
- Created: 2023-12-10T13:52:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-08T07:05:01.000Z (over 1 year ago)
- Last Synced: 2025-02-09T13:35:04.573Z (3 months ago)
- Topics: exploratory-data-analysis, feature-engineering, feature-importance, insights, model-building, model-selection, predictive-modeling, python, usahpi
- Language: Jupyter Notebook
- Homepage:
- Size: 1.54 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Overview
- This repository contains a data science project focused on understanding the key factors influencing US home prices over the last 20 years.
- The project uses the S&P Case-Shiller Home Price Index as a proxy for home prices and explores various economic indicators to build a predictive model.
- The chosen model is a Random Forest Regressor, which achieves an R2 score of 99.87%.
# Problem Statement
- The objective is to identify and analyze factors that significantly impact home prices in the United States.
- By leveraging publicly available data, we aim to build a robust predictive model that explains the variations in the S&P Case-Shiller Home Price Index over the past two decades.
# Main Features
The following features were selected for the predictive model:
### Zillow Home Value Index (USAUCSFRCONDOSMSAMID):
- Median market value of all homes.
- Reflects overall trends in home values, providing insights into market conditions and price movements.
### Average Sales Price for New Houses Sold (ASPNHSUS):
- Reflects the average cost of newly constructed homes.
- Influences perceptions of the affordability of new housing.
### Median Sales Price for New Houses Sold (MSPNHSUS):
- Provides insights into the typical price of newly sold houses.
- Helps understand the distribution of new home prices.
### Total Construction Spending: Residential (TLRESCONS):
- Reflects the level of investment in residential construction.
- Influences housing supply, potentially affecting home prices.
### Gross Domestic Income (GDI):
- Influences economic conditions and affects consumer confidence in buying houses.
### Consumer Price Index for All Urban Consumers: Housing (CPIHOSNS):
- Indicates inflation in housing costs, potentially impacting home prices.
### Total Population: All Ages (POP):
- Population growth influences housing demand, potentially affecting home prices.
### National Totals of State and Local Tax Revenue: Property Taxes (QTAXT01QTAXCAT1USNO):
- Property taxes influence the overall cost of living.
### New Privately-Owned Housing Units Authorized (PERMIT):
- Authorized housing units signal future housing supply, which can influence home prices.
### Monthly Supply of New Houses (MSACSR):
- Reflects the balance between housing supply and demand.
### 30-Year Fixed Rate Mortgage Average (MORTGAGE30US):
- Tracks the average interest rate on 30-year fixed-rate mortgages, which affects how much buyers can afford to borrow.
### University of Michigan: Consumer Sentiment (UMCSENT):
- Provides insights into consumer sentiment about the economy and the housing market.
### Unemployed Population: Aged 25-54 (LFUN25TTUSM647S) and Unemployment Rate (UNRATE):
- Reflect job security; greater job security improves purchasing ability.
### Housing Inventory Estimate: Vacant Units (EVACANTUSQ176N):
- Provides insights into vacant unit availability, market conditions, and supply and demand balance.
### Federal Funds Effective Rate (FEDFUNDS):
- Changes in the federal funds rate, which the Federal Reserve adjusts, may influence mortgage rates.
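The indicators listed above are not all published at the same frequency, so they need to be aligned on a common time index before modeling. The following is a minimal sketch of one way to do that with pandas, not the notebook's actual code. It assumes a monthly target, a quarterly series, and a weekly series; the values are synthetic placeholders, and the variable names (`hpi`, `gdi`, `mortgage`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for three FRED series (values are made up for illustration).
monthly_idx = pd.date_range("2003-01-01", periods=12, freq="MS")
quarterly_idx = pd.date_range("2003-01-01", periods=4, freq="QS")
weekly_idx = pd.date_range("2003-01-02", periods=52, freq="W-THU")

hpi = pd.Series(np.linspace(130.0, 141.0, 12), index=monthly_idx, name="HPI")
gdi = pd.Series([11000.0, 11100.0, 11250.0, 11400.0], index=quarterly_idx, name="GDI")
mortgage = pd.Series(np.linspace(5.9, 5.8, 52), index=weekly_idx, name="MORTGAGE30US")

# Quarterly -> monthly: carry each quarter's value forward across its months.
gdi_monthly = gdi.reindex(monthly_idx, method="ffill")

# Weekly -> monthly: average the weekly observations within each month.
mortgage_monthly = mortgage.resample("MS").mean()

# Combine everything on the shared month-start index.
frame = pd.concat([hpi, gdi_monthly, mortgage_monthly], axis=1)
print(frame.head())
```

Forward-filling and monthly averaging are only two of several reasonable choices; interpolation or taking the last observation of each month would also work, depending on the series.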
# Model Process
### Data Cleaning and Imputation:
- Utilized machine learning models to impute missing values in the dataset.
### Exploratory Data Analysis (EDA):
- Conducted thorough EDA to understand the relationships between features and the target variable.
### Feature Engineering:
- Engineered relevant features to improve model performance.
### Model Selection:
- Chose the Random Forest Regressor based on its outstanding R2 score of 99.87%.
### Model Evaluation:
- Utilized R2 score, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) for model evaluation.
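For reference, the three evaluation metrics can be computed directly from their definitions. The sketch below uses only the Python standard library and a small made-up set of actual and predicted values; the function name `regression_metrics` is hypothetical and does not come from the project notebook.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and RMSE for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean of the actuals.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Made-up index values, for illustration only.
actual = [150.0, 160.0, 170.0, 180.0]
predicted = [151.0, 159.0, 172.0, 178.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

MAE and RMSE are expressed in the units of the target (index points here), while R2 is unitless, which is why the three are usually reported together.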
### Feature Importance:
- After training the Random Forest Regressor, a feature importance analysis was performed to reveal which predictor variables contribute most to predicting the target variable.
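The imputation, training, and feature importance steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the README does not name the imputation model, so `KNNImputer` is used here as one possible machine-learning imputer, and the data, column names (`strong_driver`, `weak_driver`, `noise`), and hyperparameters are all synthetic or hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 240  # e.g. 20 years of monthly rows (assumption)

# Synthetic features: one strong driver, one weak driver, one pure-noise column.
X = pd.DataFrame({
    "strong_driver": rng.normal(size=n),
    "weak_driver": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = 10.0 * X["strong_driver"] + 1.0 * X["weak_driver"] + rng.normal(scale=0.1, size=n)

# Blank out some entries to simulate missing observations.
X_missing = X.copy()
X_missing.loc[rng.choice(n, size=20, replace=False), "weak_driver"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.2, random_state=0
)

# Fit the imputer on the training rows only, then apply it to both splits.
imputer = KNNImputer(n_neighbors=5)
X_train_filled = imputer.fit_transform(X_train)
X_test_filled = imputer.transform(X_test)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train_filled, y_train)

# Impurity-based importances; they are non-negative and sum to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
test_r2 = model.score(X_test_filled, y_test)
print(importances)
print(f"Test R2: {test_r2:.4f}")
```

Impurity-based importances can favor high-cardinality or correlated features, so `sklearn.inspection.permutation_importance` is a common cross-check when the ranking matters.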
# Tools Covered
- Programming Language: Python
- Code Notebook: Google Colab
- Data Collection: FRED (Federal Reserve Economic Data)
- Data Cleaning: Machine Learning Imputation
- Exploratory Data Analysis: Pandas, Matplotlib, Seaborn
- Model Building: Scikit-Learn (Random Forest Regressor)
- Model Evaluation: R2 Score, MAE, RMSE
# Results
- Model Fit (R2 Score): 99.87%
- MAE (Mean Absolute Error): 1.39
- RMSE (Root Mean Squared Error): 2.37
- The Random Forest Regressor demonstrated superior performance in minimizing errors and capturing the variance in the target variable compared to other models.
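A comparison of this kind can be sketched by fitting each candidate on the same training split and scoring it on the same held-out split. The README does not say which other models were tried, so `LinearRegression` below is an assumed baseline, and the data is synthetic with a deliberately nonlinear target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic data with a nonlinear relationship between features and target.
X = rng.uniform(-3, 3, size=(300, 2))
y = 5 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Candidate models (the linear baseline is an assumption, not from the README).
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

# Fit each candidate and record its R2 on the held-out split.
scores = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    scores[name] = r2_score(y_test, estimator.predict(X_test))

best_model = max(scores, key=scores.get)
print(scores)
print("Best by R2:", best_model)
```

For time-ordered data such as a monthly price index, a chronological split (for example `sklearn.model_selection.TimeSeriesSplit`) may give a more conservative estimate than a random split.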