https://github.com/msikorski93/predicting-prices-on-king-county-housing-dataset
  
  
    Predicting house prices using different regression analysis models. 
    https://github.com/msikorski93/predicting-prices-on-king-county-housing-dataset
  
catboost eda gradient-boosting king-county lightgbm linear-regression machine-learning neural-network polynomial-regression real-estate regression-models scikit-learn tensorflow xgboost
        Last synced: 8 days ago 
        JSON representation
    
Predicting house prices using different regression analysis models.
- Host: GitHub
 - URL: https://github.com/msikorski93/predicting-prices-on-king-county-housing-dataset
 - Owner: msikorski93
 - Created: 2021-10-10T19:36:45.000Z (about 4 years ago)
 - Default Branch: main
 - Last Pushed: 2024-01-18T14:44:38.000Z (almost 2 years ago)
 - Last Synced: 2025-01-09T07:51:11.738Z (10 months ago)
 - Topics: catboost, eda, gradient-boosting, king-county, lightgbm, linear-regression, machine-learning, neural-network, polynomial-regression, real-estate, regression-models, scikit-learn, tensorflow, xgboost
 - Language: Jupyter Notebook
 - Homepage:
 - Size: 5.59 MB
 - Stars: 0
 - Watchers: 1
 - Forks: 0
 - Open Issues: 0
 - 
            Metadata Files:
            
- Readme: README.md
 
 
Awesome Lists containing this project
README
          # Predicting-Prices-on-King-County-Housing-Dataset






Predicting house prices using different regression analysis models.
### New notebooks for January 2024
A new regression task was introduced. Ten different models were fitted and performed with the following evaluation metrics:
| Model          | RMSE     | RMSLE  | R2 | Adjusted R2 | Variance
Regression | 5-Fold Cross-
Validation |
|----------------|----------|--------|---------------|------------------------|------------------------|-----------------------------|
| Neural Network | 110582.6 | 0.7188 | 0.8819        | 0.8812                 | 0.8837                 | 0.8717                      |
| Random Forest  | 105662.2 | 0.1678 | 0.8870        | 0.8863                 | 0.8870                 | 0.8868                      |
| Decision Tree  | 146264.0 | 0.2482 | 0.7835        | 0.7822                 | 0.7836                 | 0.7729                      |
| CatBoost       | 91614.5  | 0.1511 | 0.9151        | 0.9146                 | 0.9151                 | 0.9142                      |
| XGBoost        | 98811.4  | 0.1593 | 0.9012        | 0.9006                 | 0.9012                 | 0.8951                      |
| LightGBM       | 96765.5  | 0.1586 | 0.9052        | 0.9047                 | 0.9053                 | 0.9054                      |
| Polynomial     | 116431.2 | 0.1968 | 0.8628        | 0.8620                 | 0.8629                 | 0.8631                      |
| Lasso          | 110881.3 | 0.1833 | 0.8756        | 0.8749                 | 0.8756                 | 0.8745                      |
| Ridge          | 116619.7 | 0.1959 | 0.8624        | 0.8616                 | 0.8625                 | 0.8628                      |
| ElasticNet     | 119081.6 | 0.1966 | 0.8565        | 0.8557                 | 0.8566                 | 0.8592                      |
Among these chosen models, the CatBoost (categorical boosting) regressor outperformed all of the other regressors and was selected as the final model to predict house prices. The model was optimized with hyperparameter tuning and evaluated once more.
| RMSE     | RMSLE  | R2 | Adjusted R2 | Variance
Regression | 5-Fold Cross-
Validation |
|----------|--------|---------------|------------------------|------------------------|-----------------------------|
| 91404.4  | 0.1508 | 0.9155        | 0.9150                 | 0.9155                 | 0.8912                      |
This analysis concluded that for the King County area the most influencial price factors are:
* square footage of the real estate;
* location, exactly the combination of variables like: latitude, longitude, distance from downtown;
* the real estate's grade (general condition).