
https://github.com/andryadsm/predicting-house-prices

🏘️ Project Predicting House Prices (Python)

data-analysis data-preprocessing data-visualization feature-engineering house-prices machine-learning matplotlib numpy pandas python real-estate seaborn sklearn

# Predicting House Prices

🌐 Check this project on [my website](https://aadsm2355.wixsite.com/andryadsm/predicting-house-prices)!

View the notebook on [Kaggle](https://www.kaggle.com/code/andrydasilva/house-prices-with-feature-analyzer).

## Files
- `house-prices-with-feature-analyzer.ipynb` is the Python notebook where all the work was done. It runs properly only inside Kaggle.
- `submission.csv` is the final output submitted to the Kaggle competition.

The code (.ipynb) and output files are also available on Kaggle, which is the environment the notebook is meant to run in.

---

### 📌 Type
Kaggle Competition, Regression.

### ⚜️ Domain
Real Estate, House Prices.

### 💻 Technologies
- Python (Kaggle Notebook)
- pandas
- numpy
- sklearn
- matplotlib
- seaborn
- catboost

### 🕹️ Skills
- Machine Learning
- Data Preprocessing
- Feature Engineering
- Data Visualization
- Data Analysis

---

🏘️ Worked on the Kaggle competition "[House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)", predicting the sale prices of the 1459 houses in the test set from a training dataset of 1460 records with 79 features, using Python 🐍.

🔎 Performed Exploratory Data Analysis (EDA), examining missing values, distributions, value counts, correlations and more, making heavy use of pandas, matplotlib and seaborn.
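
A minimal sketch of the kind of missing-value and correlation checks described above, assuming a pandas DataFrame with a `SalePrice` target. The helper names `missing_report` and `top_correlations` are hypothetical, and the toy values are made up; only the column names are borrowed from the competition data.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first (hypothetical helper)."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, sorted by missing count.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


def top_correlations(df: pd.DataFrame, target: str, n: int = 5) -> pd.Series:
    """Numeric features with the largest absolute Pearson correlation to the target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Made-up toy frame; the column names come from the competition dataset.
toy = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 70.0, np.nan, 80.0],
    "GrLivArea": [1500, 1200, 1800, 1700, 2200],
    "Alley": [np.nan, np.nan, "Grvl", np.nan, np.nan],
    "SalePrice": [200000, 180000, 220000, 140000, 250000],
})
print(missing_report(toy))
print(top_correlations(toy, "SalePrice", n=2))
```

Note that `DataFrame.corr` uses pairwise-complete observations, so the `LotFrontage` correlation here is computed only from the rows where it is present.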

📊 Built a "Feature Analyzer" to speed up EDA: given any feature, categorical or numerical, it displays relevant information and plots (made with matplotlib and seaborn) for quickly gaining useful insights.
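
The notebook's actual Feature Analyzer is not reproduced here. As a rough, hypothetical sketch of the idea, `analyze_feature` (an illustrative name) branches on the column's dtype and returns summary information for one feature against an assumed `SalePrice` target. The optional plots use plain matplotlib rather than seaborn, and the data is a made-up toy frame.

```python
import pandas as pd


def analyze_feature(df: pd.DataFrame, feature: str, target: str = "SalePrice",
                    plot: bool = False) -> dict:
    """Hypothetical feature analyzer: summary of one column and its link to the target."""
    col = df[feature]
    info = {
        "dtype": str(col.dtype),
        "missing": int(col.isna().sum()),
        "unique": int(col.nunique()),
    }
    if pd.api.types.is_numeric_dtype(col):
        info["kind"] = "numerical"
        info["describe"] = col.describe()
        info["skew"] = float(col.skew())
        info["corr_with_target"] = float(col.corr(df[target]))
    else:
        info["kind"] = "categorical"
        info["counts"] = col.value_counts(dropna=False)
        # Mean target per category hints at whether the category matters.
        info["target_mean"] = df.groupby(feature)[target].mean().sort_values()
    if plot:
        import matplotlib.pyplot as plt  # only needed when plotting

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
        if info["kind"] == "numerical":
            col.plot.hist(ax=ax1, bins=30, title=f"{feature} distribution")
            ax2.scatter(col, df[target], s=8)
            ax2.set(xlabel=feature, ylabel=target)
        else:
            info["counts"].plot.bar(ax=ax1, title=f"{feature} counts")
            info["target_mean"].plot.bar(ax=ax2, title=f"mean {target}")
        fig.tight_layout()
        plt.show()
    return info


# Made-up toy data; the column names come from the competition dataset.
toy = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "CollgCr", "OldTown", "CollgCr"],
    "GrLivArea": [1500, 1200, 1800, 1700, 2200],
    "SalePrice": [200000, 180000, 220000, 140000, 250000],
})
print(analyze_feature(toy, "Neighborhood")["target_mean"])
print(analyze_feature(toy, "GrLivArea")["corr_with_target"])
```

Returning a dict keeps the summary usable from code, while `plot=True` gives the quick visual check in a notebook.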

🧹 Used pandas, numpy and sklearn for data cleaning and preprocessing: changing data types, ordinal encoding, dummy variables, extensive feature engineering 🛠️ and more.
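
A compact sketch of these preprocessing steps with pandas. The chosen columns, the quality-scale mapping and the `TotalSF` feature are illustrative assumptions, not necessarily the choices made in the notebook; `preprocess` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd

# Ordered quality grades used by several columns in the competition data
# (e.g. ExterQual, KitchenQual). Assumed mapping: worse grade -> smaller integer.
QUALITY_MAP = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing: dtype fix, ordinal encoding, a new feature, dummies."""
    out = df.copy()
    # 1. Change data types: MSSubClass is a numeric code for a category.
    out["MSSubClass"] = out["MSSubClass"].astype(str)
    # 2. Ordinal encoding: ordered grades -> integers, 0 when the value is missing.
    for col in ["ExterQual", "KitchenQual"]:
        out[col] = out[col].map(QUALITY_MAP).fillna(0).astype(int)
    # 3. Feature engineering: total square footage across basement and floors.
    out["TotalSF"] = out["TotalBsmtSF"] + out["1stFlrSF"] + out["2ndFlrSF"]
    # 4. Dummies: one-hot encode every remaining non-numeric (nominal) column.
    nominal = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    return pd.get_dummies(out, columns=nominal, dtype=int)


toy = pd.DataFrame({
    "MSSubClass": [60, 20, 60],
    "ExterQual": ["Gd", "TA", "Ex"],
    "KitchenQual": ["Gd", None, "TA"],
    "TotalBsmtSF": [800, 1200, 900],
    "1stFlrSF": [800, 1200, 900],
    "2ndFlrSF": [800, 0, 850],
})
print(preprocess(toy))
```

Ordinal encoding keeps the natural order of graded columns, while dummies avoid imposing a false order on purely nominal ones such as `MSSubClass`.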

🤖 Tested several models, including sklearn's RandomForestRegressor and GradientBoostingRegressor tuned with GridSearchCV, and concluded that CatBoostRegressor was the best model.
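
A sketch of tuning a scikit-learn regressor with `GridSearchCV`. The synthetic data from `make_regression` stands in for the preprocessed training set, and the parameter grid is an assumed example rather than the grid used in the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed feature matrix and target.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Assumed example grid: every combination is cross-validated.
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # sklearn maximizes, so the error is negated
    cv=3,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

CatBoostRegressor, from the separate `catboost` package, follows the same `fit`/`predict` interface, so a comparable search should be possible over it when that package is installed.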

🧾 Evaluated performance with a custom scorer based on RMSLE (root mean squared logarithmic error), the competition's metric, and achieved a score of 0.12236 (lower is better), good enough to rank within the top 10% of competitors 🏆.
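
One possible way to build such a custom scorer: a minimal RMSLE function wrapped with scikit-learn's `make_scorer`. The name `rmsle` is an assumption. It uses `log1p`, as scikit-learn's `mean_squared_log_error` does; the competition page defines the metric with a plain logarithm, which makes a negligible difference at house-price magnitudes.

```python
import numpy as np
from sklearn.metrics import make_scorer


def rmsle(y_true, y_pred) -> float:
    """Root mean squared logarithmic error: RMSE computed on log(1 + price)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Lower RMSLE is better, so the scorer negates it for sklearn's maximizing searches.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)

print(rmsle([200000, 150000], [210000, 140000]))
```

`rmsle_scorer` can be passed as `scoring=rmsle_scorer` to `GridSearchCV` or `cross_val_score`; the reported scores will be negative RMSLE values.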

---

![p1_numr](https://github.com/AndryADSM/Predicting-House-Prices/assets/150280431/62ba6a86-267c-4a04-bc4c-05f78b130255)

---