https://github.com/andryadsm/predicting-house-prices
🏘️ Project Predicting House Prices (Python)
data-analysis data-preprocessing data-visualization feature-engineering house-prices machine-learning matplotlib numpy pandas python real-estate seaborn sklearn
- Host: GitHub
- URL: https://github.com/andryadsm/predicting-house-prices
- Owner: AndryADSM
- Created: 2024-02-28T20:24:16.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-15T18:08:48.000Z (over 1 year ago)
- Last Synced: 2025-05-14T12:57:03.955Z (about 1 year ago)
- Topics: data-analysis, data-preprocessing, data-visualization, feature-engineering, house-prices, machine-learning, matplotlib, numpy, pandas, python, real-estate, seaborn, sklearn
- Language: Jupyter Notebook
- Homepage: https://aadsm2355.wixsite.com/andryadsm/predicting-house-prices
- Size: 621 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Predicting House Prices
🌐 Check this project on [my website](https://aadsm2355.wixsite.com/andryadsm/predicting-house-prices)!
See the notebook on [Kaggle](https://www.kaggle.com/code/andrydasilva/house-prices-with-feature-analyzer).
## Files
- `house-prices-with-feature-analyzer.ipynb` is the Python notebook where all the work was done. It runs properly only inside Kaggle.
- `submission.csv` is the final output submitted to the Kaggle competition.

The notebook (.ipynb) and the output files are also available on Kaggle.
---
### 📌 Type
Kaggle Competition, Regression.
### ⚜️ Domain
Real Estate, House Prices.
### 💻 Technologies
- Python (Kaggle Notebook)
- pandas
- numpy
- sklearn
- matplotlib
- seaborn
### 🕹️ Skills
- Machine Learning
- Data Preprocessing
- Feature Engineering
- Data Visualization
- Data Analysis
---
🏘️ Worked on the Kaggle competition "[House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)", predicting the sale price of 1459 houses with Python 🐍 from a dataset of 1460 records and 79 features.
🔎 Performed Exploratory Data Analysis (EDA), examining missing values, distributions, counts, correlations and more, relying heavily on pandas, matplotlib and seaborn.
📊 Created a "Feature Analyzer" for EDA that shows relevant information and plots for any given feature, categorical or numerical, to surface useful insights quickly. It is built on matplotlib and seaborn.
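
As a rough idea of what such a helper can look like, here is a minimal text-only sketch. It is not the notebook's actual Feature Analyzer: the function name `analyze_feature`, the returned keys and the toy rows are all assumptions made for illustration, and plotting is left out to keep it short.

```python
import pandas as pd

def analyze_feature(df, feature, target="SalePrice"):
    """Hypothetical helper: summarize one feature against the target."""
    col = df[feature]
    info = {
        "dtype": str(col.dtype),
        "missing": int(col.isna().sum()),
        "unique": int(col.nunique()),
    }
    if pd.api.types.is_numeric_dtype(col):
        # Numerical feature: basic statistics and correlation with the target.
        info["kind"] = "numerical"
        info["mean"] = float(col.mean())
        info["corr_with_target"] = float(col.corr(df[target]))
    else:
        # Categorical feature: level counts and mean target per level.
        info["kind"] = "categorical"
        info["counts"] = col.value_counts().to_dict()
        info["target_mean_by_level"] = df.groupby(feature)[target].mean().to_dict()
    return info

# Toy rows (made up) using column names from the competition data.
toy = pd.DataFrame({
    "GrLivArea": [1500, 2000, None, 1200],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "SalePrice": [200000, 250000, 180000, 150000],
})
numeric_info = analyze_feature(toy, "GrLivArea")
categorical_info = analyze_feature(toy, "Street")
```

A plotting version would typically add a histogram or count plot next to this summary.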
🧹 Used pandas, numpy and sklearn for cleaning and preprocessing: changing data types, ordinal encoding, dummy variables, extensive feature engineering 🛠️ and more.
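
The preprocessing steps named above can be sketched with pandas as follows. This is only an illustration under assumptions: the rows are made up, and the choice of columns and the quality mapping are not taken from the notebook.

```python
import pandas as pd

# Toy rows (made up) using column names from the competition data.
df = pd.DataFrame({
    "ExterQual": ["Gd", "TA", "Ex", "Fa"],
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown"],
    "MSSubClass": [60, 20, 60, 30],
})

# Ordinal encoding: quality codes have a natural order, so map them to integers.
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
df["ExterQual"] = df["ExterQual"].map(quality_order)

# Changing data types: MSSubClass is a numeric code for a category, not a quantity.
df["MSSubClass"] = df["MSSubClass"].astype(str)

# Dummies: one indicator column per level of each nominal feature.
encoded = pd.get_dummies(df, columns=["Neighborhood", "MSSubClass"])
```

Ordinal encoding keeps a single column and preserves the ranking, while dummies avoid imposing an order on features that have none.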
🤖 Tested different models, including several from sklearn such as RandomForestRegressor and GradientBoostingRegressor, tuned them with GridSearchCV, and settled on CatBoostRegressor as the best model.
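
A minimal sketch of tuning one of these models with GridSearchCV is shown below. The data is synthetic, and the parameter grid, the fold count and the choice to fit on `log1p(price)` are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 200 rows, 5 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
noise = rng.normal(scale=0.1, size=200)
price = np.exp(11 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + noise)

# Fitting on log1p(price) makes RMSE on the transformed target
# equal to RMSLE on the original prices.
y_log = np.log1p(price)

param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}  # illustrative grid
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y_log)

best_params = search.best_params_
cv_rmsle = -search.best_score_  # flip the sign: sklearn maximizes scores
```

Predictions from a model fitted this way are on the log scale, so `np.expm1` converts them back to prices.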
🧾 Evaluated performance with a custom scorer, RMSLE (root mean squared logarithmic error, where lower is better), and scored 0.12236, which places in roughly the top 10% of competitors 🏆.
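
For reference, RMSLE can be written in a few lines of numpy. This sketch uses the common `log1p` convention (the same one as sklearn's `mean_squared_log_error`), which is an assumption here, and the function name `rmsle` is illustrative; the notebook's scorer may differ in detail.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))

# Perfect predictions give an error of zero.
perfect = rmsle([100000, 200000], [100000, 200000])

# The metric depends on the ratio, not the absolute gap:
# being 10% too high costs about the same at any price level.
cheap = rmsle([100000], [110000])
expensive = rmsle([500000], [550000])
```

To use a function like this for model selection in sklearn, it can be wrapped with `make_scorer(rmsle, greater_is_better=False)`.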
---