Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andryadsm/predicting-house-prices
🏘️ Project Predicting House Prices (Python)
data-analysis data-preprocessing data-visualization feature-engineering house-prices machine-learning matplotlib numpy pandas python real-estate seaborn sklearn
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/andryadsm/predicting-house-prices
- Owner: AndryADSM
- Created: 2024-02-28T20:24:16.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-10-15T18:08:48.000Z (4 months ago)
- Last Synced: 2024-12-25T21:21:25.442Z (about 2 months ago)
- Topics: data-analysis, data-preprocessing, data-visualization, feature-engineering, house-prices, machine-learning, matplotlib, numpy, pandas, python, real-estate, seaborn, sklearn
- Language: Jupyter Notebook
- Homepage: https://aadsm2355.wixsite.com/andryadsm/predicting-house-prices
- Size: 621 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Predicting House Prices
🌐 Check this project on [my website](https://aadsm2355.wixsite.com/andryadsm/predicting-house-prices)!
Go to [Kaggle](https://www.kaggle.com/code/andrydasilva/house-prices-with-feature-analyzer).
## Files
- 'house-prices-with-feature-analyzer.ipynb' is the Python notebook where all the work was done. It runs properly only inside Kaggle.
- 'submission.csv' is the final output submitted to the Kaggle competition.

You can also get the code (.ipynb) and output files on Kaggle. Note that they run properly only inside Kaggle.

---
### 📌 Type
Kaggle Competition, Regression.
### ⚜️ Domain
Real Estate, House Prices.
### 💻 Technologies
- Python (Kaggle Notebook)
- pandas
- numpy
- sklearn
- matplotlib
- seaborn
### 🕹️ Skills
- Machine Learning
- Data Preprocessing
- Feature Engineering
- Data Visualization
- Data Analysis

---
🏘️ Worked on the Kaggle competition "[House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)", where I predicted the sale prices of 1459 houses using a training dataset of 1460 records with 79 features, all in Python 🐍.
🔎 Performed Exploratory Data Analysis (EDA), examining missing values, distributions, counts, correlations and more, relying heavily on pandas, matplotlib and seaborn.
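As a rough illustration of this kind of EDA, the sketch below checks the share of missing values per column and the correlation of each numeric feature with the target. The tiny DataFrame is made-up stand-in data, not the competition data, and the notebook's actual code may differ.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the training data (values are illustrative only).
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "GarageArea": [548, 460, np.nan, 642, 836],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd", np.nan],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Share of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Correlation of each numeric feature with the target column.
target_corr = (
    df.select_dtypes(include="number")
      .corr()["SalePrice"]
      .drop("SalePrice")
      .sort_values(ascending=False)
)

print(missing_share)
print(target_corr)
```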
📊 Created a "Feature Analyzer" for EDA that reports key information and plots for any given feature, categorical or numerical, to surface useful insights quickly, built on matplotlib and seaborn.
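The idea of a per-feature analyzer can be sketched roughly as follows. This is a much-reduced, hypothetical version: the function name `analyze_feature`, its parameters and the fields it returns are assumptions made for illustration, not the notebook's actual implementation.

```python
import pandas as pd

def analyze_feature(df, feature, target="SalePrice", plot=False):
    """Summarize one column (hypothetical, simplified analyzer)."""
    col = df[feature]
    is_numeric = pd.api.types.is_numeric_dtype(col)
    info = {
        "feature": feature,
        "kind": "numerical" if is_numeric else "categorical",
        "missing": int(col.isna().sum()),
        "unique": int(col.nunique()),
    }
    if is_numeric:
        # Linear correlation with the target.
        info["corr_with_target"] = float(col.corr(df[target]))
    else:
        # Mean target value per category, highest first.
        info["target_mean_by_category"] = (
            df.groupby(feature)[target].mean()
              .sort_values(ascending=False)
              .to_dict()
        )
    if plot:
        # Plotting libraries are imported only when a plot is requested.
        import matplotlib.pyplot as plt
        import seaborn as sns
        fig, ax = plt.subplots()
        if is_numeric:
            sns.histplot(data=df, x=feature, ax=ax)
        else:
            sns.countplot(data=df, x=feature, ax=ax)
        ax.set_title(feature)
        plt.show()
    return info

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor"],
    "GrLivArea": [1710, 1262, 1786, 1717],
    "SalePrice": [208500, 181500, 223500, 140000],
})
categorical_info = analyze_feature(sample, "Neighborhood")
numerical_info = analyze_feature(sample, "GrLivArea")
print(categorical_info)
print(numerical_info)
```

Returning a plain dictionary keeps the summary easy to inspect in a notebook cell, while `plot=True` adds a histogram or count plot depending on the feature type.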
🧹 Used pandas, numpy and sklearn for cleaning and preprocessing: changing data types, ordinal encoding, dummy variables, extensive feature engineering 🛠️ and more.
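The preprocessing steps named above might look roughly like this on a few made-up rows. The choice of columns, the quality ordering and the engineered `TotalSF` column are assumptions for illustration; the notebook's actual transformations may differ.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up rows using column names from the competition data.
df = pd.DataFrame({
    "ExterQual": ["Gd", "TA", "Ex", "Fa"],
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "MSSubClass": [60, 20, 60, 70],
    "TotalBsmtSF": [856, 1262, 920, 756],
    "GrLivArea": [1710, 1262, 1786, 1717],
})

# Changing data types: MSSubClass is a code, not a quantity.
df["MSSubClass"] = df["MSSubClass"].astype(str)

# Ordinal encoding with an explicit worst-to-best order (assumed ordering).
quality_order = ["Po", "Fa", "TA", "Gd", "Ex"]
encoder = OrdinalEncoder(categories=[quality_order])
df["ExterQual"] = encoder.fit_transform(df[["ExterQual"]]).ravel()

# Feature engineering: a hypothetical combined-area feature.
df["TotalSF"] = df["TotalBsmtSF"] + df["GrLivArea"]

# Dummy (one-hot) columns for unordered categories.
df = pd.get_dummies(df, columns=["MSZoning", "MSSubClass"])

print(df.head())
```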
🤖 Tested different models, including several from sklearn such as RandomForestRegressor and GradientBoostingRegressor, tuned their hyperparameters with GridSearchCV, and settled on CatBoostRegressor as the best model.
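A minimal sketch of tuning one of these models with GridSearchCV is shown below. It uses synthetic data from `make_regression` and a deliberately small, assumed parameter grid; the grids, scoring and data used in the notebook are not known here, and CatBoostRegressor is left out to keep the example dependent on sklearn only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the preprocessed features.
X, y = make_regression(n_samples=120, n_features=8, noise=10.0, random_state=0)

# Small illustrative grid (assumed values, not the notebook's).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # sklearn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best combination
```

The same pattern applies to GradientBoostingRegressor by swapping the estimator and the grid.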
🧾 Evaluated performance with a custom scorer, RMSLE (root mean squared logarithmic error, where lower is better), and got 0.12236, which places within the top 10% of competitors 🏆.
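One way such a custom scorer could be defined is sketched below. It assumes the common `log1p` form of RMSLE and wraps it with sklearn's `make_scorer`; the function name `rmsle` and the exact formula used in the notebook are assumptions.

```python
import numpy as np
from sklearn.metrics import make_scorer

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (log1p form, assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# greater_is_better=False tells sklearn that lower values are better,
# so the scorer returns the negated error during model selection.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)

# Identical predictions give zero error.
print(rmsle([100000, 200000], [100000, 200000]))
# The scorer can be passed as scoring=rmsle_scorer to GridSearchCV or cross_val_score.
```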
---
[Image: p1_numr]
---