https://github.com/mikel-brostrom/housing_price_prediction
California housing price prediction with NN, Random Forest and Linear Regression
- Host: GitHub
- URL: https://github.com/mikel-brostrom/housing_price_prediction
- Owner: mikel-brostrom
- Created: 2020-05-09T20:42:51.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-31T20:21:10.000Z (about 5 years ago)
- Last Synced: 2024-12-29T15:44:30.835Z (6 months ago)
- Topics: california-housing-price-prediction, data-cleaning, feature-engineering, linear-regression, pca-analysis, random-forest
- Language: Jupyter Notebook
- Homepage:
- Size: 1.33 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Housing_Price_Prediction
The goal of this project was to build a price predictor for the California housing dataset. The data pertains to the houses found in a given California district, together with summary statistics about them derived from the 1990 census data.
## The data
The data comprises 8 attributes (loaded as sketched below):
* MedInc: median income in the block
* HouseAge: median house age in the block
* AveRooms: average number of rooms
* AveBedrms: average number of bedrooms
* Population: block population
* AveOccup: average house occupancy
* Latitude: house block latitude
* Longitude: house block longitude

as well as the target, the housing price (the block's median house value).
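
A minimal sketch of loading these attributes with scikit-learn, assuming the standard `fetch_california_housing` loader (the README does not state how the data is read, so the loader choice is an assumption):
```python
from sklearn.datasets import fetch_california_housing

# Load the California housing data as a pandas DataFrame
# (as_frame=True keeps the attribute names listed above).
data = fetch_california_housing(as_frame=True)

X = data.data    # 8 feature columns: MedInc, HouseAge, AveRooms, ...
y = data.target  # MedHouseVal: median house value, in units of $100,000

print(X.columns.tolist())
print(X.shape, y.shape)  # (20640, 8) (20640,)
```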
## Training
`train.py` trains three different models, a neural network, a linear regressor and a random forest, on the scikit-learn California housing dataset (a rough sketch of this flow follows the output example below):
```bash
python3 train.py
```

Training output example:
```bash
...Train Epoch: 150 [0/18576 (0%)] Loss: 0.130984
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 143.57it/s]
5
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 102.93it/s]
NN
r2 score: 0.8232532827300671
MAE score: 0.2833885734455647
Linear regressor
r2 score: 0.6098033978087847
MAE score: 0.4635741867691994
Random forest
r2 score: 0.8138137169848451
MAE score: 0.2837869675577879
...
```
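
The scikit-learn side of this flow (the train/test split, the linear regression and random forest baselines, and the R2/MAE evaluation) could look roughly like the sketch below. A 90/10 split would match the 18576 training samples in the log above, but the split, the forest size and the random seeds are assumptions rather than values taken from `train.py`:
```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Standardize the features (the NN in this repo is trained on standardized signals).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for name, model in [
    ("Linear regressor", LinearRegression()),
    ("Random forest", RandomForestRegressor(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name)
    print("r2 score:", r2_score(y_test, pred))
    print("MAE score:", mean_absolute_error(y_test, pred))
```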
### The network
We use a stack of fully connected layers with ReLU activations. The R2 score and MAE were used to evaluate the models.
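
The README does not spell out the architecture; the following is a minimal sketch of such a ReLU stack, assuming a PyTorch implementation (suggested by the `Train Epoch ... Loss` log format above) and illustrative layer widths:
```python
import torch
import torch.nn as nn

class HousingNet(nn.Module):
    """Fully connected regression network with ReLU activations.

    The layer widths (64, 32) are illustrative assumptions, not the
    repository's exact configuration.
    """
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # single output: predicted house value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x).squeeze(-1)

# Example forward pass on a batch of standardized features
model = HousingNet()
dummy = torch.randn(4, 8)
print(model(dummy).shape)  # torch.Size([4])
```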
## Conclusion
The neural network trained on the standardized signals gave the best model, with an R2 score of 82.4%. The models trained on the first two principal components gave poor results, even though those components accounted for ~96% of the data variance.
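
For reference, the explained variance captured by the leading principal components can be inspected as in the sketch below; whether the features were standardized before PCA is not stated here and strongly affects the numbers, so this is illustrative only:
```python
from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import PCA

X, _ = fetch_california_housing(return_X_y=True)

# Project onto the first two principal components and report how much
# of the total variance they capture.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_, pca.explained_variance_ratio_.sum())
```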