https://github.com/mikel-brostrom/housing_price_prediction
California housing price prediction with NN, Random Forest and Linear Regression
- Host: GitHub
- URL: https://github.com/mikel-brostrom/housing_price_prediction
- Owner: mikel-brostrom
- Created: 2020-05-09T20:42:51.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-31T20:21:10.000Z (about 5 years ago)
- Last Synced: 2024-12-29T15:44:30.835Z (6 months ago)
- Topics: california-housing-price-prediction, data-cleaning, feature-engineering, linear-regression, pca-analysis, random-forest
- Language: Jupyter Notebook
- Homepage:
- Size: 1.33 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Housing_Price_Prediction
The goal of this project was to build a price predictor for the California housing dataset. The data pertains to the houses found in a given California district, together with summary statistics about them derived from the 1990 census data.
## The data
The data comprises 8 attributes (loaded as sketched below):
* MedInc: median income in the block
* HouseAge: median house age in the block
* AveRooms: average number of rooms
* AveBedrms: average number of bedrooms
* Population: block population
* AveOccup: average house occupancy
* Latitude: house block latitude
* Longitude: house block longitude

as well as the target, the housing price (the block's median house value).
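
A minimal sketch of loading these attributes with scikit-learn, assuming the standard `fetch_california_housing` loader (the README does not state how the data is read, so the loader choice is an assumption):
```python
from sklearn.datasets import fetch_california_housing

# Load the California housing data as a pandas DataFrame
# (as_frame=True keeps the attribute names listed above).
data = fetch_california_housing(as_frame=True)

X = data.data    # 8 feature columns: MedInc, HouseAge, AveRooms, ...
y = data.target  # MedHouseVal: median house value, in units of $100,000

print(X.columns.tolist())
print(X.shape, y.shape)  # (20640, 8) (20640,)
```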
## Training
`train.py` trains three different models, a neural network, a linear regressor and a random forest, on the scikit-learn California housing dataset (a rough sketch of this flow follows the output example below):
```bash
python3 train.py
```

Training output example:
```bash
...Train Epoch: 150 [0/18576 (0%)] Loss: 0.130984
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 143.57it/s]
5
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 102.93it/s]
NN
r2 score: 0.8232532827300671
MAE score: 0.2833885734455647
Linear regressor
r2 score: 0.6098033978087847
MAE score: 0.4635741867691994
Random forest
r2 score: 0.8138137169848451
MAE score: 0.2837869675577879
...
```
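
The scikit-learn side of this flow (the train/test split, the linear regression and random forest baselines, and the R2/MAE evaluation) could look roughly like the sketch below. A 90/10 split would match the 18576 training samples in the log above, but the split, the forest size and the random seeds are assumptions rather than values taken from `train.py`:
```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Standardize the features (the NN in this repo is trained on standardized signals).
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for name, model in [
    ("Linear regressor", LinearRegression()),
    ("Random forest", RandomForestRegressor(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name)
    print("r2 score:", r2_score(y_test, pred))
    print("MAE score:", mean_absolute_error(y_test, pred))
```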
### The network
We use a stack of fully connected layers with ReLU activations. The R2 score and MAE were used to evaluate the models.
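
The README does not spell out the architecture; the following is a minimal sketch of such a ReLU stack, assuming a PyTorch implementation (suggested by the `Train Epoch ... Loss` log format above) and illustrative layer widths:
```python
import torch
import torch.nn as nn

class HousingNet(nn.Module):
    """Fully connected regression network with ReLU activations.

    The layer widths (64, 32) are illustrative assumptions, not the
    repository's exact configuration.
    """
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # single output: predicted house value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x).squeeze(-1)

# Example forward pass on a batch of standardized features
model = HousingNet()
dummy = torch.randn(4, 8)
print(model(dummy).shape)  # torch.Size([4])
```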
## Conclusion
The neural network trained on the standardized signals gave the best model, with an R2 score of 82.4%. The models trained on the first two principal components gave poor results, even though those components accounted for ~96% of the data variance.
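
For reference, the explained variance captured by the leading principal components can be inspected as in the sketch below; whether the features were standardized before PCA is not stated here and strongly affects the numbers, so this is illustrative only:
```python
from sklearn.datasets import fetch_california_housing
from sklearn.decomposition import PCA

X, _ = fetch_california_housing(return_X_y=True)

# Project onto the first two principal components and report how much
# of the total variance they capture.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_, pca.explained_variance_ratio_.sum())
```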