https://github.com/aiaaee/california_housing
It's about California house values. The goal is to find a good method for assembling a reasonable train/test split for MLP regression using PyTorch.
- Host: GitHub
- URL: https://github.com/aiaaee/california_housing
- Owner: aiaaee
- License: epl-2.0
- Created: 2024-08-03T05:56:50.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-08-07T19:52:19.000Z (3 months ago)
- Last Synced: 2024-08-07T23:02:30.051Z (3 months ago)
- Topics: data-science, datasets, deep-learning, deep-neural-networks, machine-learning, machine-learning-algorithms, python
- Language: Jupyter Notebook
- Homepage:
- Size: 458 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# California Housing
## About Dataset
### Context
This is the dataset used in the second chapter of Aurélien Géron's recent book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.

The data contains information from the 1990 California census. So although it may not help you predict current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people the basics of machine learning.
### Content
The data pertains to the houses found in a given California district and some summary statistics about them, based on the 1990 census data. Be warned: the data aren't cleaned, so some preprocessing steps are required! The columns are as follows; their names are pretty self-explanatory:
- longitude
- latitude
- housing_median_age
- total_rooms
- total_bedrooms
- population
- households
- median_income
- median_house_value
- ocean_proximity
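
As a rough illustration of the kind of preprocessing these columns may call for, here is a minimal, dependency-free Python sketch. It assumes that missing numeric values appear as empty strings, that `ocean_proximity` is a text category, and that `median_house_value` is the regression target. The sample rows, the helper names (`load_rows`, `preprocess`, `train_test_split`), and the 80/20 split are hypothetical choices made for this example, not the repository's actual code.

```python
import csv
import io
import random
import statistics

# Hypothetical sample rows in the column layout listed above. The values are
# made up for illustration and are not taken from the real dataset.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.23,37.88,41,880,129,322,126,8.3252,452600,NEAR BAY
-122.22,37.86,21,7099,,2401,1138,8.3014,358500,NEAR BAY
-118.30,34.05,30,1500,300,900,280,3.5,200000,<1H OCEAN
-119.50,36.70,15,2000,400,1200,390,2.8,90000,INLAND
"""

NUMERIC_FEATURES = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
]
CATEGORICAL = "ocean_proximity"
TARGET = "median_house_value"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Fill missing numeric values with the column median and one-hot encode
    the categorical column. Returns (features, targets, categories)."""
    medians = {
        col: statistics.median(float(r[col]) for r in rows if r[col] != "")
        for col in NUMERIC_FEATURES
    }
    categories = sorted({r[CATEGORICAL] for r in rows})
    features, targets = [], []
    for r in rows:
        numeric = [
            float(r[col]) if r[col] != "" else medians[col]
            for col in NUMERIC_FEATURES
        ]
        one_hot = [1.0 if r[CATEGORICAL] == c else 0.0 for c in categories]
        features.append(numeric + one_hot)
        targets.append(float(r[TARGET]))
    return features, targets, categories


def train_test_split(features, targets, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test fraction
    (always at least one row)."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


if __name__ == "__main__":
    X, y, cats = preprocess(load_rows(SAMPLE_CSV))
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    print(len(X_train), "training rows,", len(X_test), "test rows")
    print("categories:", cats)
```

For simplicity the medians here are computed over all rows before splitting. In a real experiment it is usually safer to compute them on the training rows only and reuse them for the test rows, so that no information leaks from the test set.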
### Acknowledgements
This data was initially featured in the following paper:
Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.

I encountered it in 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by Aurélien Géron.

Aurélien Géron wrote: This dataset is a modified version of the California Housing dataset available from [Luís Torgo's](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) page (University of Porto).