Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abhirajp595/python
Data Science Project using Python
data-analysis data-science data-visualization eda jyputer-notebook numpy pandas statistics
- Host: GitHub
- URL: https://github.com/abhirajp595/python
- Owner: Abhirajp595
- Created: 2024-07-17T04:38:34.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-07-17T05:19:21.000Z (4 months ago)
- Last Synced: 2024-07-18T07:13:26.119Z (4 months ago)
- Topics: data-analysis, data-science, data-visualization, eda, jyputer-notebook, numpy, pandas, statistics
- Language: Jupyter Notebook
- Homepage:
- Size: 19.5 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Python
Project Statement:
When searching for a dream house, a buyer weighs many factors, not just the height of the basement ceiling or the proximity to an east-west railroad.
Using the dataset, find the factors that influence price negotiations while buying a house.
There are 79 explanatory variables describing every aspect of residential homes in Ames, Iowa.
Dataset Description:
Variable: Description
SalePrice : The property's sale price in dollars. This is the target variable that you're trying to predict.
MSSubClass : The building class
MSZoning : The general zoning classification
LotFrontage : Linear feet of street connected to property
LotArea : Lot size in square feet
Street : Type of road access
Alley : Type of alley access
LotShape : General shape of property
LandContour : Flatness of the property
Utilities : Type of utilities available
LotConfig : Lot configuration
LandSlope : Slope of property
Neighborhood : Physical locations within Ames city limits
Condition1 : Proximity to main road or railroad
Condition2 : Proximity to main road or railroad (if a second is present)
BldgType : Type of dwelling
HouseStyle : Style of dwelling
OverallQual : Overall material and finish quality
OverallCond : Overall condition rating
YearBuilt : Original construction date
YearRemodAdd : Remodel date
RoofStyle : Type of roof
RoofMatl : Roof material
Exterior1st : Exterior covering on house
Exterior2nd : Exterior covering on house (if more than one material)
MasVnrType : Masonry veneer type
MasVnrArea : Masonry veneer area in square feet
ExterQual : Exterior material quality
ExterCond : Present condition of the material on the exterior
Foundation : Type of foundation
BsmtQual : Height of the basement
BsmtCond : General condition of the basement
BsmtExposure : Walkout or garden level basement walls
BsmtFinType1 : Quality of the basement finished area
BsmtFinSF1 : Type 1 finished square feet
BsmtFinType2 : Quality of second finished area (if present)
BsmtFinSF2 : Type 2 finished square feet
BsmtUnfSF : Unfinished square feet of basement area
TotalBsmtSF : Total square feet of basement area
Heating : Type of heating
HeatingQC : Heating quality and condition
CentralAir : Central air conditioning
Electrical : Electrical system
1stFlrSF : First Floor square feet
2ndFlrSF : Second floor square feet
LowQualFinSF : Low quality finished square feet (all floors)
GrLivArea : Above grade (ground) living area square feet
BsmtFullBath : Basement full bathrooms
BsmtHalfBath : Basement half bathrooms
FullBath : Full bathrooms above grade
HalfBath : Half bathrooms above grade
Bedroom : Number of bedrooms above basement level
Kitchen : Number of kitchens
KitchenQual : Kitchen quality
TotRmsAbvGrd : Total rooms above grade (does not include bathrooms)
Functional : Home functionality rating
Fireplaces : Number of fireplaces
FireplaceQu : Fireplace quality
GarageType : Garage location
GarageYrBlt : Year garage was built
GarageFinish : Interior finish of the garage
GarageCars : Size of the garage in car capacity
GarageArea : Size of the garage in square feet
GarageQual : Garage quality
GarageCond : Garage condition
PavedDrive : Paved driveway
WoodDeckSF : Wood deck area in square feet
OpenPorchSF : Open porch area in square feet
EnclosedPorch : Enclosed porch area in square feet
3SsnPorch : Three season porch area in square feet
ScreenPorch : Screen porch area in square feet
PoolArea : Pool area in square feet
PoolQC : Pool quality
Fence : Fence quality
MiscFeature : Miscellaneous feature not covered in other categories
MiscVal : Dollar value of miscellaneous feature
MoSold : Month Sold
YrSold : Year Sold
SaleType : Type of sale
SaleCondition : Condition of sale
Perform the following steps:
1. Understand the dataset:
a. Identify the shape of the dataset
b. Identify variables with null values
c. Identify variables with unique values
2. Generate a separate dataset for numerical and categorical variables
3. EDA of numerical variables:
a. Missing value treatment
b. Identify the skewness and distribution
c. Identify significant variables using a correlation matrix
d. Pair plot for distribution and density
4. EDA of categorical variables:
a. Missing value treatment
b. Count plot for bivariate analysis
c. Identify significant variables using p-values and Chi-Square values
5. Combine all the significant categorical and numerical variables
6. Plot box plot for the new dataset to find the variables with outliers
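Steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame is synthetic (only the column names come from the variable list, every value is made up), median imputation is one assumed choice of missing value treatment, and the 0.5 correlation cutoff is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Ames data. Column names are taken from the
# variable list; all values are invented for illustration only.
rng = np.random.default_rng(0)
n = 200
gr_liv_area = rng.normal(1500, 400, n).round()
df = pd.DataFrame({
    "GrLivArea": gr_liv_area,
    "LotFrontage": rng.normal(70, 20, n).round(),
    "OverallQual": rng.integers(1, 11, n),
    "MSZoning": rng.choice(["RL", "RM", "FV"], n),
    "Alley": rng.choice(["Grvl", "Pave"], n),
    "SalePrice": (gr_liv_area * 100 + rng.normal(0, 20000, n)).round(),
})
df.loc[::10, "LotFrontage"] = np.nan  # inject missing values
df.loc[::4, "Alley"] = np.nan

# Step 1: shape, columns with nulls, number of unique values per column.
print(df.shape)
null_counts = df.isnull().sum()
print(null_counts[null_counts > 0])
print(df.nunique())

# Step 2: separate numerical and categorical variables.
num_df = df.select_dtypes(include="number").copy()
cat_df = df.select_dtypes(exclude="number").copy()

# Step 3a: missing value treatment (assumed here: fill with the median).
num_df = num_df.fillna(num_df.median())

# Step 3b: skewness of each numerical variable.
skewness = num_df.skew()
print(skewness)

# Step 3c: correlation with the target; the 0.5 cutoff is an assumption.
corr_with_price = num_df.corr()["SalePrice"].drop("SalePrice")
significant_num = corr_with_price[corr_with_price.abs() > 0.5].index.tolist()
print(significant_num)
```

For step 3d, a call such as `seaborn.pairplot(num_df)` would draw the pair plot, assuming seaborn is installed; it is left out here so the sketch needs only pandas and NumPy.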
Note: Steps 5 and 6 prepare the new dataset for training and prediction.
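One possible way to approach steps 4 to 6 is sketched below. Several things here are assumptions rather than the notebook's method: the data is synthetic, missing categories are filled with the mode, `SalePrice` is binned into quartiles so that a chi-square test of independence can be applied to it, the 0.05 significance level is a conventional choice, `significant_num` is a placeholder for the result of the numerical EDA, and `iqr_outlier_mask` is a hypothetical helper that applies the 1.5 x IQR rule that box plot whiskers use by default.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic data: column names come from the variable list, values are made up.
rng = np.random.default_rng(1)
n = 300
quality = rng.choice(["Fa", "TA", "Gd", "Ex"], n)
base = pd.Series(quality).map(
    {"Fa": 100000, "TA": 150000, "Gd": 200000, "Ex": 280000}
)
df = pd.DataFrame({
    "KitchenQual": quality,
    "Street": rng.choice(["Pave", "Grvl"], n),
    "GrLivArea": rng.normal(1500, 300, n).round(),
    "SalePrice": (base + rng.normal(0, 15000, n)).round(),
})
df.loc[::15, "KitchenQual"] = np.nan  # inject missing categories
df.loc[0, "GrLivArea"] = 6000         # one deliberately extreme value

# Step 4a: missing value treatment (assumed here: fill with the mode).
cat_cols = df.select_dtypes(exclude="number").columns
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Step 4c: chi-square test of each categorical variable against the
# price quartile (binning the target is an assumption of this sketch).
price_band = pd.qcut(df["SalePrice"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
chi2_results = {}
for col in cat_cols:
    table = pd.crosstab(df[col], price_band)
    chi2, p_value, dof, _ = chi2_contingency(table)
    chi2_results[col] = (chi2, p_value)
significant_cat = [c for c, (_, p) in chi2_results.items() if p < 0.05]

# Step 5: combine significant numerical and categorical variables.
significant_num = ["GrLivArea"]  # placeholder for the numerical EDA result
model_df = df[significant_num + significant_cat + ["SalePrice"]]

# Step 6: flag outliers with the 1.5 * IQR rule used by box plot whiskers.
def iqr_outlier_mask(series, k=1.5):
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

numeric_cols = model_df.select_dtypes(include="number").columns
outlier_counts = {c: int(iqr_outlier_mask(model_df[c]).sum()) for c in numeric_cols}
print(chi2_results)
print(outlier_counts)
```

For the plots themselves, `seaborn.countplot` (step 4b) and `DataFrame.boxplot` (step 6) are the usual calls; they are omitted so the sketch runs without a plotting backend.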