https://github.com/pratanup/exploratory-data-analysis-eda-
The objective is to make this data ready for modeling by transforming the given data into clean data through exploratory data analysis (EDA).
- Host: GitHub
- URL: https://github.com/pratanup/exploratory-data-analysis-eda-
- Owner: PratanuP
- Created: 2023-02-22T12:16:27.000Z
- Default Branch: main
- Last Pushed: 2023-02-22T12:23:50.000Z
- Last Synced: 2023-09-29T21:06:35.161Z
- Topics: data-analytics, data-science, data-visualization, exploratory-data-analysis, python
- Language: Jupyter Notebook
- Size: 784 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Exploratory-Data-Analysis-EDA-
The objective is to make this data ready for modeling by transforming the given data into clean data through exploratory data analysis (EDA).

Task1: Import Travel1.csv into a Jupyter notebook and check the number of records and columns. Then calculate basic statistics such as mean, min, max, and 25th percentile for numeric variables, and count, freq, etc. for object variables. The `describe` function can be used for this.
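A minimal pandas sketch of Task1 follows. Travel1.csv is not included here, so the inline CSV and its column names (`CustomerID`, `Age`, `Gender`, `Occupation`, `Salary`) are hypothetical stand-ins; in the notebook the `read_csv` call would point at the real file. Non-numeric columns are selected with `exclude=["number"]`, which picks up object (text) columns regardless of how pandas stores strings.

```python
import io

import pandas as pd

# Hypothetical stand-in for Travel1.csv; the real file's columns may differ.
SAMPLE_CSV = """CustomerID,Age,Gender,Occupation,Salary
1,34,Male,Salaried,52000
2,41,Female,Small Business,61000
3,29,Fe Male,Salaried,45000
4,52,Male,Large Business,88000
"""

# In the notebook this would be pd.read_csv("Travel1.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# shape is (number of records, number of columns).
n_records, n_columns = df.shape
print(f"{n_records} records, {n_columns} columns")

# Numeric columns: count, mean, std, min, 25%, 50%, 75%, max.
numeric_summary = df.describe()

# Non-numeric (object/string) columns: count, unique, top (most frequent value), freq.
object_summary = df.describe(exclude=["number"])

print(numeric_summary)
print(object_summary)
```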
Task2: Look for outliers in the age and salary columns. If any are found, replace them with the mean of the respective column.
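The README does not say how outliers should be detected, so this sketch assumes the common 1.5 × IQR rule: a value is an outlier if it falls outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. The function name `replace_outliers_with_mean`, the `Age`/`Salary` column names and the sample values are hypothetical.

```python
import pandas as pd


def replace_outliers_with_mean(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with IQR outliers in `columns` replaced by a mean.

    A value counts as an outlier if it lies outside [Q1 - k*IQR, Q3 + k*IQR].
    The replacement is the mean of the non-outlier values, so the outlier
    does not inflate the value that replaces it.
    """
    out = df.copy()
    for col in columns:
        values = out[col].astype(float)  # float so the mean fits without a dtype clash
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        is_outlier = (values < q1 - k * iqr) | (values > q3 + k * iqr)
        # mask() swaps in the replacement wherever is_outlier is True.
        out[col] = values.mask(is_outlier, values[~is_outlier].mean())
    return out


# Hypothetical sample; real column names in Travel1.csv may differ.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 200],
    "Salary": [40000, 42000, 45000, 47000, 1000000],
})

cleaned = replace_outliers_with_mean(df, ["Age", "Salary"])
print(cleaned)
```

Using the mean of the remaining values is a design choice; the literal reading of the task (the mean of the whole column, outlier included) would replace 200 with a value pulled upward by that same outlier.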
Task3: Check the data for missing values. Replace missing values in numeric variables with the median, and missing values in object variables with the mode.
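A possible pandas implementation of the Task3 rule is sketched below: median for numeric columns, mode for everything else. The helper name `impute_missing` and the sample data are hypothetical. When several values tie for the mode, the sketch takes the first one pandas returns.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs with the median (numeric columns) or the mode (other columns)."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            modes = out[col].mode()  # may hold several values on a tie
            if modes.empty:  # column is entirely missing; nothing to impute from
                continue
            fill_value = modes.iloc[0]
        out[col] = out[col].fillna(fill_value)
    return out


# Hypothetical sample with one gap in each column.
df = pd.DataFrame({
    "Age": [25, np.nan, 35, 40],
    "Occupation": ["Salaried", None, "Salaried", "Small Business"],
})

print(df.isna().sum())  # missing values per column, before imputation
filled = impute_missing(df)
print(filled)
```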
Task4: In the gender column, "female" is misspelled as "fe male". Correct it to "female".
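The exact casing of the misspelling in Travel1.csv is not stated, so this sketch uses a case-insensitive regular expression that matches "fe", whitespace, then "male". The `Gender` column name and sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample; the misspelling may appear as "Fe Male" or "fe male".
df = pd.DataFrame({"Gender": ["Male", "Female", "Fe Male", "Male", "fe male"]})

# (?i) makes the match case-insensitive; \s+ requires the stray space,
# so the correctly spelled "Female" is left untouched.
df["Gender"] = df["Gender"].str.replace(r"(?i)^\s*fe\s+male\s*$", "Female", regex=True)

print(df["Gender"].value_counts())
```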
Task5: Remove the customer ID column from the DataFrame, then compute the pairwise correlations between all variables. Show the result in two formats: as a table, using the `corr` function to see the magnitude and direction of each correlation, and as a heatmap visualization.
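The tabular part of Task5 might look like the sketch below. The column names, including the illustrative `NumberOfTrips`, and the values are hypothetical. `numeric_only=True` is needed because recent pandas versions raise an error when `corr` meets text columns; those columns can join the matrix once they are encoded as numbers (Task8).

```python
import pandas as pd

# Hypothetical sample; "NumberOfTrips" is an illustrative column, not from the original.
df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "Age": [25, 30, 35, 40],
    "Salary": [40000, 45000, 50000, 55000],
    "NumberOfTrips": [4, 3, 2, 1],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# The ID is an arbitrary label, so correlating it with anything is meaningless.
features = df.drop(columns=["CustomerID"])

# Pearson correlation matrix: sign gives direction, absolute value gives magnitude.
# numeric_only=True skips text columns such as Gender.
corr = features.corr(numeric_only=True)
print(corr.round(2))
```

For the heatmap view of the same matrix, `seaborn.heatmap(corr, annot=True, cmap="coolwarm")` draws each coefficient as a coloured, annotated cell, assuming seaborn is installed.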
Task6: Look for duplicate rows and delete them.
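A short pandas sketch of Task6, using a hypothetical sample in which one row is repeated. `duplicated()` marks every repeat after the first occurrence, and `drop_duplicates()` keeps that first occurrence.

```python
import pandas as pd

# Hypothetical sample in which the third row repeats the first exactly.
df = pd.DataFrame({
    "Age": [34, 41, 34, 29],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Occupation": ["Salaried", "Small Business", "Salaried", "Salaried"],
})

# Count the rows flagged as repeats of an earlier row.
n_duplicates = int(df.duplicated().sum())
print(f"{n_duplicates} duplicate row(s)")

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

If the customer ID is still in the DataFrame, rows that differ only in their ID are not treated as duplicates; passing `subset=[...]` to `duplicated` or `drop_duplicates` restricts the comparison to the chosen columns.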
Task7: Show the frequency distribution of gender, i.e. the number of males and females.
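One way to get the Task7 frequency table is `value_counts`, sketched here on a hypothetical, already-cleaned `Gender` column. `normalize=True` returns proportions instead of counts.

```python
import pandas as pd

# Hypothetical sample of an already-cleaned Gender column.
df = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Male", "Female"]})

counts = df["Gender"].value_counts()                # absolute frequencies
shares = df["Gender"].value_counts(normalize=True)  # relative frequencies

print(counts)
print(shares)
```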
Task8: Convert categorical variables such as Occupation and gender into numeric variables to make them ready for modelling. Any technique can be used, such as one-hot encoding or label encoding.
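The sketch below shows both techniques named in Task8 on hypothetical data: label encoding via pandas category codes for the binary `Gender` column, and one-hot encoding via `pd.get_dummies` for `Occupation`. Which column gets which encoding is an illustrative choice, not the author's.

```python
import pandas as pd

# Hypothetical sample; real column names in Travel1.csv may differ.
df = pd.DataFrame({
    "Age": [34, 41, 29],
    "Gender": ["Male", "Female", "Male"],
    "Occupation": ["Salaried", "Small Business", "Large Business"],
})

# Label encoding: categories are sorted, so Female -> 0 and Male -> 1.
df["Gender"] = df["Gender"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per Occupation value.
encoded = pd.get_dummies(df, columns=["Occupation"], dtype=int)

print(encoded)
```

Label encoding implies an order between categories, which is harmless for a two-valued column but misleading for nominal ones like Occupation, hence one-hot there. For linear models, `drop_first=True` in `get_dummies` removes one redundant indicator column.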
Task9: Run automated EDA using pandas-profiling.
Task10: Run automated EDA using D-Tale.