https://github.com/pratanup/exploratory-data-analysis-eda-

Host: GitHub
URL: https://github.com/pratanup/exploratory-data-analysis-eda-
Owner: PratanuP
Created: 2023-02-22T12:16:27.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-02-22T12:23:50.000Z (almost 2 years ago)
Last Synced: 2024-11-08T10:20:07.989Z (2 months ago)
Topics: data-analytics, data-science, data-visualization, exploratory-data-analysis, python
Language: Jupyter Notebook
Homepage:
Size: 784 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Exploratory-Data-Analysis-EDA-

The objective is to get the given data ready for modeling by cleaning and transforming it through exploratory data analysis (EDA).

Task1: Import Travel1.csv into a Jupyter notebook and check the number of records and columns. Then calculate basic statistics such as mean, min, max and 25th percentile for numeric variables, and count, freq etc. for object variables. The describe function can be used for this.
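A minimal sketch of Task1 with pandas. Travel1.csv is not reproduced here, so the snippet reads a small in-memory stand-in; the column names (`CustomerID`, `Age`, `Salary`, `Gender`, `Occupation`) and all values are assumptions made for illustration only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Travel1.csv; the real columns and values may differ.
csv_text = """CustomerID,Age,Salary,Gender,Occupation
1,34,52000,Male,Salaried
2,45,61000,Female,Business
3,29,48000,Male,Salaried
"""
# With the real file this would be: df = pd.read_csv("Travel1.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Number of records (rows) and columns.
n_rows, n_cols = df.shape
print(f"{n_rows} records, {n_cols} columns")

# Numeric variables: count, mean, std, min, 25%, 50%, 75%, max.
numeric_stats = df.describe()
# Non-numeric (object) variables: count, unique, top, freq.
object_stats = df.describe(exclude=[np.number])

print(numeric_stats)
print(object_stats)
```
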

Task2: Look for outliers in the age and salary columns. If any are found, replace them with the mean value of the respective column.
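One possible way to approach Task2 is sketched below. The task does not say how outliers should be detected, so the 1.5 × IQR rule used here is an assumption, as are the column names and the sample values. The replacement value is the mean of the non-outlier values, so that the outlier itself does not distort the mean; using the plain column mean instead is a one-line change.

```python
import pandas as pd

# Hypothetical sample data; the real Age and Salary columns may differ.
df = pd.DataFrame({
    "Age": [25, 31, 38, 42, 29, 35, 120],
    "Salary": [40000, 52000, 48000, 61000, 45000, 50000, 900000],
})


def replace_outliers_with_mean(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the mean of the rest."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    # Assumption: the mean is taken over the non-outlier values only.
    mean_value = series[~is_outlier].mean()
    return series.astype(float).mask(is_outlier, mean_value)


for col in ["Age", "Salary"]:
    df[col] = replace_outliers_with_mean(df[col])

print(df)
```

A box plot of each column (for example `df.boxplot(column="Age")`) is a common visual check before and after the replacement.
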

Task3: Look for missing values in the data. Replace missing values in numeric variables with the median, and missing values in object variables with the mode.
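A short sketch of Task3, assuming hypothetical `Age` and `Gender` columns with a few missing entries. Numeric columns are filled with the median and all other columns with the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame({
    "Age": [25.0, np.nan, 38.0, 42.0],
    "Gender": ["Male", "Female", None, "Male"],
})

# Count missing values per column before imputing.
print(df.isna().sum())

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric variable: fill with the column median.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Object variable: fill with the mode (first one if there are ties).
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```
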

Task4: In the gender column, female is misspelled as "fe male". Change it to "female".
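Task4 can be sketched with `Series.replace`. The column name `Gender` and the exact casing of the values are assumptions; if the real data uses a different casing (for example "Fe Male"), the strings below need to be adjusted accordingly.

```python
import pandas as pd

# Hypothetical sample data containing the misspelling.
df = pd.DataFrame({"Gender": ["male", "fe male", "female", "fe male"]})

# Replace only exact matches of the misspelled value.
df["Gender"] = df["Gender"].replace("fe male", "female")

print(df["Gender"].unique())
```
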

Task5: Remove the customer ID from the dataframe, then find the correlation of all variables with each other. Show the result in both formats: as a table, using the corr function, to see magnitude and direction, and as a visualization, using a heatmap.
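A sketch of Task5 under assumed column names. `CustomerID`, `Age`, `Salary` and `Trips` are hypothetical, as are the values. The heatmap is drawn with seaborn, which is one common choice; the import is kept inside the helper so the tabular part runs even where seaborn is not installed.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Age": [25, 31, 38, 42, 29],
    "Salary": [40000, 52000, 58000, 61000, 45000],
    "Trips": [5, 4, 3, 1, 4],
})

# The ID is only an identifier, so drop it before computing correlations.
features = df.drop(columns=["CustomerID"])

# Tabular format: pairwise Pearson correlations (sign = direction, size = magnitude).
corr = features.corr(numeric_only=True)
print(corr.round(2))


def plot_corr_heatmap(corr_matrix: pd.DataFrame) -> None:
    """Visual format: annotated heatmap of the correlation matrix."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation heatmap")
    plt.show()
```

In a notebook, calling `plot_corr_heatmap(corr)` renders the heatmap below the cell.
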

Task6: Look for duplicate rows and delete them.
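A minimal sketch of Task6 on hypothetical data. `duplicated()` flags rows that repeat an earlier row in full, and `drop_duplicates()` keeps the first occurrence of each.

```python
import pandas as pd

# Hypothetical sample data with one fully duplicated row.
df = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3],
    "Age": [25, 31, 31, 38],
})

# Number of rows that duplicate an earlier row.
n_dupes = int(df.duplicated().sum())
print(f"Duplicate rows found: {n_dupes}")

# Delete the duplicates and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```
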

Task7: Show the frequency distribution of gender, i.e. the number of males and females.
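Task7 can be sketched with `value_counts`. The `Gender` column name and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Gender": ["male", "female", "male", "male", "female"]})

# Absolute frequency of each category.
gender_counts = df["Gender"].value_counts()
# Relative frequency (share of rows) of each category.
gender_share = df["Gender"].value_counts(normalize=True)

print(gender_counts)
print(gender_share)
```

For a quick chart in a notebook, `gender_counts.plot(kind="bar")` draws the same distribution (this needs matplotlib installed).
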

Task8: Convert categorical variables such as Occupation and gender into numeric variables to make them ready for modeling. Any technique can be used for this, such as one-hot encoding or label encoding.
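Both techniques named in Task8 can be sketched with pandas alone. The column names and values are hypothetical. Label encoding is done here through pandas category codes rather than scikit-learn's `LabelEncoder`; that is a substitution made to keep the example dependency-free, and both assign integer codes in sorted category order.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Gender": ["male", "female", "male"],
    "Occupation": ["Salaried", "Business", "Salaried"],
    "Age": [25, 31, 38],
})
categorical_cols = ["Gender", "Occupation"]

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df, columns=categorical_cols, dtype=int)
print(one_hot)

# Label encoding: one integer code per category, in sorted category order.
label_encoded = df.copy()
for col in categorical_cols:
    label_encoded[col] = label_encoded[col].astype("category").cat.codes
print(label_encoded)
```

One-hot encoding avoids implying an order between categories, while label encoding keeps the column count unchanged; which one suits better depends on the model used later.
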

Task9: Run automatic EDA using pandas-profiling.
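A hedged sketch of Task9. The pandas-profiling package has since been renamed ydata-profiling, so the helper below tries the new import first and falls back to the old one. The helper name `build_profile_report`, the report title and the output file name are hypothetical.

```python
import pandas as pd


def build_profile_report(df: pd.DataFrame, output_path: str = "travel_profile.html") -> str:
    """Generate an HTML profiling report for df and return the file path."""
    # Import lazily so this module loads even without the profiling package.
    try:
        from ydata_profiling import ProfileReport  # current package name
    except ImportError:
        from pandas_profiling import ProfileReport  # older package name

    report = ProfileReport(df, title="Travel data profile")
    report.to_file(output_path)
    return output_path
```

In a notebook this would be called as `build_profile_report(df)` after loading the data, and the resulting HTML file can be opened in a browser.
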

Task10: Run automatic EDA using D-Tale.
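A hedged sketch of Task10. It assumes the dtale package is installed; the helper name `launch_dtale` and its `open_browser` flag are hypothetical and exist only for this illustration.

```python
import pandas as pd


def launch_dtale(df: pd.DataFrame, open_browser: bool = False):
    """Start a D-Tale session for df and return the session instance."""
    # Import lazily so this module loads even without dtale installed.
    import dtale

    instance = dtale.show(df)
    if open_browser:
        # Open the interactive grid in the default web browser.
        instance.open_browser()
    return instance
```

In a Jupyter notebook, evaluating `launch_dtale(df)` as the last expression of a cell typically renders the D-Tale grid inline.
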