https://github.com/aniruddhakhedkar/descriptive_statistical_analysis_for_universal_bank_personal_loan_modelling_dataset
Python_Descriptive_Statistics_Project
- Host: GitHub
- URL: https://github.com/aniruddhakhedkar/descriptive_statistical_analysis_for_universal_bank_personal_loan_modelling_dataset
- Owner: Aniruddhakhedkar
- Created: 2024-07-24T07:12:36.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-20T14:02:24.000Z (12 months ago)
- Last Synced: 2025-01-07T11:26:33.377Z (9 months ago)
- Topics: central-tendency, dispersion, matplotlib, pandas, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 235 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project Title:- Data Analysis with Python for a Universal Bank's Personal Loan Modelling Dataset
## Project Description:-
This project performs a descriptive statistical analysis of the Bank Personal Loan Modelling dataset to gain insight into key customer characteristics and spending habits. The analysis covers measures of central tendency (mean, median, mode), measures of dispersion (standard deviation, variance, interquartile range, and range), and measures of distribution shape (skewness and kurtosis).
Additionally, boxplots, scatterplots, density plots, and histograms are used to illustrate the distributions and relationships of key variables.
Furthermore, correlation analysis is conducted to evaluate the strength and direction of relationships among significant continuous variables.
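The measures named above can be sketched with pandas as follows. This is a minimal illustration rather than the notebook's actual code: the `summarize` helper and the sample values are hypothetical, and only the column names come from the data dictionary.

```python
import pandas as pd

# Tiny made-up sample; the values are illustrative, not taken from the dataset.
sample = pd.DataFrame({
    "Age": [25, 35, 45, 55, 65],
    "Experience": [1, 10, 20, 30, 40],
    "Income": [40, 60, 80, 100, 220],
})

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with central-tendency, dispersion and shape measures."""
    return pd.DataFrame({
        "mean": df.mean(),
        "median": df.median(),
        "mode": df.mode().iloc[0],  # first mode if several values tie
        "std": df.std(),            # sample standard deviation (ddof=1)
        "var": df.var(),
        "range": df.max() - df.min(),
        "iqr": df.quantile(0.75) - df.quantile(0.25),
        "skew": df.skew(),
        "kurtosis": df.kurt(),      # excess kurtosis
    })

summary = summarize(sample)
corr = sample.corr()  # Pearson correlation by default
```

`summary` holds one row per variable, and `corr` gives the pairwise Pearson coefficients, whose sign and magnitude indicate the direction and strength of each linear relationship.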
## Tools/Software:- Python (pandas, matplotlib, and seaborn)
## Tasks/Objectives:-
1. Determination of the statistical summary for all the variables in the dataset.
2. Evaluation of the measures of central tendency and measures of dispersion for all the quantitative variables in the dataset.
3. Examine whether a linear relationship exists between the age and experience variables, and create a plot to illustrate this relationship.
4. Find the most frequent family size among the customers.
5. What is the percentage of variation present in the ‘Income’ variable?
6. Imputation of ‘Mortgage’ variable to improve the insights.
7. Plot a density curve of the CCAvg variable for the customers who possess credit cards and make recommendations from the distribution.
8. Plot the outliers present in the dataset for presentation to the stakeholders.
9. Find the decile values of the variable ‘Income’ in the dataset.
10. Calculate the IQR of all the variables which are quantitative and continuous.
11. Do the higher-income holders spend more on credit cards?
12. How many customers use online banking? Do customers using bank internet facilities have higher incomes?
13. Using the z-score of the income variable, count the observations/IDs that fall outside ±3σ.
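A few of the tasks above (4, 5, 9, 10, and 13) can be sketched in pandas as shown below. This is an illustrative sketch on made-up values, not the notebook's code. It assumes that "percentage of variation" in task 5 means the coefficient of variation (standard deviation divided by mean, in percent), and it uses the sample standard deviation for the z-scores; both are assumptions.

```python
import pandas as pd

# Made-up illustrative values, not taken from the dataset.
df = pd.DataFrame({
    "Family": [1] * 7 + [2] * 5 + [3] * 4 + [4] * 4,
    "Income": list(range(40, 135, 5)) + [600],  # 19 ordinary values plus one extreme value
})

income = df["Income"]

# Task 4: most frequent family size (first value if several tie).
most_common_family = df["Family"].mode().iloc[0]

# Task 5 (assumed reading): coefficient of variation, in percent.
cv_percent = income.std() / income.mean() * 100

# Task 9: decile values (10th to 90th percentile).
deciles = income.quantile([i / 10 for i in range(1, 10)])

# Task 10: interquartile range.
iqr = income.quantile(0.75) - income.quantile(0.25)

# Task 13: z-scores and the rows more than 3 standard deviations from the mean.
z = (income - income.mean()) / income.std()  # pandas uses the sample std (ddof=1)
outliers = df[z.abs() > 3]
```

`len(outliers)` gives the number of observations outside ±3σ. Passing `ddof=0` to `std()` would use the population standard deviation instead.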
## Data Description/Data Dictionary:-
1. ID- Customer ID
2. Age- Customer's age in completed years
3. Experience- Years of professional experience
4. Income- Annual income of the customer ($000)
5. ZIPCode- Home address ZIP code
6. Family- Family size of the customer
7. CCAvg- Avg. spending on credit cards per month ($000)
8. Education- Education level of the customer (1-Undergraduate, 2-Graduate, 3-Advanced/Professional)
9. Mortgage- Value of house mortgage if any ($000)
10. Personal Loan- Did this customer accept the personal loan offered in the last campaign?
11. Securities Account- Does the customer have a securities account with the bank?
12. CD Account- Does the customer have a Certificate of Deposit (CD) account with the bank?
13. Online- Does the customer use internet banking facilities?
14. CreditCard- Does the customer use a credit card issued by the bank?
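As a rough sketch of how the data dictionary could be applied in pandas, the example below labels the `Education` codes and converts the yes/no columns to booleans. The rows are made up, and the 0/1 coding of the indicator columns (with 1 meaning "yes") is an assumption, since the dictionary does not state it.

```python
import pandas as pd

# Made-up rows that follow the data dictionary; values are illustrative only.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [45, 120, 80, 60],
    "Education": [1, 3, 2, 1],
    "Online": [0, 1, 1, 0],
    "CreditCard": [1, 0, 1, 0],
})

# Map the education codes to the labels given in the data dictionary.
education_labels = {1: "Undergraduate", 2: "Graduate", 3: "Advanced/Professional"}
df["Education"] = pd.Categorical(
    df["Education"].map(education_labels),
    categories=list(education_labels.values()),
    ordered=True,
)

# Treat the 0/1 indicator columns as booleans (assumes 1 means "yes").
for col in ["Online", "CreditCard"]:
    df[col] = df[col].astype(bool)

online_count = int(df["Online"].sum())                     # customers using internet banking
income_by_online = df.groupby("Online")["Income"].mean()   # mean income per group
```

With the columns typed this way, questions such as task 12 reduce to a count and a group-wise mean.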