https://github.com/dakshdeephere/bank_eda-practice
EDA analysis of Bank.csv dataset
analysis data data-visualization dataanalysis matplotlib numpy pandas python3 seaborn
- Host: GitHub
- URL: https://github.com/dakshdeephere/bank_eda-practice
- Owner: dakshdeepHERE
- License: mit
- Created: 2023-09-13T18:01:22.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-24T20:22:10.000Z (over 2 years ago)
- Last Synced: 2025-02-23T14:43:44.289Z (over 1 year ago)
- Topics: analysis, data, data-visualization, dataanalysis, matplotlib, numpy, pandas, python3, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 975 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Bank Data Analysis Project
This repository contains an exploratory data analysis (EDA) of a bank dataset. The dataset, stored in a CSV file named `bank_data.csv`, contains customer-related information such as age, job, education, and financial details.
## Table of Contents
1. [Introduction](#introduction)
2. [Getting Started](#getting-started)
   - [Prerequisites](#prerequisites)
   - [Installation](#installation)
3. [Data Analysis](#data-analysis)
   - [Importing Libraries](#importing-libraries)
   - [Reading Dataset](#reading-dataset)
   - [Data Cleaning](#data-cleaning)
   - [Dropping Columns](#dropping-columns)
   - [Dividing 'jobedu' Column](#dividing-jobedu-in-job-and-education)
   - [Handling Missing Values](#handling-missing-values)
   - [Finding Duplicates](#finding-duplicates)
   - [Outlier Handling](#outlier-handling)
   - [Standardizing Variables](#standarize-variable)
   - [Univariate Analysis](#univariate-analysis-categorical-features)
   - [Bivariate Analysis](#bivariate-analysis)
4. [Conclusion](#conclusion)
5. [Contributing](#contributing)
6. [License](#license)
## Introduction
This project explores the bank dataset, looking at customer demographics, financial information, and the response variable. It covers data cleaning, missing-value handling, outlier detection, and visualizations that help in understanding the data.
## Getting Started
### Prerequisites
Before running the code in this project, make sure you have the following Python libraries installed:
- Pandas
- NumPy
- Matplotlib
- Seaborn
### Installation
You can install the required Python libraries using pip:
```bash
pip install pandas numpy matplotlib seaborn
```
## Data Analysis
The analysis proceeds through the steps described in the following sections.
### Importing Libraries
The project starts by importing the necessary Python libraries and setting up the environment.
### Reading Dataset
The dataset, stored in the `bank_data.csv` file, is read into a Pandas DataFrame, and the first few rows are displayed for an initial overview.
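As a minimal sketch of this step, the snippet below loads a few rows with `pd.read_csv` and prints them. The inline sample and its column layout are hypothetical stand-ins so that the example runs without the real file; the actual columns of `bank_data.csv` may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the first rows of bank_data.csv;
# the real column layout may differ.
sample_csv = io.StringIO(
    'customerid,age,salary,balance,marital,jobedu,response\n'
    '1,58,100000,2143,married,"management,tertiary",no\n'
    '2,44,60000,29,single,"technician,secondary",no\n'
    '3,33,120000,2,married,"entrepreneur,secondary",yes\n'
)

# With the real file this would be: df = pd.read_csv("bank_data.csv")
df = pd.read_csv(sample_csv)

print(df.head())  # first rows for a quick look at the data
print(df.shape)   # (number of rows, number of columns)
```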
### Data Cleaning
Data cleaning removes unwanted rows, columns, and values so that the dataset is ready for analysis. In this project, rows with missing or irrelevant data are dropped, and the `jobedu` column is split into separate `job` and `education` columns.
### Dropping Columns
Columns that are not needed for the analysis, such as `customerid`, are dropped to simplify the dataset.
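Dropping a column can be sketched with `DataFrame.drop`. The small frame below is made-up sample data, not taken from the real dataset.

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "customerid": [1, 2, 3],
    "age": [58, 44, 33],
    "balance": [2143, 29, 2],
})

# An identifier column carries no analytical signal, so it is removed.
df = df.drop(columns=["customerid"])

print(df.columns.tolist())
```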
### Dividing 'jobedu' in Job and Education
New `job` and `education` columns are created by splitting the values of the `jobedu` column.
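One way to do the split is sketched below. It assumes each `jobedu` value has the form `<job>,<education>`; that format is an assumption for illustration and is not confirmed by this README.

```python
import pandas as pd

# Made-up sample values; the "<job>,<education>" format is an assumption.
df = pd.DataFrame(
    {"jobedu": ["management,tertiary", "technician,secondary", "blue-collar,unknown"]}
)

# Split on the first comma only, producing two columns (0 and 1).
parts = df["jobedu"].str.split(",", n=1, expand=True)
df["job"] = parts[0].str.strip()
df["education"] = parts[1].str.strip()

# The combined column is no longer needed once both parts are extracted.
df = df.drop(columns=["jobedu"])

print(df)
```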
### Handling Missing Values
Missing values in the `age` and `month` columns are identified and handled. In the `pdays` column, entries that stand for missing data are replaced with `NaN`.
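The sketch below shows one possible policy, not necessarily the one used in the notebook: rows without an `age` are dropped, missing `month` values are filled with the most frequent month, and a `-1` sentinel in `pdays` is converted to `NaN`. The sample data and the `-1` sentinel are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the -1 sentinel in pdays is an assumption.
df = pd.DataFrame({
    "age": [58.0, np.nan, 33.0, 47.0, 35.0],
    "month": ["may", "may", None, "jun", "may"],
    "pdays": [-1, 151, -1, 92, -1],
})

# Count missing values per column before deciding how to treat them.
print(df.isnull().sum())

# Drop rows where age is missing.
df = df.dropna(subset=["age"]).copy()

# Fill missing month values with the most frequent month (the mode).
df["month"] = df["month"].fillna(df["month"].mode()[0])

# Turn the sentinel into NaN so it does not distort numeric summaries.
df["pdays"] = df["pdays"].replace(-1, np.nan)

print(df)
```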
### Finding Duplicates
Duplicate records are identified based on the `age` and `response` columns.
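A duplicate check restricted to these two columns can be sketched with `DataFrame.duplicated`. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "age": [58, 44, 58, 33, 44],
    "response": ["no", "yes", "no", "yes", "no"],
})

# keep=False flags every member of a duplicated group, not just the repeats.
dup_mask = df.duplicated(subset=["age", "response"], keep=False)
duplicates = df[dup_mask]

print(duplicates)
```

Two rows that agree on `age` and `response` are not necessarily the same customer, so a check like this is better suited to inspection than to automatic deletion.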
### Outlier Handling
Outliers in numerical variables such as `age`, `salary`, and `balance` are examined using boxplots and quantiles.
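The quantile part of this step can be sketched as follows. The values are made up, and the 1.5 × IQR cut-off is a common convention chosen here for illustration; the notebook may use different thresholds.

```python
import pandas as pd

# Made-up balances with one extreme value.
balance = pd.Series(
    [120, 450, 300, 980, 610, 75, 220, 540, 102000], name="balance"
)

# Upper quantiles show how far the tail stretches.
print(balance.quantile([0.5, 0.75, 0.9, 0.95, 0.99]))

# Interquartile-range rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = balance.quantile(0.25), balance.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = balance[(balance < lower) | (balance > upper)]

print(outliers)
```

A default Matplotlib boxplot draws points beyond the same 1.5 × IQR distance as individual fliers, so the two views agree.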
### Standardizing Variables
The `duration` variable is standardized so that all of its values are expressed consistently.
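As an illustration of what such standardization might look like, the sketch below assumes that `duration` is stored as text mixing seconds and minutes (for example `"261 sec"` and `"2.5 min"`) and converts everything to minutes. Both the input format and the `duration_min` column name are hypothetical.

```python
import pandas as pd

# Hypothetical input format: "<number> <unit>" with unit "sec" or "min".
df = pd.DataFrame({"duration": ["261 sec", "2.5 min", "76 sec", "4 min"]})

parts = df["duration"].str.split(" ", n=1, expand=True)
value = parts[0].astype(float)
unit = parts[1].str.strip()

# Keep values already in minutes; divide values in seconds by 60.
df["duration_min"] = value.where(unit == "min", value / 60)

print(df)
```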
### Univariate Analysis (Categorical Features)
Univariate analysis explores categorical features such as `marital`, `job`, `education`, and `poutcome`, as well as the target variable `response`. Bar plots and pie charts show how the values of each feature are distributed.
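The numbers behind such plots can be sketched with `value_counts`. The sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "marital": ["married", "single", "married", "divorced", "married", "single"],
    "response": ["no", "no", "yes", "no", "no", "yes"],
})

# normalize=True returns shares instead of raw counts.
marital_share = df["marital"].value_counts(normalize=True)
response_share = df["response"].value_counts(normalize=True)

print(marital_share)
print(response_share)
```

With Matplotlib installed, `marital_share.plot(kind="barh")` and `response_share.plot(kind="pie")` would turn these shares into a bar plot and a pie chart.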
### Bivariate Analysis
Bivariate analysis examines relationships between pairs of variables: numerical-numerical, categorical-numerical, and categorical-categorical. Correlation analysis, boxplots, and heatmaps are used to visualize these relationships.
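The three kinds of comparison can be sketched in Pandas as shown below. The data are made up, and `response_flag` is a hypothetical helper column introduced only for this example.

```python
import pandas as pd

# Made-up sample rows for illustration only.
df = pd.DataFrame({
    "age": [25, 35, 45, 55, 30, 50],
    "salary": [20000, 50000, 60000, 100000, 20000, 60000],
    "balance": [100, 900, 1500, 4000, 300, 1200],
    "marital": ["single", "married", "married", "married", "single", "divorced"],
    "education": ["secondary", "tertiary", "secondary", "tertiary", "tertiary", "secondary"],
    "response": ["no", "yes", "no", "yes", "no", "no"],
})

# Numerical-numerical: pairwise correlation matrix.
corr = df[["age", "salary", "balance"]].corr()

# Categorical-numerical: compare a numeric summary across response groups.
median_salary = df.groupby("response")["salary"].median()

# Categorical-categorical: encode the target as 0/1 (hypothetical helper
# column), then compute the response rate for each education/marital pair.
df["response_flag"] = (df["response"] == "yes").astype(int)
rate = df.pivot_table(
    index="education", columns="marital", values="response_flag", aggfunc="mean"
)

print(corr)
print(median_salary)
print(rate)
```

With Seaborn installed, `sns.heatmap(corr, annot=True)` or `sns.heatmap(rate, annot=True)` would render either table as a heatmap.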
## Conclusion
This project explores the bank dataset through data cleaning, missing-value handling, outlier detection, and visualization. The resulting insights can inform decisions and serve as groundwork for building predictive models.
## Contributing
Contributions to this project are welcome. If you have suggestions, improvements, or additional analyses, please feel free to contribute.
## License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.