An open API service indexing awesome lists of open source software.

https://github.com/dakshdeephere/bank_eda-practice

EDA analysis of Bank.csv dataset
https://github.com/dakshdeephere/bank_eda-practice

analysis data data-visualization dataanalysis matplotlib numpy pandas python3 seaborn

Last synced: about 2 months ago
JSON representation

EDA analysis of Bank.csv dataset

Awesome Lists containing this project

README

          

# Bank Data Analysis Project

This repository contains a data analysis project that focuses on exploring and analyzing a dataset from a bank. The dataset, stored in a CSV file named `bank_data.csv`, contains various customer-related information, such as age, job, education, and financial details.

## Table of Contents

1. [Introduction](#introduction)
2. [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
3. [Data Analysis](#data-analysis)
- [Importing Libraries](#importing-libraries)
- [Reading Dataset](#reading-dataset)
- [Data Cleaning](#data-cleaning)
- [Dropping Columns](#dropping-columns)
- [Dividing 'jobedu' Column](#dividing-jobedu-in-job-and-education)
- [Handling Missing Values](#handling-missing-values)
- [Finding Duplicates](#finding-duplicates)
- [Outlier Handling](#outlier-handling)
- [Standardizing Variables](#standarize-variable)
- [Univariate Analysis](#univariate-analysis-categorical-features)
- [Bivariate Analysis](#bivariate-analysis)
4. [Conclusion](#conclusion)
5. [Contributing](#contributing)
6. [License](#license)

## Introduction

This data analysis project aims to provide insights into the bank dataset, exploring various aspects of the data such as customer demographics, financial information, and the response variable. The project includes data cleaning, handling missing values, outlier detection, and various visualizations to help understand the data better.

## Getting Started

### Prerequisites

Before running the code in this project, make sure you have the following Python libraries installed:

- Pandas
- NumPy
- Matplotlib
- Seaborn

### Installation

You can install the required Python libraries using pip:

```bash
pip install pandas numpy matplotlib seaborn
```

## Data Analysis

The data analysis process is broken down into several steps, as outlined below:

### Importing Libraries

The project starts by importing necessary Python libraries and setting up the environment.

### Reading Dataset

The dataset, stored in the 'bank_data.csv' file, is read into a Pandas DataFrame, and the first few rows are displayed to get an initial overview.

### Data Cleaning

Data cleaning involves removing unwanted rows, columns, or values from the dataset to prepare it for analysis. In this project, some rows with missing or irrelevant data are dropped, and the 'jobedu' column is divided into separate 'job' and 'education' columns.

### Dropping Columns

Unnecessary columns like 'customerid' are dropped to simplify the dataset.

### Dividing 'jobedu' Column

A new `Education` column is created by extracting values from the `jobedu` column.

### Handling Missing Values

Missing values in the `age` and `month` columns are identified and handled appropriately. In the `pdays` column, missing values are replaced with NaN.

### Finding Duplicates

Duplicate records based on `age` and `response` columns are identified.

### Outlier Handling

Outliers in numerical variables like `age`, `salary`, and `balance` are analyzed using boxplots and quantiles.

### Standardizing Variables

The 'duration' variable is standardized to ensure uniformity.

### Univariate Analysis

Univariate analysis explores categorical features like `marital`, `job`, `education`, `poutcome`, and the target variable `response`. Visualizations such as bar plots and pie charts provide insights.

### Bivariate Analysis

Bivariate analysis examines relationships between variables, including numerical-numerical, categorical-numerical, and categorical-categorical relationships. Correlation analysis, boxplots, and heatmaps are used to visualize these relationships.

## Conclusion

This data analysis project provides a comprehensive exploration of the bank dataset, covering data cleaning, missing value handling, outlier detection, and various visualizations. The findings and insights gained from this analysis can be valuable for making informed decisions and building predictive models.

## Contributing

Contributions to this project are welcome. If you have suggestions, improvements, or additional analyses to add, please feel free to contribute.

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.