https://github.com/akash1070/data-science-virtual-internship-by-anz
Exploratory data analysis and prediction of annual salary for customers from the dataset provided by ANZ.
data-analysis data-science predictive-analytics presentation-slides
- Host: GitHub
- URL: https://github.com/akash1070/data-science-virtual-internship-by-anz
- Owner: Akash1070
- Created: 2022-09-25T15:38:06.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-25T15:54:08.000Z (over 3 years ago)
- Last Synced: 2025-01-29T11:52:20.227Z (over 1 year ago)
- Topics: data-analysis, data-science, predictive-analytics, presentation-slides
- Language: Jupyter Notebook
- Homepage:
- Size: 3.03 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# **Data Science Virtual Internship By ANZ**
Repository for all the code and reports for the Data Analytics Virtual Internship Program at ANZ.
# Project Details
## Project:
Exploratory data analysis and prediction of annual salary for customers from the dataset provided by ANZ.
## Dataset Description:
The dataset provided is a synthesised transaction dataset containing 3 months' worth of transactions for 100 hypothetical customers. It contains purchases, recurring transactions, and salary transactions.
The dataset is designed to simulate realistic transaction behaviours that are observed in ANZ's real transaction data, so many of the insights drawn from it will be genuine.
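Because the data covers only 3 months, an annual salary figure has to be extrapolated from the salary transactions. The sketch below shows one simple way to do that, assuming salary is paid evenly through the year. The column names (`customer_id`, `txn_type`, `amount`), the `SALARY` label, and the toy values are hypothetical and are not taken from the ANZ dataset.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative only.
transactions = pd.DataFrame({
    "customer_id": ["C1", "C1", "C1", "C2", "C2", "C2", "C2"],
    "txn_type": ["SALARY", "SALARY", "PURCHASE",
                 "SALARY", "SALARY", "SALARY", "PURCHASE"],
    "amount": [3000.0, 3000.0, 45.5, 1000.0, 1000.0, 1000.0, 12.0],
})

MONTHS_OBSERVED = 3  # the dataset spans 3 months of transactions

# Keep only salary payments and total them per customer.
salary = transactions[transactions["txn_type"] == "SALARY"]
salary_total = salary.groupby("customer_id")["amount"].sum()

# Scale the observed total up to 12 months (assumes even pay all year).
annual_salary = salary_total * (12 / MONTHS_OBSERVED)
print(annual_salary)
```

The scaling factor is only as good as the even-pay assumption; customers with irregular pay cycles would need a different estimate.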
## Tools used:
**For data wrangling and visualization:** NumPy, Pandas, Matplotlib, Seaborn
**For predictive analytics:** scikit-learn
**For Reporting:** Google Slides
## Tasks:
**Task 1:** Segmenting the dataset and drawing unique insights, including visualisation of the transaction volume and assessing the effect of any outliers.
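As a rough illustration of Task 1, the sketch below aggregates transaction volume per day and uses the 1.5 × IQR rule to flag unusually large amounts, then compares the mean with and without them. The IQR rule is one common choice and is not necessarily what the notebook uses. The data is randomly generated, and the column names and start date are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 500 transactions spread over 30 arbitrary days.
rng = np.random.default_rng(seed=0)
days = pd.date_range("2018-08-01", periods=30, freq="D")
txns = pd.DataFrame({
    "date": days[rng.integers(0, len(days), size=500)],
    "amount": rng.gamma(shape=2.0, scale=20.0, size=500),
})
# Plant two extreme values so there is something to detect.
txns.loc[[0, 1], "amount"] = [5000.0, 8000.0]

# Transaction volume per day: number of transactions and total amount.
daily_volume = txns.groupby("date")["amount"].agg(["count", "sum"])

# Flag amounts above the upper IQR fence (Q3 + 1.5 * IQR).
q1, q3 = txns["amount"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
is_outlier = txns["amount"] > upper_fence

# Assess the effect of the outliers on a summary statistic.
mean_all = txns["amount"].mean()
mean_trimmed = txns.loc[~is_outlier, "amount"].mean()
print(f"outliers flagged: {is_outlier.sum()}")
print(f"mean with outliers: {mean_all:.2f}, without: {mean_trimmed:.2f}")
```

Plotting `daily_volume["count"]` with Matplotlib or Seaborn would give the volume visualisation the task asks for.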
**Task 2:** Exploring correlations between customer attributes, building a regression and a decision-tree prediction model based on your findings.
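A minimal sketch of Task 2 follows, using the scikit-learn classes imported in this project. It checks the correlation of each attribute with salary, then fits a linear regression and a decision tree on the same train/test split. The customer attributes (`age`, `avg_balance`, `avg_purchase`), the synthetic salary formula, and the hyperparameters are assumptions made for illustration, not the notebook's actual features or settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic per-customer table (100 customers, hypothetical attributes).
rng = np.random.default_rng(seed=42)
n = 100
customers = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "avg_balance": rng.normal(10_000, 3_000, size=n),
    "avg_purchase": rng.normal(50, 15, size=n),
})
# Made-up relationship so the models have a signal to learn.
customers["annual_salary"] = (
    30_000 + 500 * customers["age"] + 2 * customers["avg_balance"]
    + rng.normal(0, 5_000, size=n)
)

# Correlation of each attribute with the target.
correlations = customers.corr()["annual_salary"].drop("annual_salary")

X = customers.drop(columns="annual_salary")
y = customers["annual_salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=3, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

print(correlations)
print(scores)
```

With only 100 customers the test set is small, so the scores can vary noticeably with the split; cross-validation would give a steadier comparison.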
## Authors
- [@Akash Kumar Jha](https://github.com/Akash1070)
## Deployment
1. Import the necessary libraries
2. Load all datasets
3. Clean the data
4. Analyse the data
5. Run the predictive analysis
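The loading and cleaning steps above might look roughly like the sketch below. The inline CSV stands in for the real dataset file, whose name and columns are not stated here, so every column name and value is hypothetical. The cleaning choices (dropping exact duplicates and rows with no amount) are examples, not the notebook's documented rules.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real dataset file.
raw_csv = io.StringIO(
    "date,customer_id,amount,merchant_id\n"
    "2018-08-01,C1,25.0,M1\n"
    "2018-08-01,C1,25.0,M1\n"
    "2018-08-02,C2,,M2\n"
    "2018-08-03,C2,40.0,\n"
)

# Load: parse the date column into datetimes while reading.
df = pd.read_csv(raw_csv, parse_dates=["date"])

# Clean: drop exact duplicate rows, then rows with no transaction amount.
df = df.drop_duplicates()
df = df.dropna(subset=["amount"])

# Inspect the share of missing values left in each column.
missing_share = df.isna().mean()
print(df)
print(missing_share)
```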
## Installation
The notebook uses the libraries below. Install them first (for example, `pip install numpy pandas matplotlib seaborn scikit-learn`), then import them:
```python
# Data wrangling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
# Jupyter/IPython magic: works only inside a notebook, remove it in a plain script
%matplotlib inline
plt.style.use('ggplot')
import seaborn as sns

# Silence library warnings to keep notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Predictive analytics
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
```
## Running Flask API
To start the Flask API, run the following command:
```bash
python app.py
```
## About Me
Data Science Enthusiast | Petroleum Engineering Graduate | Solving Problems Using Data
# Hi, I'm Akash!
## Links
[GitHub](https://github.com/Akash1070)
[LinkedIn](https://www.linkedin.com/in/akashkumar107/)
## Tech Stack

## Other Me
👩‍💻 I'm interested in Petroleum Engineering
🧠 I'm currently learning Data Science | Data Analytics | Business Analytics
👯‍♀️ I'm looking to collaborate on Ideas & Data
## Skills
1. Data Scientist
2. Data Analyst
3. Business Analyst
4. Machine Learning
## Future Plans
⚡️ Looking forward to helping drive innovation at your company as a Data Scientist
⚡️ Looking forward to offering more than I take and leaving the place better than I found it