https://github.com/rajull-agrawal/customer_analysis
This project performs data analysis on customer-related datasets using PySpark and Tableau.
csv jupyter-notebook pyspark python tableau
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/rajull-agrawal/customer_analysis
- Owner: Rajull-Agrawal
- Created: 2024-09-08T08:20:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-08T09:54:10.000Z (over 1 year ago)
- Last Synced: 2025-04-05T20:43:14.700Z (about 1 year ago)
- Topics: csv, jupyter-notebook, pyspark, python, tableau
- Language: Jupyter Notebook
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Customer Data Analysis
## Author: Rajul Agrawal
### Overview
This Jupyter notebook performs data analysis on customer-related datasets using PySpark. The analysis involves loading various datasets, cleaning data, transforming date formats, and displaying distinct demographic information.
### Dataset
The notebook processes the following CSV datasets:
- **Customer Product**: Contains product details held by customers.
- **Customer Channel Activity**: Captures customer interactions through various channels.
- **Customer Demographics**: Includes demographic details of customers.
- **Customer Transaction History**: Tracks customers' transaction details.
- **Product Lookup**: Provides details about different products.
### Key Steps
1. **Data Loading**: The datasets are loaded into PySpark DataFrames.
2. **Data Transformation**:
   - A UDF is applied to standardize various date formats across the datasets.
   - The schema of each dataset is printed to understand the structure.
   - Distinct values of demographic fields like `Marital_Status` are analyzed.
3. **Data Export**: Commented-out sections allow exporting the DataFrames to CSV for further analysis.
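
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the list of input date formats, the function names `standardize_date` and `explore_demographics`, and the `csv_path` and `date_column` parameters are all assumptions made for the example. Only the `Marital_Status` column name comes from this README.

```python
from datetime import datetime

# Assumed input formats; the notebook's real list is not documented here.
# Order matters: slash- and dash-separated dates are read day-first.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%Y%m%d")


def standardize_date(raw):
    """Return the date as an ISO 'YYYY-MM-DD' string, or None if no format matches."""
    if raw is None:
        return None
    text = str(raw).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None


def explore_demographics(spark, csv_path, date_column):
    """Hypothetical helper: load one CSV, standardize a date column, inspect it."""
    # PySpark is imported lazily so standardize_date works without a Spark install.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    standardize_date_udf = udf(standardize_date, StringType())

    # Without inferSchema every column is read as a string, which is what
    # standardize_date expects.
    df = spark.read.csv(csv_path, header=True)
    df = df.withColumn(date_column, standardize_date_udf(col(date_column)))

    df.printSchema()                               # inspect the structure
    df.select("Marital_Status").distinct().show()  # distinct demographic values
    return df
```

Under the day-first assumption, `standardize_date("08/09/2024")` returns `"2024-09-08"`, and values that match none of the candidate formats come back as `None` rather than raising, so bad rows surface as nulls in the resulting DataFrame.
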
### Dependencies
- PySpark
- pandas
- datetime (Python standard library)
- pgeocode (for geographic data processing)
### Tableau Dashboard
This analysis is complemented by a Tableau dashboard that visualizes key insights from the data. Screenshots of the dashboard can be added here.


