https://github.com/abeltavares/online_retail_pyspark_analysis
PySpark data analysis of the Online Retail Data Set
- Host: GitHub
- URL: https://github.com/abeltavares/online_retail_pyspark_analysis
- Owner: abeltavares
- Created: 2023-04-07T22:45:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-04-07T23:16:06.000Z (over 1 year ago)
- Last Synced: 2023-07-15T15:23:02.950Z (over 1 year ago)
- Topics: business-intelligence, churn-analysis, customer-segmentation, data-analysis, data-visualization, jupyter-notebook, machine-learning, market-basket-analysis, online-retail, product-affinity-analysis, pyspark
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
![Status](https://img.shields.io/badge/work%20in%20progress-e28a2b)
[![PySpark](https://img.shields.io/badge/PySpark-3.3.2-orange.svg)](https://spark.apache.org/docs/latest/api/python/index.html)

# Online Retail Data Analysis
This repository contains an analysis of the Online Retail dataset, which includes transactional data from a UK-based online retailer. The analysis is performed using PySpark in Jupyter Notebooks.
## Dataset
The dataset used in this analysis is in the `data` folder. It contains information about customer purchases, including product descriptions, quantities, and prices.
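As a rough illustration of how such a file might be loaded, here is a minimal sketch that reads it with an explicit schema rather than relying on type inference. It assumes the data is a CSV with a header row and the column names of the public UCI Online Retail dataset; the file name `data/online_retail.csv`, the chosen column types, and the helper `load_online_retail` are all hypothetical and not taken from the repository.

```python
# Column names follow the public UCI Online Retail dataset (assumption).
# Types are a conservative guess: dates and IDs stay as strings so that
# nothing is silently turned into null by a failed parse.
ONLINE_RETAIL_SCHEMA = (
    "InvoiceNo STRING, StockCode STRING, Description STRING, Quantity INT, "
    "InvoiceDate STRING, UnitPrice DOUBLE, CustomerID STRING, Country STRING"
)


def load_online_retail(spark, path="data/online_retail.csv"):
    """Read the retail CSV into a Spark DataFrame.

    `spark` is an existing SparkSession; `path` is a hypothetical file name.
    """
    # Spark accepts a DDL-formatted string as the schema argument.
    return spark.read.csv(path, header=True, schema=ONLINE_RETAIL_SCHEMA)


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("online-retail").getOrCreate()
    df = load_online_retail(spark)
    df.printSchema()
    spark.stop()
```

Keeping `InvoiceDate` as a string postpones timestamp parsing until the actual date format in the file is known.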
## Notebooks
The analysis is divided into several Jupyter Notebooks, each focusing on a specific aspect of the data:
- `Exploratory_Data_Analysis.ipynb`: Exploratory data analysis to understand the structure and distribution of the data.
- `RFM_Analysis.ipynb`: RFM analysis to segment customers based on their purchasing behavior.
- `KMeans_Clustering.ipynb`: K-means clustering to segment customers based on their order history.
- `Product_Affinity_Analysis.ipynb`: Product affinity analysis to identify which products tend to be purchased together.
- `Market_Basket_Analysis.ipynb`: Market basket analysis of which products tend to be purchased together at different times of day, week, or year.
- `Churn_Analysis.ipynb`: Churn analysis to identify customers who are likely to churn based on their past behavior.

## Requirements
The analysis requires PySpark and Jupyter Notebook. The necessary Python libraries can be installed from the `requirements.txt` file, for example with `pip install -r requirements.txt`.
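Before opening the notebooks, it can help to confirm that the two requirements are actually installed. The sketch below uses only the standard library to report the installed versions. The distribution names `pyspark` and `notebook` are assumptions about what `requirements.txt` lists, and `installed_versions` is a hypothetical helper. The badge at the top of this README refers to PySpark 3.3.2.

```python
from importlib import metadata

# Assumed distribution names; requirements.txt is the authoritative list.
REQUIRED_PACKAGES = ("pyspark", "notebook")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```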
## Usage
To run the analysis, clone the repository and open the Jupyter Notebooks in order.
## Contributions
This project is open to contributions. If you have any suggestions or improvements, please feel free to create a pull request.
## Copyright
© 2023 Abel Tavares