https://github.com/tashi-2004/global-ecommerce-retail-trends-analysis
The Global E-commerce & Retail Analysis project involves data preprocessing, dimensionality reduction with PCA, CLV calculation, and What-If analysis. Key insights include effective PCA for data reduction, detailed CLV analysis across segments, and the impact of pricing strategies on sales.
- Host: GitHub
- URL: https://github.com/tashi-2004/global-ecommerce-retail-trends-analysis
- Owner: tashi-2004
- Created: 2024-09-15T21:24:26.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-10-28T21:01:02.000Z (7 months ago)
- Last Synced: 2025-01-30T08:16:09.757Z (4 months ago)
- Topics: boxplot, clv-analysis, data-science, data-visualization, dataintegration, deep-learning, dimensionality-reduction, ecommerce, heatmap, machine-learning, normalization, outlier-detection, outlier-removal, pca-analysis, preprocessing, python, scatter-plot, whatif-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 24.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Security: SECURITY.md
README
# Global-Ecommerce-Retail-Trends-Analysis
## Overview
This project analyzes global e-commerce trends and their impact on traditional retail. The analysis covers data preprocessing, outlier detection, Principal Component Analysis (PCA), Customer Lifetime Value (CLV) calculation, and a What-If analysis that simulates the effect of different pricing strategies.
## Files Included
1. **Datasets:**
   - `1.csv`, `2.csv`, `3.csv`: Raw e-commerce and retail datasets containing information about sales, customer transactions, and more.
   - `tashi.csv`: A combined and preprocessed dataset that merges `1.csv`, `2.csv`, and `3.csv`. [Download](https://mega.nz/folder/GRMlHbzB#RNCqR2Wn1MwV6UkK4i-8dQ)
   - `pca_transformed_data.csv`: Dataset after applying PCA for dimensionality reduction. [Download](https://mega.nz/folder/GRMlHbzB#RNCqR2Wn1MwV6UkK4i-8dQ)
2. **Reports:**
   - `Report.pdf`: A comprehensive report detailing the analysis, visualizations, and insights.
   - `Pre-Processing_Insights.pdf`: A detailed document explaining the preprocessing steps taken, including outlier detection and handling missing values.
3. **Notebook:**
   - `code.ipynb`: Jupyter notebook containing Python code for data preprocessing, PCA, CLV calculation, and visualizations.
## Project Steps
### 1. Data Preprocessing
- **Data Integration:** Datasets `1.csv`, `2.csv`, and `3.csv` are combined into a single dataset `tashi.csv`.
- **Handling Missing Values:** Missing values in numerical columns were filled using the mean of each column.
- **Outlier Detection and Removal:** Outliers were detected using the Z-score method with a threshold of 2.5. These outliers were removed from the dataset.
- **Normalization:** Numerical columns were normalized using `MinMaxScaler`.
- **Label Encoding:** Categorical columns were label encoded using `LabelEncoder`.
- **Feature Creation:**
  - `Total_Sales`: Calculated by multiplying `UnitPrice` and `Quantity`.
  - `Discount_Effectiveness`: Ratio of `discount_amount` to `Total_Sales`.
  - `Sales_per_Customer`: Sum of total sales per customer.
### 2. Dimensionality Reduction with PCA
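PCA operates on the output of the preprocessing in section 1, so it can help to see those steps end to end first. The following is a minimal sketch, not the notebook's actual code: the sample rows, the `CustomerID` and `Country` columns, the `preprocess` helper, and the choice to create features before scaling (so the ratios stay well defined) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

NUM_COLS = ["UnitPrice", "Quantity", "discount_amount"]
CAT_COLS = ["Country"]  # hypothetical categorical column


def preprocess(df, num_cols, cat_cols, z_thresh=2.5):
    """Impute, drop Z-score outliers, add features, scale, and label encode."""
    df = df.copy()

    # Fill missing numerical values with each column's mean.
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

    # Keep rows whose absolute Z-score is within the threshold in every numerical column.
    z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)
    df = df[(z.abs() <= z_thresh).all(axis=1)].copy()

    # Derived features, computed on unscaled values (an assumption about ordering).
    df["Total_Sales"] = df["UnitPrice"] * df["Quantity"]
    df["Discount_Effectiveness"] = df["discount_amount"] / df["Total_Sales"]
    df["Sales_per_Customer"] = df.groupby("CustomerID")["Total_Sales"].transform("sum")

    # Scale numerical and derived columns to the [0, 1] range.
    scale_cols = num_cols + ["Total_Sales", "Discount_Effectiveness", "Sales_per_Customer"]
    df[scale_cols] = MinMaxScaler().fit_transform(df[scale_cols])

    # Replace each categorical column with integer codes.
    for col in cat_cols:
        df[col] = LabelEncoder().fit_transform(df[col])
    return df


# Synthetic rows standing in for the merged dataset; one missing value, one extreme price.
raw = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "UnitPrice": [2.5, 3.0, 3.5, 4.0, 3.5, 2.0, 3.0, 2.5, 4.5, 500.0],
    "Quantity": [2, 1, 3, 2, 1, 4, 2, 3, 1, 2],
    "discount_amount": [0.5, 0.0, np.nan, 0.5, 0.0, 1.0, 0.5, 0.0, 0.5, 1.0],
    "Country": ["UK", "UK", "DE", "DE", "FR", "FR", "UK", "DE", "FR", "UK"],
})
clean = preprocess(raw, NUM_COLS, CAT_COLS)
print(clean.head())
```

In this sketch a column with zero variance would yield undefined Z-scores and drop every row, so constant columns should be excluded from `num_cols`.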
- **Principal Component Analysis (PCA):** Applied to the normalized dataset to reduce dimensionality while retaining 80% of the variance.
- **Visualizations:**
  - Scatter plot of the first two principal components.
  - Heatmap and boxplots to visualize PCA results.
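As an illustration of retaining 80% of the variance, scikit-learn's `PCA` accepts a fraction for `n_components` and keeps the smallest number of components whose cumulative explained variance exceeds it. The sketch below uses a synthetic matrix with values in [0, 1] as a stand-in for the normalized dataset; the data and variable names are assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the normalized data: 200 rows, 6 correlated features in [0, 1].
rng = np.random.default_rng(0)
base = rng.random((200, 3))
X = np.column_stack([
    base,
    base[:, 0] * 0.9 + 0.05,
    base[:, 1] * 0.8 + 0.1,
    (base[:, 0] + base[:, 2]) / 2,
])

# A float n_components asks for enough components to explain that share of variance.
pca = PCA(n_components=0.80, svd_solver="full")
X_reduced = pca.fit_transform(X)

print("components kept:", pca.n_components_)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

The columns of `X_reduced` are the principal component scores, which is what the scatter plot, heatmap, and boxplots listed above would be drawn from.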
### 3. Customer Lifetime Value (CLV)
- **CLV Calculation:** CLV was calculated for each customer based on average purchase value, purchase frequency, and retention rate.
- **CLV Visualization:**
  - Boxplots and violin plots were used to visualize CLV across different customer segments.
  - A heatmap of average CLV across customer segments was generated.
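The README names the inputs to CLV but not the exact formula. One common simplified form, used here purely as an assumption, multiplies average purchase value by purchase frequency and by an expected lifetime of `1 / (1 - retention_rate)`. The `customer_clv` helper, the `Segment` column, the sample transactions, and the single global retention rate are all hypothetical.

```python
import pandas as pd


def customer_clv(tx, retention_rate=0.6):
    """Per-customer CLV = avg purchase value * purchase frequency * expected lifetime."""
    if not 0.0 <= retention_rate < 1.0:
        raise ValueError("retention_rate must be in [0, 1)")

    per_customer = tx.groupby("CustomerID").agg(
        segment=("Segment", "first"),
        avg_purchase_value=("Total_Sales", "mean"),
        purchase_frequency=("Total_Sales", "count"),
    )
    # Expected number of periods a customer stays, under a constant retention rate.
    expected_lifetime = 1.0 / (1.0 - retention_rate)
    per_customer["CLV"] = (
        per_customer["avg_purchase_value"]
        * per_customer["purchase_frequency"]
        * expected_lifetime
    )
    return per_customer


# Hypothetical transactions, one row per purchase.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "Segment": ["A", "A", "B", "B", "B"],
    "Total_Sales": [10.0, 20.0, 5.0, 8.0, 12.0],
})
clv = customer_clv(tx, retention_rate=0.5)

# Average CLV per segment, the kind of table a segment heatmap could be built from.
avg_clv_by_segment = clv.groupby("segment")["CLV"].mean()
print(clv)
print(avg_clv_by_segment)
```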
### 4. What-If Analysis
- **Price Change Simulations:** The effect of different price changes on CLV was simulated by modifying the `UnitPrice` variable. The results were visualized using line plots and histograms.
- **Visualization of Impact:** Line plots and heatmaps were created to illustrate the impact of different `UnitPrice` multipliers on CLV and total sales.
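A price-change simulation of this kind can be sketched by scaling `UnitPrice` with a set of multipliers and recomputing total sales and CLV for each. The version below is an illustration under stated assumptions: quantities are held fixed (no demand response), CLV uses the simplified `1 / (1 - retention_rate)` lifetime form, and the `simulate_price_change` helper, sample rows, and multiplier values are hypothetical.

```python
import pandas as pd


def simulate_price_change(tx, multipliers, retention_rate=0.5):
    """Recompute total sales and mean CLV for each UnitPrice multiplier."""
    rows = []
    for m in multipliers:
        # Sales per transaction at the adjusted price; quantity is assumed unchanged.
        sales = tx["UnitPrice"] * m * tx["Quantity"]
        grouped = sales.groupby(tx["CustomerID"])
        # Simplified CLV: avg purchase value * purchase count * expected lifetime.
        clv = grouped.mean() * grouped.count() / (1.0 - retention_rate)
        rows.append({
            "multiplier": m,
            "total_sales": sales.sum(),
            "mean_clv": clv.mean(),
        })
    return pd.DataFrame(rows)


# Hypothetical transactions, one row per purchase.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "UnitPrice": [10.0, 20.0, 5.0, 8.0, 8.0, 4.0],
    "Quantity": [1, 2, 4, 1, 3, 2],
})
scenarios = simulate_price_change(tx, [0.8, 0.9, 1.0, 1.1, 1.2])
print(scenarios)
```

Because quantities are held fixed, both output columns scale linearly with the multiplier in this sketch. Modeling how demand reacts to price would need an elasticity assumption, which the README does not describe.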
## Running the Notebook
1. Open the Jupyter notebook `code.ipynb` in any Jupyter environment (e.g., JupyterLab, Google Colab).
2. Run the code cells in sequence to preprocess the data, apply PCA, calculate CLV, and generate the visualizations.
### Dataset Files:
- Ensure that `1.csv`, `2.csv`, and `3.csv` are available in the working directory.
- The notebook generates `tashi.csv` and `pca_transformed_data.csv` as outputs.
## Key Insights
- **Dimensionality Reduction:** PCA reduced the dataset's dimensionality while retaining 80% of its variance.
- **CLV Segmentation:** Customer Lifetime Value was calculated and segmented, highlighting differences in customer value across groups.
- **Impact of Price Changes:** The What-if analysis demonstrated how changes in `UnitPrice` affect CLV, providing insights into potential pricing strategies.
## Visualizations
- **Scatter Plot of Principal Components:** Highlights clusters or separations in the dataset after applying PCA.
- **Boxplot of CLV Segments:** Showcases the distribution of CLV across different customer segments.
- **Violin Plot of CLV Segments:** Visualizes the density of CLV values across segments.
- **Heatmaps:** Used to display the relationship between CLV, total sales, and `UnitPrice` changes.
- **Histograms:** Demonstrate the frequency of CLV values for various price multipliers in the What-if analysis.
## Contact
For any questions or suggestions, feel free to get in touch at [[email protected]]