https://github.com/soufianboukir/ecom-analytics-platform
End-to-end data science project on an Amazon sales dataset, including data preprocessing, analysis, modeling, and a Streamlit dashboard for insights and decision-making.
data-analysis data-science data-visualization data-visualization-dashboard forecasting-models timeseries
- Host: GitHub
- URL: https://github.com/soufianboukir/ecom-analytics-platform
- Owner: soufianboukir
- License: apache-2.0
- Created: 2026-05-01T18:59:47.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-11T12:23:56.000Z (about 1 month ago)
- Last Synced: 2026-05-11T14:22:15.377Z (about 1 month ago)
- Topics: data-analysis, data-science, data-visualization, data-visualization-dashboard, forecasting-models, timeseries
- Language: Jupyter Notebook
- Homepage: https://ecom-analytics-forecasting-platform.streamlit.app/
- Size: 12.2 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# E-Commerce Analytics & Forecasting Platform
An end-to-end data science and business intelligence system built on 100,000 Amazon-style sales transactions — combining interactive dashboards, customer analytics, product performance analysis, and multi-model revenue forecasting.
## 📄 Report
- Full analysis report: [PDF](https://github.com/soufianboukir/ecom-analytics-platform/blob/main/reports/customer-analystics-and-revenue-forecasting-system.pdf)
- Streamlit dashboard: [Live App](https://ecom-analytics-forecasting-platform.streamlit.app/)

## Overview
This project develops a **5-page interactive Streamlit dashboard** that transforms raw e-commerce transactional data into actionable business intelligence. It covers the full data science pipeline:
- **Exploratory Data Analysis** — distributions, correlations, trends
- **Customer Analytics** — lifetime value, geographic distribution, payment behavior
- **Product Performance** — revenue, margins, brand comparison, drilldown
- **Revenue Forecasting** — 4 models benchmarked, XGBoost selected for 12-month forecast
- **Return Analysis** — country × category heatmap, top returned products
The system is designed to answer real business questions:
- Which customers, products, and geographies drive the most revenue?
- Do discounts actually increase revenue?
- What will revenue look like over the next 12 months?
- Which products are returned most often, and why?
---
## Project Structure
```
ecom-analytics-platform/
│
├── app/
│   ├── app.py                      # Main Streamlit entry point
│   └── pages/
│       ├── overview.py             # Page 1 — KPIs, revenue trend, top countries
│       ├── analysis.py             # Page 2 — Revenue by category, discounts, shipping
│       ├── customer_insights.py    # Page 3 — LTV, geography, payment methods
│       ├── product_performance.py  # Page 4 — Products, margins, brands, drilldown
│       └── forecasting.py          # Page 5 — Multi-model forecasting system
│
├── data/
│   ├── raw/
│   │   └── amazon_sales.csv        # Original dataset
│   └── processed/
│       ├── amazon_sales_final.csv  # Cleaned & feature-engineered dataset
│       └── amazon_sales.py
│
├── notebooks/
│   ├── 01_data_cleaning.ipynb
│   ├── 02_exploratory_data_analysis.ipynb
│   └── 03_feature_engineering.ipynb
│
├── requirements.txt                # Python dependencies
├── README.md
└── report/
    └── main.pdf                    # Full academic report
```
---
## Dashboard Pages
### Page 1 — Executive Overview
High-level business snapshot with 4 KPI cards, monthly revenue trend, top 5 countries by revenue, top 5 categories donut chart, and order status breakdown.
| KPI | Value |
|---|---|
| Total Revenue | $91,825,648 |
| Total Orders | 100,000 |
| Avg Order Value | $918.26 |
| Return Rate | 6.2% |
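
The four KPI cards reduce to a handful of aggregates over the order table. The following is a minimal sketch, not the dashboard's actual code: it assumes each order is a dict with hypothetical keys `revenue` and `status`, and that returned orders carry the status `"Returned"`. The real column names in `amazon_sales_final.csv` are not listed in this README.

```python
def compute_kpis(orders):
    """Return the four headline KPIs for a list of order dicts.

    Assumed (hypothetical) keys: 'revenue' (float) and 'status' (str).
    """
    total_orders = len(orders)
    total_revenue = sum(o["revenue"] for o in orders)
    returned = sum(1 for o in orders if o["status"] == "Returned")
    return {
        "total_revenue": total_revenue,
        "total_orders": total_orders,
        # Guard against an empty selection (e.g. a date filter with no orders).
        "avg_order_value": total_revenue / total_orders if total_orders else 0.0,
        "return_rate_pct": 100.0 * returned / total_orders if total_orders else 0.0,
    }


# Toy data for illustration only, not taken from the dataset.
sample = [
    {"revenue": 100.0, "status": "Delivered"},
    {"revenue": 300.0, "status": "Returned"},
    {"revenue": 200.0, "status": "Delivered"},
    {"revenue": 400.0, "status": "Delivered"},
]
kpis = compute_kpis(sample)
```

As a consistency check on the table above, $91,825,648 divided by 100,000 orders gives the reported average order value of $918.26.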
---
### Page 2 — Sales Analysis
- Revenue breakdown by Category, Brand, and Payment Method
- Discount vs Revenue scatter analysis
- Shipping cost distribution by country (box plots)
- Seasonal revenue patterns
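
One simple way to read the discount vs revenue relationship is to bucket orders into discount bands and compare mean revenue per band. This is an illustrative sketch under assumptions: the keys `discount_pct` and `revenue` are hypothetical, and the dashboard itself uses a scatter plot rather than this aggregation.

```python
from collections import defaultdict
from statistics import mean


def revenue_by_discount_band(orders, band_width=10):
    """Mean order revenue per discount band, keyed by the band's lower bound.

    Assumed (hypothetical) keys: 'discount_pct' (0-100) and 'revenue'.
    """
    bands = defaultdict(list)
    for o in orders:
        lower = int(o["discount_pct"] // band_width) * band_width
        bands[lower].append(o["revenue"])
    # Sort by band so the result reads from low to high discount.
    return {lower: mean(values) for lower, values in sorted(bands.items())}


# Toy data for illustration only, not taken from the dataset.
sample = [
    {"discount_pct": 0, "revenue": 120.0},
    {"discount_pct": 5, "revenue": 80.0},
    {"discount_pct": 15, "revenue": 90.0},
    {"discount_pct": 25, "revenue": 60.0},
]
bands = revenue_by_discount_band(sample)
```

A flat or falling mean across bands would suggest that deeper discounts do not lift per-order revenue, although order volume per band needs to be checked before drawing that conclusion.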
---
### Page 3 — Customer Insights
- Top 20 customers by Lifetime Value (LTV) — horizontal gradient bar chart
- Customer geographic distribution — US choropleth map + city bar chart
- Average order value by payment method — grouped bar + revenue share donut
- Export buttons for LTV and payment summary CSVs
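
The LTV ranking and its CSV export can be sketched as follows. This assumes LTV is defined as a customer's summed order revenue and uses the hypothetical keys `customer_id` and `revenue`; the README does not state the project's exact LTV definition.

```python
import csv
import io
from collections import defaultdict


def top_customers_by_ltv(orders, n=20):
    """Rank customers by lifetime value, taken here as summed revenue.

    Assumed (hypothetical) keys: 'customer_id' and 'revenue'.
    """
    ltv = defaultdict(float)
    for o in orders:
        ltv[o["customer_id"]] += o["revenue"]
    # Highest LTV first; ties are broken by customer id for stable output.
    return sorted(ltv.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def ltv_to_csv(ranked):
    """Serialise the ranking to CSV text, e.g. as the payload of an export button."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["customer_id", "ltv"])
    writer.writerows(ranked)
    return buf.getvalue()


# Toy data for illustration only, not taken from the dataset.
sample = [
    {"customer_id": "C1", "revenue": 50.0},
    {"customer_id": "C2", "revenue": 70.0},
    {"customer_id": "C1", "revenue": 40.0},
]
ranked = top_customers_by_ltv(sample, n=2)
csv_text = ltv_to_csv(ranked)
```

In a Streamlit page, text like `csv_text` could be handed to `st.download_button` to produce the export; whether the app does it this way is an assumption.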
---
### Page 4 — Product Performance
- Best-selling products — Top 20 by revenue and by quantity (tabbed)
- Category margin analysis — Revenue vs Shipping vs Tax vs Margin grouped bar
- Brand bubble chart — Avg unit price vs Avg quantity (bubble size = revenue)
- Product drilldown — search bar → KPIs + monthly sparkline + raw orders table
- Return analysis — By country, by category, country × category heatmap, top 10 returned products
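
The country × category heatmap needs a return rate for each pair. A minimal sketch of that aggregation, assuming hypothetical keys `country`, `category`, and `status`, with `"Returned"` marking a returned order:

```python
from collections import defaultdict


def return_rate_matrix(orders):
    """Return {country: {category: return rate in %}}.

    Assumed (hypothetical) keys: 'country', 'category', 'status'.
    """
    totals = defaultdict(int)
    returns = defaultdict(int)
    for o in orders:
        key = (o["country"], o["category"])
        totals[key] += 1
        if o["status"] == "Returned":
            returns[key] += 1
    matrix = defaultdict(dict)
    for (country, category), count in totals.items():
        matrix[country][category] = 100.0 * returns[(country, category)] / count
    return dict(matrix)


# Toy data for illustration only, not taken from the dataset.
sample = [
    {"country": "US", "category": "Books", "status": "Returned"},
    {"country": "US", "category": "Books", "status": "Delivered"},
    {"country": "US", "category": "Toys", "status": "Delivered"},
    {"country": "UK", "category": "Books", "status": "Returned"},
]
matrix = return_rate_matrix(sample)
```

Pairs with very few orders produce noisy rates (the single UK order above yields 100%), so a minimum-order threshold is worth considering before plotting.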
---
### Page 5 — Revenue Forecasting System
- 4 models: Naive baseline, Linear Regression, XGBoost, Prophet
- Time-based train/test split — last 6 months as holdout
- Actual vs Predicted chart (per model selector)
- All models comparison chart
- Model performance table — MAE, RMSE, R², MAPE
- XGBoost 12-month recursive future forecast with downloadable CSV
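
The evaluation steps listed above (time-based holdout, naive baseline, the four metrics, and a recursive multi-step forecast) can be sketched in plain Python. This is an illustration under assumptions, not the project's implementation: `predict_next` is a hypothetical stand-in for the fitted XGBoost model, and using the last three months as lag features is a guess, since the README does not describe the model's features.

```python
import math


def time_split(series, holdout=6):
    """Split a chronologically ordered series; the last `holdout` points form the test set."""
    return series[:-holdout], series[-holdout:]


def naive_forecast(train, horizon):
    """Naive baseline: repeat the last observed value."""
    return [train[-1]] * horizon


def evaluate(actual, predicted):
    """MAE, RMSE, R2 and MAPE (in %); assumes `actual` has no zeros and is not constant."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def recursive_forecast(history, predict_next, steps=12, n_lags=3):
    """Forecast `steps` ahead, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])
    forecast = []
    for _ in range(steps):
        y_hat = predict_next(window)
        forecast.append(y_hat)
        window = window[1:] + [y_hat]
    return forecast


# Toy monthly revenue for illustration only, not taken from the dataset.
monthly = [100.0 + 10.0 * i for i in range(12)]
train, test = time_split(monthly, holdout=6)
metrics = evaluate(test, naive_forecast(train, len(test)))


def mean_of_lags(window):
    """Hypothetical stand-in model: the mean of the lag window."""
    return sum(window) / len(window)


future = recursive_forecast(monthly, mean_of_lags, steps=12)
```

Because each step consumes earlier predictions, errors in a recursive forecast compound with the horizon, which is one reason to read the later months of a 12-month forecast with more caution than the first few.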
---
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/soufianboukir/ecom-analytics-platform.git
cd ecom-analytics-platform
```
### 2. Create a virtual environment
```bash
python -m venv .venv
# Activate it with the command for your platform:
source .venv/bin/activate   # Linux / macOS
.venv\Scripts\activate      # Windows
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
```
---
## Usage
```bash
streamlit run app/app.py
```
Then open your browser at `http://localhost:8501`.
### Sidebar Filters
- **Date Range** — Filter all pages by order date
- **Category** — Filter by product category (available on relevant pages)
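
The filters above amount to a date-range and category predicate applied before each page renders. A minimal sketch of that predicate, assuming hypothetical keys `order_date` (a `datetime.date`) and `category`:

```python
from datetime import date


def apply_filters(orders, start, end, categories=None):
    """Keep orders dated within [start, end] and, if given, in the selected categories.

    Assumed (hypothetical) keys: 'order_date' (datetime.date) and 'category'.
    """
    selected = set(categories) if categories else None
    return [
        o for o in orders
        if start <= o["order_date"] <= end
        and (selected is None or o["category"] in selected)
    ]


# Toy data for illustration only, not taken from the dataset.
sample = [
    {"order_date": date(2023, 1, 15), "category": "Books"},
    {"order_date": date(2023, 2, 10), "category": "Toys"},
    {"order_date": date(2023, 3, 5), "category": "Books"},
]
in_range = apply_filters(sample, date(2023, 1, 1), date(2023, 2, 28))
books_only = apply_filters(sample, date(2023, 1, 1), date(2023, 12, 31), ["Books"])
```

In the app, `start`, `end`, and `categories` would presumably come from sidebar widgets such as `st.sidebar.date_input` and `st.sidebar.multiselect`; that wiring is an assumption, not taken from the source.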
Built with ❤️ by **soufian**.