https://github.com/imswappy/ads-eda-predictor

Interactive Streamlit app for marketing campaign analytics and prediction. Includes dashboards, EDA, econometrics tests (ADF, cointegration, OLS diagnostics), ML pipelines with preprocessing, CV, and persistence. Predict outcomes with Linear/Ridge/Lasso/Random Forest regressors.
adf breusch-pagan cointegration lasso-regression linear-regression matplotlib-pyplot numpy pandas random-forest ridge-regression seaborn sklearn

# 🤖 Marketing Data Science Studio — EDA, Econometrics & Predictive Pipelines

[![Streamlit App](https://img.shields.io/badge/Streamlit-Deployed-brightgreen)](https://ads-eda-predictor.streamlit.app/)
[![GitHub Repo](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/Imswappy/ads-eda-predictor)

---

## 📌 Overview

This project extends Jupyter notebook analyses into an **interactive Streamlit dashboard** for **marketing campaign analytics**.
It integrates **fundamentals of data exploration**, **statistical inference**, **econometrics tests**, and **machine learning pipelines**.

The focus is on comparing **Facebook Ads** vs **AdWords Ads** campaigns — analyzing clicks, conversions, costs, and predicting ad performance.

🔗 **Live App:** [ads-eda-predictor.streamlit.app](https://ads-eda-predictor.streamlit.app/)
🔗 **Source Code:** [github.com/Imswappy/ads-eda-predictor](https://github.com/Imswappy/ads-eda-predictor)

---

## 📂 Pages in the App

### 1️⃣ Dashboard — KPIs & Time-Series
- **KPIs:**
  - Mean:
    $$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$
  - Variance:
    $$s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
  - Standard Error:
    $$\text{SEM} = \frac{s}{\sqrt{n}}$$
- **Time-series:** detects seasonality, trends, and structural breaks.
- **Scatter + OLS Regression:**
  $$\hat\beta_1 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2},\qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X.$$
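
The KPI and OLS formulas above can be checked by hand with a few lines of NumPy. This is a minimal sketch, not the app's actual code: the function names `kpi_summary` and `ols_fit` and the spend/click numbers are made up for illustration.

```python
import numpy as np

def kpi_summary(x):
    """Return the mean, sample variance (n-1 denominator), and SEM of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.sum() / n
    var = ((x - mean) ** 2).sum() / (n - 1)
    sem = np.sqrt(var) / np.sqrt(n)
    return mean, var, sem

def ols_fit(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Hypothetical daily spend and clicks (illustrative values, not project data).
spend = [10.0, 20.0, 30.0, 40.0, 50.0]
clicks = [25.0, 45.0, 65.0, 85.0, 105.0]  # constructed as 2 * spend + 5

mean, var, sem = kpi_summary(clicks)
b0, b1 = ols_fit(spend, clicks)
print(f"mean={mean:.2f} var={var:.2f} sem={sem:.3f}")
print(f"intercept={b0:.2f} slope={b1:.2f}")
```

Because the toy clicks are an exact linear function of spend, the fit recovers the slope 2 and intercept 5 used to build them.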

---

### 2️⃣ Exploratory Data Analysis (EDA)
- **Distributions & Moments:**
  - Skewness:
    $$\text{Skew} = \frac{1}{n}\sum\left(\frac{X_i - \bar X}{s}\right)^3$$
  - Kurtosis:
    $$\text{Kurt} = \frac{1}{n}\sum\left(\frac{X_i - \bar X}{s}\right)^4$$
- **Histogram & KDE:**
  $$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - X_i}{h}\right)$$
- **Correlation (Pearson):**
  $$r_{XY} = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2}\sqrt{\sum (Y_i - \bar Y)^2}}.$$
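
The EDA quantities above translate directly into NumPy. The sketch below is illustrative only: the helper names are hypothetical, the kernel `K` is assumed to be Gaussian (the formula leaves it unspecified), and the click and conversion values are invented.

```python
import numpy as np

def standardized_moment(x, k):
    """k-th standardized moment, using the sample std s (n-1 denominator)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return float(np.mean(z ** k))

def kde_gaussian(grid, data, h):
    """Kernel density estimate on `grid` with a Gaussian kernel and bandwidth h."""
    grid = np.asarray(grid, dtype=float)
    data = np.asarray(data, dtype=float)
    u = (grid[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (data.size * h)

def pearson_r(x, y):
    """Pearson correlation from centered sums of products."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    return float((dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())))

# Hypothetical campaign values (illustrative, not project data).
clicks = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 30.0])
conversions = np.array([2.0, 3.0, 1.0, 4.0, 3.0, 7.0])

skew = standardized_moment(clicks, 3)
kurt = standardized_moment(clicks, 4)
grid = np.linspace(-20.0, 60.0, 801)
density = kde_gaussian(grid, clicks, h=3.0)
r = pearson_r(clicks, conversions)
print(f"skew={skew:.3f} kurt={kurt:.3f} r={r:.3f}")
```

The bandwidth `h=3.0` is an arbitrary choice here; in practice it controls how smooth the estimated density looks.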

---

### 3️⃣ Statistical Tests & Regression Diagnostics
- **ADF (Stationarity):**
  $$\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{i=1}^p \delta_i \Delta Y_{t-i} + \varepsilon_t.$$
- **Cointegration (Engle-Granger):**
  - Regress one series on the other by OLS, then test the residuals for stationarity (for example with ADF); stationary residuals indicate cointegration.
- **Breusch–Pagan Test (Heteroscedasticity):**
  $$H_0: \text{Var}(\varepsilon) = \sigma^2 \quad \text{vs.} \quad H_1: \text{Var}(\varepsilon) = f(X)$$
- **Diagnostics:** residual plots, robust SEs, adjusted $$R^2$$, AIC/BIC.

---

### 4️⃣ Notebook Reproductions
- Replicates matplotlib & seaborn plots for **validation**.
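- **LOWESS smoothing:**
  $$\big(\hat\beta_0, \hat\beta_1\big) = \arg\min_{\beta_0,\beta_1} \sum_{i=1}^n w_i(x_0)\,(y_i - \beta_0 - \beta_1 x_i)^2,\qquad \hat f(x_0) = \hat\beta_0 + \hat\beta_1 x_0$$
  where the weights $$w_i(x_0)$$ decay with the distance of $$x_i$$ from $$x_0$$.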
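
To make the local fit concrete, here is a minimal single-pass sketch of the weighted regression at one point `x0`. It assumes tricube weights over the nearest `frac` share of points, which is a common choice but not something the README specifies. Full LOWESS usually adds robustness re-weighting iterations, which are left out. The name `local_linear_fit` is hypothetical.

```python
import numpy as np

def local_linear_fit(x0, x, y, frac=0.5):
    """Weighted least-squares line around x0, evaluated at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = max(2, int(np.ceil(frac * n)))       # number of neighbours in the window
    dist = np.abs(x - x0)
    h = np.sort(dist)[k - 1]                 # window half-width (assumes h > 0)
    u = np.clip(dist / h, 0.0, 1.0)
    w = (1.0 - u ** 3) ** 3                  # tricube weights, zero outside the window
    # Solve the weighted problem by scaling rows with sqrt(w).
    A = np.column_stack([np.ones(n), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return float(beta[0] + beta[1] * x0)

# On exactly linear data the local fit reproduces the line.
x = np.arange(11, dtype=float)
y = 3.0 * x + 1.0
fitted = local_linear_fit(4.5, x, y, frac=0.5)
print(fitted)
```

Repeating the fit over a grid of `x0` values traces out the smoothed curve.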

---

### 5️⃣ Predictor — Pipelines, CV & Model Persistence
- **Preprocessing (ColumnTransformer):**
  - Numeric: Median imputation + scaling.
    - StandardScaler: $$X' = \frac{X - \mu}{\sigma}$$
    - MinMaxScaler: $$X' = \frac{X - X_{\min}}{X_{\max}-X_{\min}}$$
  - Categorical: Rare-category grouping → most-frequent imputation → OneHotEncoding.

- **Models:**
  - Linear Regression (OLS)
  - Ridge (L2):
    $$\min_\beta \sum (y_i - X_i\beta)^2 + \alpha \|\beta\|_2^2$$
  - Lasso (L1):
    $$\min_\beta \sum (y_i - X_i\beta)^2 + \alpha \|\beta\|_1$$
  - Random Forest (Ensemble):
    $$\hat f(x) = \frac{1}{B}\sum_{b=1}^B T_b(x)$$

- **Evaluation:**
  - RMSE:
    $$\text{RMSE} = \sqrt{\frac{1}{n}\sum (y_i - \hat y_i)^2}$$
  - $$R^2 = 1 - \frac{\sum (y_i - \hat y_i)^2}{\sum (y_i - \bar y)^2}$$
- **Persistence:** trained pipelines saved with `joblib` for reuse.
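
The pieces of this page fit together roughly as follows. This is a sketch under assumptions, not the app's code: the column names (`cost`, `clicks`, `platform`) and the data are synthetic, Ridge stands in for any of the four regressors, and the rare-category grouping step is omitted for brevity.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic frame with hypothetical column names (not the project's schema).
df = pd.DataFrame({
    "cost": rng.uniform(10, 100, n),
    "clicks": rng.integers(5, 200, n).astype(float),
    "platform": rng.choice(["facebook", "adwords"], n),
})
df.loc[::25, "cost"] = np.nan                      # inject some missing values
y = 0.5 * df["clicks"] + rng.normal(0, 1, n)       # toy target

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
pre = ColumnTransformer([
    ("num", numeric, ["cost", "clicks"]),
    ("cat", categorical, ["platform"]),
])
model = Pipeline([("pre", pre), ("reg", Ridge(alpha=1.0))])

# 5-fold CV reporting RMSE and R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, df, y, cv=cv,
                        scoring=("neg_root_mean_squared_error", "r2"))
rmse = -scores["test_neg_root_mean_squared_error"].mean()
r2 = scores["test_r2"].mean()
print(f"CV RMSE={rmse:.3f}  CV R2={r2:.3f}")

# Fit on all data, persist with joblib, and reload.
model.fit(df, y)
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)
```

Keeping the preprocessing inside the `Pipeline` means imputation and scaling statistics are re-estimated on each training fold, and the saved file carries both the transforms and the fitted regressor.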

---

## 🚀 Deployment
- Built with **Streamlit** (multi-page app).
- Deployed on **Streamlit Cloud**: [ads-eda-predictor.streamlit.app](https://ads-eda-predictor.streamlit.app/)
- Repository: [github.com/Imswappy/ads-eda-predictor](https://github.com/Imswappy/ads-eda-predictor)

---

## 🛠️ Skills Involved
- **Python** (pandas, numpy, matplotlib, seaborn)
- **Machine Learning** (Linear/Ridge/Lasso, Random Forest, pipelines, CV)
- **Statistical Inference** (ADF, Cointegration, Breusch–Pagan, OLS diagnostics)
- **Data Visualization** (Plotly, Altair, Seaborn, Matplotlib)
- **Streamlit** (multi-page UI, state management, deployment)
- **Model Deployment** (joblib persistence, end-to-end reproducibility)

---

## ⚡ Getting Started

```bash
# Clone repo
git clone https://github.com/Imswappy/ads-eda-predictor.git
cd ads-eda-predictor

# Install dependencies
pip install -r requirements.txt

# Run Streamlit app locally
streamlit run streamlit_app.py
```