https://github.com/steffin12-git/logistic-regression-social-network-ads-ml
Built an interpretable Logistic Regression model to predict whether a user will purchase a product from social network ads using demographic and behavioral features. The notebook demonstrates a complete ML workflow: data ingestion, preprocessing, scaling, modeling, evaluation, and visual diagnostics.
matplotlib-pyplot pandas python seaborn sklearn statistics
- Host: GitHub
- URL: https://github.com/steffin12-git/logistic-regression-social-network-ads-ml
- Owner: Steffin12-git
- Created: 2025-08-16T04:21:54.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-08-16T04:44:22.000Z (3 months ago)
- Last Synced: 2025-08-16T06:28:31.309Z (3 months ago)
- Topics: matplotlib-pyplot, pandas, python, seaborn, sklearn, statistics
- Language: Jupyter Notebook
- Size: 112 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Logistic Regression: Social Network Ads (Purchase Prediction)
**Repository:** *Logistic Regression Social Network Ads*
**Notebook:** `Logistic Regression Social network.ipynb`
---
## TL;DR
Built an interpretable **Logistic Regression** model to predict whether a user will purchase a product from social network ads using demographic and behavioral features. The notebook demonstrates a **complete ML workflow**: data ingestion, preprocessing, scaling, modeling, evaluation, and visual diagnostics.
---
## Key Results
* **Dataset:** 400 rows, 5 columns (`User ID`, `Gender`, `Age`, `EstimatedSalary`, `Purchased`)
* **Train/Test split:** 70% / 30% (**Train = 280**, **Test = 120**)
* **Model:** `sklearn.linear_model.LogisticRegression()` (default)
* **Test set performance:**
  * **Accuracy:** **0.86**
  * **Classification report (test set):**
```
              precision    recall  f1-score   support

           0       0.83      0.97      0.89        73
           1       0.94      0.68      0.79        47

    accuracy                           0.86       120
   macro avg       0.88      0.83      0.84       120
weighted avg       0.87      0.86      0.85       120
```
* **Confusion Matrix:**

* **ROC Curve (AUC \~ 0.91):**

> **Interpretation:**
> The model identifies non-purchasers very reliably (recall = 0.97) but misses \~32% of actual purchasers (recall = 0.68). Depending on business goals, the decision threshold can be lowered to trade some precision for higher recall.
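
The rounded figures in the classification report are consistent with one particular set of confusion-matrix counts. The sketch below works through that arithmetic. The four counts are inferred from the rounded report and are an assumption, not values read from the notebook; they reproduce the reported accuracy, precision, and recall, and show where the "\~32% missed" figure comes from.

```python
# Counts inferred from the rounded classification report (assumption, not notebook output).
tn, fp, fn, tp = 71, 2, 15, 32  # class 0 support = 73, class 1 support = 47

total = tn + fp + fn + tp
accuracy = (tp + tn) / total        # share of all test rows predicted correctly
precision_1 = tp / (tp + fp)        # of predicted purchasers, how many really bought
recall_1 = tp / (tp + fn)           # of actual purchasers, how many were found
recall_0 = tn / (tn + fp)           # of actual non-purchasers, how many were found
missed_share = fn / (tp + fn)       # purchasers the model fails to flag

print(f"accuracy={accuracy:.2f} precision_1={precision_1:.2f} "
      f"recall_1={recall_1:.2f} recall_0={recall_0:.2f} missed={missed_share:.2f}")
# accuracy=0.86 precision_1=0.94 recall_1=0.68 recall_0=0.97 missed=0.32
```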
---
## Dataset & Preprocessing
* **Data file:** `Social_Network_Ads.csv`
* **Features used (`X`):** `Gender`, `Age`, `EstimatedSalary`
* **Target (`y`):** `Purchased`
* **Steps performed:**
  * Dropped `User ID`
  * Encoded `Gender` (Male = 1, Female = 0)
  * Standardized features using **StandardScaler**
  * Train-test split (70% train, 30% test)
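```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Social_Network_Ads.csv")
# Encode gender: Male = 1, anything else = 0
df['Gender'] = df['Gender'].apply(lambda x: 1 if str(x).strip().lower() == "male" else 0)
X = df[['Gender', 'Age', 'EstimatedSalary']]
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```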
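
The snippet above fits the scaler on all of `X`. If that happens before the train-test split, as the step order suggests, test-set statistics leak into the scaling. One common alternative, sketched here rather than taken from the notebook, is to split first and let a `Pipeline` fit the scaler on the training rows only. The synthetic arrays, the labelling rule, and `random_state=42` are all assumptions used to keep the example self-contained; substitute the real `X` and `y`. The printed shapes match the 280/120 split reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(0, 2, n),              # Gender (0/1)
    rng.integers(18, 61, n),            # Age
    rng.integers(15_000, 150_001, n),   # EstimatedSalary
]).astype(float)
# Hypothetical labelling rule so that both classes are present.
y = (X[:, 1] + X[:, 2] / 5_000 + rng.normal(0, 8, n) > 55).astype(int)

# Split first: 70% train, 30% test (random_state is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# The pipeline fits StandardScaler on X_train only, then applies it to X_test.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)  # (280, 3) (120, 3)
```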
---
## Model Training & Evaluation
* **Logistic Regression model:**
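```python
from sklearn.linear_model import LogisticRegression

# X_train, X_test, y_train come from the 70/30 split of the scaled features
lr = LogisticRegression()  # default hyperparameters
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
```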
* **Evaluation metrics generated:**
  * Accuracy score (0.86)
  * Classification report (precision, recall, F1-score)
  * Confusion matrix (visualized above)
  * ROC Curve & AUC
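
The metrics listed above can all be computed with `sklearn.metrics`. The following is a minimal sketch, not the notebook's code: it uses synthetic data from `make_classification` as a stand-in for the real dataset, so the numbers it prints will differ from the results reported in this README. Note that ROC AUC is computed from predicted probabilities, not from hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 400 rows, 3 features, binary target.
X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

lr = LogisticRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)                 # hard 0/1 predictions
y_score = lr.predict_proba(X_test)[:, 1]    # probability of class 1

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)       # rows = actual, columns = predicted
auc = roc_auc_score(y_test, y_score)        # uses scores, not hard labels

print(f"accuracy={acc:.2f}  auc={auc:.2f}")
print(cm)
print(classification_report(y_test, y_pred))
```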
---
## Insights & Business Relevance
* **High precision for purchasers (0.94):** When the model predicts a purchase, it's usually correct. Useful for **targeted campaigns** where false positives are costly.
* **Lower recall for purchasers (0.68):** The model misses \~32% of buyers. If the goal is **maximizing sales capture**, recall should be improved (via class weighting, resampling, or threshold tuning).
* **Lightweight & interpretable:** With only three predictors, this model is fast, explainable, and easy to deploy. Ideal for a **marketing proof-of-concept**.
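
Two of the recall levers mentioned above, threshold tuning and class weighting, can be sketched as follows. This is an illustrative example on synthetic data, not the notebook's code: the 0.3 threshold, the 60/40 class mix, and the helper `predict_at` are hypothetical choices. Lowering the threshold can only add predicted positives, so recall for class 1 never decreases, though precision usually drops.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced stand-in data (assumption).
X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                           n_redundant=0, weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

lr = LogisticRegression().fit(X_train, y_train)
proba = lr.predict_proba(X_test)[:, 1]  # probability of "purchased"


def predict_at(threshold):
    """Hypothetical helper: label a row 1 when its probability reaches the threshold."""
    return (proba >= threshold).astype(int)


# Option 1: lower the decision threshold from 0.5 to 0.3.
recall_default = recall_score(y_test, predict_at(0.5))
recall_lowered = recall_score(y_test, predict_at(0.3))
precision_lowered = precision_score(y_test, predict_at(0.3))

# Option 2: reweight classes inversely to their frequency during training.
balanced = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
recall_balanced = recall_score(y_test, balanced.predict(X_test))

print(f"recall @0.5={recall_default:.2f}  recall @0.3={recall_lowered:.2f}  "
      f"precision @0.3={precision_lowered:.2f}  recall balanced={recall_balanced:.2f}")
```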
---
## Reproducibility: How to Run
1. Clone the repo and ensure the dataset `Social_Network_Ads.csv` is present.
2. Install dependencies:
```bash
pip install pandas seaborn matplotlib scikit-learn jupyter
```
3. Launch Jupyter and run the notebook:
```bash
jupyter notebook "Logistic Regression Social network.ipynb"
```
4. To export plots for README:
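```python
# Call each savefig right after drawing its figure; the images/ folder must already exist
plt.savefig("images/confusion metrics.png", bbox_inches="tight")  # after the confusion-matrix plot
plt.savefig("images/Roc Curve.png", bbox_inches="tight")  # after the ROC plot
```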
---
## Project Structure
```
├── Logistic Regression Social network.ipynb   # Main notebook
├── Social_Network_Ads.csv                     # Dataset (400 rows)
├── images/                                    # Visualization assets
│   ├── confusion metrics.png
│   └── Roc Curve.png
└── README.md                                  # Documentation
```
---
## Elevator Pitch
Developed an **86% accurate Logistic Regression model** to predict purchase behavior from social network ads using demographic features (Age, Gender, Estimated Salary). Produced **business-driven insights**: strong precision for targeting campaigns, but opportunities to boost recall for wider sales reach. Delivered a reproducible ML pipeline with visual diagnostics (confusion matrix & ROC curve) for easy deployment.
**Tech Stack:** Python, Pandas, Matplotlib, Seaborn, Scikit-learn, Jupyter.