# 📡 Customer Churn Prediction System

> **End-to-end ML pipeline** for telecom customer churn prediction using XGBoost, SMOTE class balancing, SHAP explainability, and a production-ready Streamlit dashboard.
[![Python](https://img.shields.io/badge/Python-3.10+-blue?logo=python)](https://python.org)
[![XGBoost](https://img.shields.io/badge/XGBoost-2.0-orange)](https://xgboost.readthedocs.io)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.35-red?logo=streamlit)](https://streamlit.io)

---

## 🎯 Results

| Metric | Score |
|--------|-------|
| **Accuracy** | **92%** |
| **AUC-ROC** | **0.89** |
| **Recall** | **88%** |
| Inference Time | < 2 seconds |
| Baseline (Logistic Regression) | 81% accuracy |

> Outperforms the logistic regression baseline by **11 percentage points** of accuracy on the IBM Telco dataset (7,043 records).

---

## 🏗️ Architecture

```
customer-churn-prediction/
├── app.py                  # Streamlit dashboard (4 pages)
├── src/
│   ├── train.py            # Training pipeline (XGBoost + SMOTE + GridSearch)
│   ├── predict.py          # Inference helpers + SHAP explanations
│   └── utils.py            # Sample CSV generator & shared utilities
├── models/                 # Saved artifacts (after training)
│   ├── xgb_model.pkl
│   ├── scaler.pkl
│   ├── feature_cols.pkl
│   └── shap_explainer.pkl
├── data/                   # Place dataset CSV here
├── requirements.txt
└── README.md
```

---

## 🚀 Quick Start

### 1. Clone & install
```bash
git clone https://github.com/sumith25-dev/customer-churn-prediction.git
cd customer-churn-prediction
pip install -r requirements.txt
```

### 2. Download the dataset
Get the IBM Telco Customer Churn dataset from Kaggle:
```
https://www.kaggle.com/datasets/blastchar/telco-customer-churn
```
Place `WA_Fn-UseC_-Telco-Customer-Churn.csv` inside the `data/` folder.
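Once the CSV is in place, the cleaning steps listed under Pipeline Details can be tried on their own. The following is a minimal sketch, not the code in `src/train.py`: the function name `clean_telco`, the tenure bin edges, the set of columns counted in `service_count`, and the definition of `charges_per_month` (`TotalCharges` divided by tenure, with zero tenure treated as one month) are all illustrative assumptions. The raw column names (`tenure`, `TotalCharges`, `Churn`, and the service columns) are those of the Kaggle CSV.

```python
import pandas as pd

# Telco service columns that hold "Yes" when the customer has the service.
# Which columns the project actually counts is an assumption.
SERVICE_COLS = [
    "PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
    "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
]


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning + feature-engineering step (bins and feature definitions assumed)."""
    df = df.copy()

    # TotalCharges loads as text because a few rows hold a blank string;
    # coerce to float (blanks become NaN), then impute with the median.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

    # Binary target: 1 = churned.
    df["Churn"] = (df["Churn"] == "Yes").astype(int)

    # Tenure buckets in months (illustrative bin edges; right edge inclusive).
    df["tenure_group"] = pd.cut(
        df["tenure"],
        bins=[-1, 12, 24, 48, 72],
        labels=["0-12", "13-24", "25-48", "49-72"],
    )

    # Count of subscribed services among the columns present in this frame.
    present = [c for c in SERVICE_COLS if c in df.columns]
    df["service_count"] = (df[present] == "Yes").sum(axis=1)

    # Average billed amount per month of tenure; clip avoids division by zero.
    df["charges_per_month"] = df["TotalCharges"] / df["tenure"].clip(lower=1)
    return df
```

On the full file (`clean_telco(pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.csv"))`), the figures in this README suggest 7,043 rows and a `Churn` mean of about 0.265. After dropping `customerID`, the remaining categorical columns could then be one-hot encoded with `pd.get_dummies`.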
### 3. Train the model
```bash
python src/train.py
```
This will:
- Load and clean the 7,043-record dataset
- Engineer 20+ features (tenure buckets, service count, charge ratios)
- Apply SMOTE to the training split only, balancing the ~26.5% churn minority class
- Run a 5-fold stratified cross-validated grid search over XGBoost hyperparameters
- Evaluate on a 20% held-out test set
- Save SHAP explainability artifacts
- Output: `models/*.pkl`, `models/confusion_matrix.png`, `models/roc_curve.png`, `models/shap_summary.png`

### 4. Launch the dashboard
```bash
streamlit run app.py
```

---

## 🔑 Key Churn Drivers (SHAP Analysis)

1. **Contract Type** — Month-to-month contracts show a 3× higher churn rate
2. **Tenure** — New customers (0–12 months) churn most frequently
3. **Monthly Charges** — Higher monthly bills correlate with higher churn risk
4. **Internet Service** — Fiber optic users churn more than DSL users
5. **Tech Support** — Customers without tech support show higher churn risk

---

## 📊 Dashboard Features

| Page | Description |
|------|-------------|
| 🏠 Dashboard | KPI cards, key churn drivers, model overview |
| 🔍 Single Prediction | Real-time inference with SHAP waterfall chart |
| 📦 Bulk Prediction | CSV upload → predictions → downloadable results |
| 📊 Model Insights | Feature importance, ROC curve, confusion matrix |

---

## 🛠️ Pipeline Details

### Data Processing
- Impute the 11 missing `TotalCharges` values with the median
- Encode categorical features via one-hot encoding
- Create derived features: `tenure_group`, `service_count`, `charges_per_month`

### Class Imbalance
- Dataset: 73.5% No Churn / 26.5% Churn
- Strategy: **SMOTE** (Synthetic Minority Over-sampling Technique), applied to the training split only

### Model Selection
- Algorithm: **XGBoost** (gradient-boosted trees)
- Validation: **StratifiedKFold (k=5)** cross-validation
- Tuning: **GridSearchCV** over `n_estimators`, `max_depth`, `learning_rate`, `subsample`
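The key constraint in the two sections above is that oversampling touches only training data, so synthetic rows never reach the folds or test set used for scoring. Below is a minimal sketch of that idea under stated assumptions: `smote` and `cv_auc_with_smote` are hypothetical helpers, `smote` is a hand-rolled version of basic SMOTE interpolation (a project like this would more typically use imbalanced-learn's `SMOTE`, which this does not reproduce exactly), and scikit-learn's `LogisticRegression` stands in for XGBoost so the example needs only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, minority=1, seed=0):
    """Hand-rolled basic SMOTE for a binary target (illustrative, not imbalanced-learn).

    Adds synthetic minority rows until both classes have the same count. Each new
    row lies on the segment between a random minority row and one of its k nearest
    minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few minority rows to interpolate
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each row's nearest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)          # random minority rows
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                             # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[nbr] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out


def cv_auc_with_smote(X, y, n_splits=5, seed=42):
    """Mean ROC AUC over stratified folds; SMOTE only ever sees each fold's training rows."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        X_tr, y_tr = smote(X[train_idx], y[train_idx], seed=seed)
        # LogisticRegression stands in for XGBClassifier to keep dependencies small.
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        # Validation rows are scored untouched, so no synthetic sample leaks into the score.
        proba = model.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], proba))
    return float(np.mean(scores))
```

With `GridSearchCV`, the same train-only guarantee is usually obtained by placing imbalanced-learn's `SMOTE` and an `XGBClassifier` inside an `imblearn.pipeline.Pipeline` and passing that pipeline as the estimator, since the pipeline resamples only during `fit`.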
### Explainability
- **SHAP TreeExplainer** for global feature importance and local per-prediction explanations
- Surfaces top positive/negative drivers for each customer prediction

---

## 👤 Author

**Sumith B R** — Junior AI Engineer  
[LinkedIn](https://www.linkedin.com/in/sumith-b-r-548534200/) · [GitHub](https://github.com/sumith25-dev)