🌐 Predicting Customer Churn with Decision Tree, XGBoost & Neural Network Models on the Cell2Cell Dataset

- Host: GitHub
- URL: https://github.com/dpb24/customer-churn
- Owner: dpb24
- Created: 2025-07-14T06:36:27.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-07-14T09:46:18.000Z (4 months ago)
- Last Synced: 2025-07-14T09:55:38.085Z (4 months ago)
- Topics: churn-analysis, churn-prediction, confusion-matrix, decision-tree-classifier, keras-neural-networks, predictive-analytics, shap-analysis, tensorflow, xgboost-classifier
- Language: Jupyter Notebook
- Homepage:
- Size: 64.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
 
 
README
# 🌐 Customer Churn Prediction: Cell2Cell Dataset
**Libraries:** `scikit-learn`, `matplotlib`, `seaborn`, `XGBoost`, `TensorFlow`, `SHAP`
**Dataset:** [Cell2Cell Churn Dataset](https://www.kaggle.com/datasets/jpacse/datasets-for-churn-telecom/data) 
 
In this project, we explore the Cell2Cell dataset and build three supervised machine learning models (**Decision Tree**, **XGBoost**, and a TensorFlow **Neural Network**) to predict customer churn risk. The workflow covers feature engineering, model tuning, performance evaluation, and SHAP-based interpretability.
## 🧠 Analytical Approach
 - **Feature engineering** with tenure, billing, call usage, and derived behavioural indicators (first sketch below)
 - **Categorical optimisation** via native support in XGBoost and embedding layers in the neural network (second sketch below)
 - **Model evaluation** using accuracy, precision, recall, and $F_1$ score on both validation and test sets (third sketch below)
 - **Global interpretability** through SHAP to compare feature importance across architectures (fourth sketch below)
 - **Confusion matrix analysis** to diagnose prediction trade-offs and guide threshold tuning (covered in the evaluation sketch below)
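
The feature-engineering step can be sketched roughly as follows. Every column name here is a hypothetical stand-in for the Cell2Cell fields, not the verified schema:

```python
import pandas as pd

# Load the raw Cell2Cell training split (file name assumed).
df = pd.read_csv("cell2cell_train.csv")

# Derived behavioural indicators of the kind listed above; all column
# names are illustrative rather than taken from the actual dataset.
months = df["MonthsInService"].clip(lower=1)                 # tenure, guarded against zero
df["minutes_per_month"] = df["TotalMinutes"] / months        # call-usage intensity
df["bill_per_month"] = df["TotalRecurringCharge"] / months   # billing level
df["care_calls_per_month"] = df["CustomerCareCalls"] / months  # friction signal
```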
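
Both categorical strategies fit in one short sketch. This is not the project's exact code; the `credit_rating` column, layer sizes, and embedding width are illustrative:

```python
import pandas as pd
import tensorflow as tf
import xgboost as xgb

toy = pd.DataFrame({
    "credit_rating": pd.Categorical(["A", "B", "A", "C"]),
    "months_in_service": [12, 3, 40, 7],
    "churn": [0, 1, 0, 1],
})

# 1) XGBoost consumes pandas 'category' columns natively when
#    enable_categorical=True (with the histogram tree method).
xgb_model = xgb.XGBClassifier(tree_method="hist", enable_categorical=True)
xgb_model.fit(toy[["credit_rating", "months_in_service"]], toy["churn"])

# 2) The neural network instead learns a dense embedding for each
#    integer-encoded category, concatenated with the numeric inputs.
n_levels = toy["credit_rating"].cat.categories.size
cat_in = tf.keras.Input(shape=(1,), dtype="int32")
num_in = tf.keras.Input(shape=(1,))
emb = tf.keras.layers.Flatten()(
    tf.keras.layers.Embedding(input_dim=n_levels, output_dim=4)(cat_in))
merged = tf.keras.layers.Concatenate()([emb, num_in])
out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)
nn = tf.keras.Model([cat_in, num_in], out)
nn.compile(optimizer="adam", loss="binary_crossentropy")
```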
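
The evaluation and confusion-matrix steps map directly onto scikit-learn; `y_val` and `proba_val` below are toy placeholders for the validation labels and predicted churn probabilities:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_val = np.array([0, 1, 0, 1, 1, 0])                    # toy validation labels
proba_val = np.array([0.2, 0.7, 0.4, 0.35, 0.8, 0.1])   # toy churn probabilities

# Lowering the decision threshold below 0.5 trades precision for recall,
# which is the recall-focused strategy described in the results.
threshold = 0.3
y_pred = (proba_val >= threshold).astype(int)

print("accuracy :", accuracy_score(y_val, y_pred))
print("precision:", precision_score(y_val, y_pred))
print("recall   :", recall_score(y_val, y_pred))
print("F1       :", f1_score(y_val, y_pred))

# Rows are actual (non-churn, churn), columns are predicted; the
# off-diagonal cells expose the false-positive/false-negative trade-off.
print(confusion_matrix(y_val, y_pred))
```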
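
For the tree-based models, global SHAP importance follows the standard `TreeExplainer` pattern; `xgb_model` comes from the categorical sketch above, and `X_val` is a placeholder for the validation feature frame:

```python
import shap

explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_val)

# Mean |SHAP value| per feature yields a global importance ranking that
# can be compared across architectures (the neural network would need a
# model-agnostic explainer such as shap.KernelExplainer instead).
shap.summary_plot(shap_values, X_val, plot_type="bar")
```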
 
## 📊 Results
 - **XGBoost** emerged as the top-performing model, with validation recall of **75.1%** and a stable $F_1$ score of **0.495** on test data
 - **Neural Network** achieved higher validation accuracy but struggled with recall, indicating overfitting to the majority class
 - **SHAP analysis** revealed consistently influential features across models
 - **Confusion matrix analysis** highlighted a recall-focused strategy: most churners were correctly flagged, though false positives remained substantial
## 🔮 Next Steps
 - Refine the Neural Network with dropout, class weighting, and early stopping to improve recall (see the sketch after this list)
 - Extend SHAP analysis
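
A minimal sketch of those refinements, assuming a binary churn target and placeholder arrays `X_train`, `y_train`, `X_val`, `y_val` (the layer sizes, dropout rate, and class-weight ratio are illustrative, not tuned values):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),                  # dropout regularisation
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall(name="recall")])

# Stop once validation loss plateaus, keeping the best weights seen.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# Upweight the minority churn class so missed churners cost more.
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          class_weight={0: 1.0, 1: 3.0},
          epochs=50,
          callbacks=[early_stop])
```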
📖 Jupyter Notebook: [GitHub](notebooks/cell2cell_customer_churn_v1.ipynb) | [Colab](https://colab.research.google.com/drive/19w02yTrKmPHFSv-OwqHzBjbYtv3l_t-N)