Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ashishsingh789/bcg_virtual_internship
This repository showcases my BCG X virtual internship project on customer churn analysis for PowerCo, covering business understanding, EDA, feature engineering, and modeling using Python and machine learning.
data-manipulation data-science dataanalysis datavisualization eda machine-learning matplotlib numpy pandas python random-forest scikit-learn seaborn
- Host: GitHub
- URL: https://github.com/ashishsingh789/bcg_virtual_internship
- Owner: AshishSingh789
- Created: 2024-10-08T21:28:26.000Z (29 days ago)
- Default Branch: main
- Last Pushed: 2024-10-13T09:01:17.000Z (25 days ago)
- Last Synced: 2024-10-19T12:04:27.191Z (19 days ago)
- Topics: data-manipulation, data-science, dataanalysis, datavisualization, eda, machine-learning, matplotlib, numpy, pandas, python, random-forest, scikit-learn, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 8.96 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# BCG X Virtual Internship - Data Science Project
This repository contains the work I completed during the BCG X Virtual Internship (June 2024 - September 2024) offered through Forage. The project focused on analyzing customer churn for PowerCo, using the full data science process from business understanding to model evaluation.

# Project Overview
As part of this virtual internship, I worked on various tasks that mirror the typical responsibilities of a Data Scientist at BCG X. These tasks were designed to give hands-on experience in solving business problems using data-driven methodologies.

# Tasks Completed:
# 1. Business Understanding & Hypothesis Framing
- Framed PowerCo's problem in the context of customer churn.
- Defined key hypotheses and identified important factors such as pricing, customer service, and energy preferences (clean energy vs. conventional).
- Outlined the data requirements needed to investigate customer churn and provided an approach for analyzing these factors.

# 2. Exploratory Data Analysis (EDA)
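A minimal sketch of the kind of checks listed in this section, using a small toy table in place of the PowerCo data. The column names (`annual_consumption`, `offpeak_price`, `price_band`) and all values are hypothetical and are not taken from the project's notebooks.

```python
import pandas as pd

# Toy stand-in for the customer data; column names and values are hypothetical.
customers = pd.DataFrame({
    "customer_id": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "annual_consumption": [1200.0, 560.0, 9800.0, 430.0, 2500.0, 7100.0],
    "offpeak_price": [0.12, 0.15, 0.10, 0.16, 0.13, 0.11],
    "churn": [0, 1, 0, 1, 0, 0],
})

# 1. Inspect data types.
print(customers.dtypes)

# 2. Descriptive statistics for the numeric columns.
summary = customers[["annual_consumption", "offpeak_price"]].describe()
print(summary)

# 3. Overall churn rate (share of customers with churn == 1).
churn_rate = customers["churn"].mean()
print(f"Churn rate: {churn_rate:.1%}")

# 4. Churn rate per price band: the numbers a bar chart would display.
customers["price_band"] = pd.cut(
    customers["offpeak_price"], bins=[0.0, 0.125, 1.0], labels=["low", "high"]
)
churn_by_band = customers.groupby("price_band", observed=True)["churn"].mean()
print(churn_by_band)
```

Grouping the churn flag by a binned attribute is one simple way to see whether an attribute such as price moves together with churn before any model is built.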
- Analyzed historical customer and pricing data, along with churn indicators.
- Explored data types, generated descriptive statistics, and visualized distributions to understand underlying patterns.
- Used Python (Jupyter Notebook) to perform these analyses, focusing on key attributes affecting churn.

# 3. Feature Engineering & Modeling
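The feature engineering steps in this section (date components, combined columns, merged datasets) could be sketched roughly as follows. The two toy tables, their column names, and the derived features (`contract_years`, `peak_offpeak_diff`) are assumptions made for illustration. The actual features are in the notebooks.

```python
import pandas as pd

# Hypothetical customer table; names and values are illustrative only.
customers = pd.DataFrame({
    "customer_id": ["a1", "a2", "a3"],
    "contract_start": ["2019-03-15", "2020-11-01", "2018-07-20"],
    "contract_end": ["2024-03-15", "2023-11-01", "2024-07-20"],
    "churn": [0, 1, 0],
})

# Hypothetical pricing table with several price records per customer.
prices = pd.DataFrame({
    "customer_id": ["a1", "a1", "a2", "a2", "a3", "a3"],
    "peak_price": [0.20, 0.22, 0.25, 0.29, 0.18, 0.18],
    "offpeak_price": [0.12, 0.12, 0.15, 0.16, 0.10, 0.11],
})

# Parse the date strings so that date components can be extracted.
for col in ["contract_start", "contract_end"]:
    customers[col] = pd.to_datetime(customers[col])

# Date components as separate features.
customers["start_month"] = customers["contract_start"].dt.month
customers["start_year"] = customers["contract_start"].dt.year

# Combine two columns into one feature: contract length in calendar years.
customers["contract_years"] = (
    customers["contract_end"].dt.year - customers["contract_start"].dt.year
)

# Average the prices per customer, then combine two price columns into a spread.
price_features = prices.groupby("customer_id", as_index=False).mean()
price_features["peak_offpeak_diff"] = (
    price_features["peak_price"] - price_features["offpeak_price"]
)

# Merge both sources into one modeling table and drop the raw date columns.
model_data = customers.merge(price_features, on="customer_id", how="left")
model_data = model_data.drop(columns=["contract_start", "contract_end"])
print(model_data)
```

The raw date columns are dropped after extraction because tree-based models in scikit-learn expect numeric inputs.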
- Created new features to enhance the predictive capability of the model, such as extracting date components and combining columns to form meaningful features.
- Evaluated which columns could be removed or combined to improve model performance.
- Combined the provided datasets to create a final dataset for modeling.

# 4. Modeling and Evaluation
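The modeling step in this section might look roughly like the sketch below: a scikit-learn Random Forest trained on a stratified split and scored with accuracy, precision, and recall. The data is synthetic, and the two features, the churn relationship, and the hyperparameters (`n_estimators=200`, `test_size=0.25`) are assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two hypothetical features and an imbalanced churn label.
rng = np.random.default_rng(42)
n_customers = 1000
price_diff = rng.normal(0.10, 0.03, n_customers)
tenure_years = rng.integers(1, 10, n_customers)
# Assumed toy relationship: a larger price spread and shorter tenure raise churn odds.
churn_prob = 1 / (1 + np.exp(-(30 * (price_diff - 0.12) - 0.2 * tenure_years)))
y = (rng.random(n_customers) < churn_prob).astype(int)
X = np.column_stack([price_diff, tenure_years])

# Stratify so the train and test sets keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# zero_division=0 avoids a warning if the model predicts no churners at all.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When churners are a minority, accuracy can look high even if the model finds few of them, which is a common reason to report precision and recall alongside it.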
- Built and trained a Random Forest classifier using the scikit-learn library to predict customer churn.
- Evaluated the model using performance metrics like accuracy, precision, and recall.
- Discussed the justification for chosen evaluation metrics and provided a summary of the model's performance.
- Provided insights and recommendations on how PowerCo could reduce churn based on the model’s predictions.

# Repository Structure
├── data/
│ └── data_for_predictions.csv # Final dataset for modeling (not uploaded here)
├── notebooks/
│ ├── task1_business_understanding.ipynb
│ ├── task2_exploratory_data_analysis.ipynb
│ ├── task3_feature_engineering.ipynb
│ └── task4_modeling_and_evaluation.ipynb
├── README.md # Project README file
└── .gitignore # Git ignore rules for data and sensitive files

# Technologies Used
- Python
- Jupyter Notebooks
- Pandas, NumPy for data manipulation
- Matplotlib, Seaborn for data visualization
- Scikit-learn for machine learning (Random Forest)
# Conclusion
This internship gave me hands-on experience with real-world data science problems, from framing business problems to building and evaluating predictive models. The work provided a comprehensive understanding of the key stages in the data science workflow, from EDA to feature engineering and modeling.

Check out the Jupyter notebooks in the repository for more details on the individual tasks.