Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/ashishsingh789/bcg_virtual_internship

This repository showcases my BCG X virtual internship project on customer churn analysis for PowerCo, covering business understanding, EDA, feature engineering, and modeling using Python and machine learning.

data-manipulation data-science dataanalysis datavisualization eda machine-learning matplotlib numpy pandas python random-forest scikit-learn seaborn


Host: GitHub
URL: https://github.com/ashishsingh789/bcg_virtual_internship
Owner: AshishSingh789
Created: 2024-10-08T21:28:26.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-10-13T09:01:17.000Z (3 months ago)
Last Synced: 2024-10-19T12:04:27.191Z (3 months ago)
Topics: data-manipulation, data-science, dataanalysis, datavisualization, eda, machine-learning, matplotlib, numpy, pandas, python, random-forest, scikit-learn, seaborn
Language: Jupyter Notebook
Homepage:
Size: 8.96 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# BCG X Virtual Internship - Data Science Project
This repository contains the work I completed during the BCG X Virtual Internship (June 2024 - September 2024) offered through Forage. The project focused on analyzing customer churn for PowerCo and covered the full data science process, from business understanding to model evaluation.

# Project Overview
As part of this virtual internship, I worked on various tasks that mirror the typical responsibilities of a Data Scientist at BCG X. These tasks were designed to give hands-on experience in solving business problems using data-driven methodologies.

# Tasks Completed:
# 1. Business Understanding & Hypothesis Framing
- Framed PowerCo's problem in the context of customer churn.
- Defined key hypotheses and identified important factors such as pricing, customer service, and energy preferences (clean energy vs. conventional).
- Outlined the data requirements needed to investigate customer churn and provided an approach for analyzing these factors.

# 2. Exploratory Data Analysis (EDA)
- Analyzed historical customer and pricing data, along with churn indicators.
- Explored data types, generated descriptive statistics, and visualized distributions to understand underlying patterns.
- Used Python (Jupyter Notebook) to perform these analyses, focusing on key attributes affecting churn.
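
The EDA steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the PowerCo data is not included in the repository, so the DataFrame and its column names (`channel`, `cons_12m`, `churn`) are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical toy data; the real PowerCo dataset is not part of this repository.
clients = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "channel": ["online", "phone", "online", "phone", "online", "online"],
    "cons_12m": [1200.0, 300.0, 4500.0, 150.0, 980.0, 2200.0],
    "churn": [0, 1, 0, 1, 0, 0],
})

# Inspect data types and descriptive statistics of the numeric columns.
dtypes = clients.dtypes
summary = clients.describe()

# Overall churn rate, and churn rate broken down by a categorical attribute.
churn_rate = clients["churn"].mean()
churn_by_channel = clients.groupby("channel")["churn"].mean()

print(dtypes)
print(summary)
print(f"Overall churn rate: {churn_rate:.2%}")
print(churn_by_channel)
```

Distributions could then be plotted with Matplotlib or Seaborn (for example, a histogram of `cons_12m` split by `churn`), which is omitted here to keep the sketch short.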

# 3. Feature Engineering & Modeling
- Created new features to enhance the predictive capability of the model, such as extracting date components and combining columns to form meaningful features.
- Evaluated which columns could be removed or combined to improve model performance.
- Combined the provided datasets to create a final dataset for modeling.
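
As an illustration of the feature engineering described above, the sketch below extracts date components, combines two price columns into a difference feature, and merges a client table with a price table. All table and column names (`date_activ`, `date_end`, `price_off_peak`, `price_peak`, and the derived features) are assumptions made for this example; the notebook may use different ones.

```python
import pandas as pd

# Hypothetical client table (one row per customer).
clients = pd.DataFrame({
    "id": ["a", "b", "c"],
    "date_activ": ["2012-03-15", "2010-11-01", "2013-07-20"],
    "date_end": ["2016-03-15", "2016-11-01", "2016-07-20"],
    "churn": [0, 1, 0],
})

# Hypothetical price history (several rows per customer).
prices = pd.DataFrame({
    "id": ["a", "a", "b", "b", "c", "c"],
    "price_off_peak": [0.10, 0.12, 0.15, 0.15, 0.11, 0.14],
    "price_peak": [0.20, 0.22, 0.25, 0.27, 0.21, 0.22],
})

# Parse date strings, then extract date components as numeric features.
for col in ["date_activ", "date_end"]:
    clients[col] = pd.to_datetime(clients[col])
clients["activ_year"] = clients["date_activ"].dt.year
clients["activ_month"] = clients["date_activ"].dt.month
clients["tenure_years"] = clients["date_end"].dt.year - clients["date_activ"].dt.year

# Aggregate the price history per customer and combine columns into one feature.
price_features = prices.groupby("id").agg(
    mean_off_peak=("price_off_peak", "mean"),
    mean_peak=("price_peak", "mean"),
).reset_index()
price_features["peak_offpeak_diff"] = (
    price_features["mean_peak"] - price_features["mean_off_peak"]
)

# Merge both tables into a single modeling dataset and drop the raw date columns.
model_data = clients.merge(price_features, on="id", how="left")
model_data = model_data.drop(columns=["date_activ", "date_end"])

print(model_data)
```

A left merge keeps every customer even when no price history exists for them; whether that is the right choice depends on how missing prices should be handled downstream.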

# 4. Modeling and Evaluation
- Built and trained a Random Forest classifier using the scikit-learn library to predict customer churn.
- Evaluated the model using performance metrics like accuracy, precision, and recall.
- Discussed the justification for chosen evaluation metrics and provided a summary of the model's performance.
- Provided insights and recommendations on how PowerCo could reduce churn based on the model’s predictions.
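
The modeling and evaluation steps can be sketched as below. Because the modeling dataset is not uploaded, the example trains on synthetic data generated with NumPy; the feature matrix, the churn-generating rule, and the hyperparameters (`n_estimators=200`, a 25% test split) are assumptions for illustration, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the modeling dataset (hypothetical features and labels).
rng = np.random.default_rng(42)
n_samples = 1000
X = rng.normal(size=(n_samples, 4))
# Churn probability depends on the first two features; the offset makes churners a minority.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 2.0
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Hold out a stratified test set so both classes appear in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the Random Forest classifier and predict on the held-out data.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy, precision, and recall for the churn (positive) class.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
```

When churners are a small share of customers, accuracy alone can look high even if few churners are caught, which is one common reason to report precision and recall alongside it.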

# Repository Structure
    ├── data/
    │   └── data_for_predictions.csv              # Final dataset for modeling (not uploaded here)
    ├── notebooks/
    │   ├── task1_business_understanding.ipynb
    │   ├── task2_exploratory_data_analysis.ipynb
    │   ├── task3_feature_engineering.ipynb
    │   └── task4_modeling_and_evaluation.ipynb
    ├── README.md                                 # Project README file
    └── .gitignore                                # Ignore rules for data and sensitive files

# Technologies Used
- Python
- Jupyter Notebooks
- Pandas, NumPy for data manipulation
- Matplotlib, Seaborn for data visualization
- Scikit-learn for machine learning (Random Forest)

# Conclusion
This internship gave me hands-on experience with real-world data science problems, from framing a business problem to building and evaluating predictive models. The work covered the key stages of the data science workflow: business understanding, EDA, feature engineering, and modeling.

Check out the Jupyter notebooks in the repository for more details on the individual tasks.