{"id":18075649,"url":"https://github.com/ashishsingh789/bcg_virtual_internship","last_synced_at":"2026-04-09T11:02:40.504Z","repository":{"id":258524814,"uuid":"869772223","full_name":"AshishSingh789/BCG_virtual_Internship","owner":"AshishSingh789","description":"This repository showcases my BCG X virtual internship project on customer churn analysis for PowerCo, covering business understanding, EDA, feature engineering, and modeling using Python and machine learning.","archived":false,"fork":false,"pushed_at":"2024-10-13T09:01:17.000Z","size":9394,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-19T12:04:27.191Z","etag":null,"topics":["data-manipulation","data-science","dataanalysis","datavisualization","eda","machine-learning","matplotlib","numpy","pandas","python","random-forest","scikit-learn","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AshishSingh789.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-08T21:28:26.000Z","updated_at":"2024-10-13T09:01:20.000Z","dependencies_parsed_at":"2024-10-20T12:50:12.052Z","dependency_job_id":null,"html_url":"https://github.com/AshishSingh789/BCG_virtual_Internship","commit_stats":null,"previous_names":["rohit-kumar873/bcg_virtual_internship"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshishSingh789%2FBCG_virtual_Internship","tags_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshishSingh789%2FBCG_virtual_Internship/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshishSingh789%2FBCG_virtual_Internship/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AshishSingh789%2FBCG_virtual_Internship/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AshishSingh789","download_url":"https://codeload.github.com/AshishSingh789/BCG_virtual_Internship/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393539,"owners_count":20931809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-manipulation","data-science","dataanalysis","datavisualization","eda","machine-learning","matplotlib","numpy","pandas","python","random-forest","scikit-learn","seaborn"],"created_at":"2024-10-31T11:06:46.298Z","updated_at":"2025-12-30T21:55:53.625Z","avatar_url":"https://github.com/AshishSingh789.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BCG X Virtual Internship - Data Science Project\nThis repository contains the work I completed during the BCG X Virtual Internship (June 2024 - September 2024) offered through Forage. 
The project focused on analyzing customer churn for PowerCo, using the full data science process from business understanding to model evaluation.\n\n# Project Overview\nAs part of this virtual internship, I worked on tasks that mirror the typical responsibilities of a Data Scientist at BCG X. These tasks were designed to give hands-on experience in solving business problems using data-driven methods.\n\n# Tasks Completed:\n# 1. Business Understanding \u0026 Hypothesis Framing\nFramed PowerCo's problem in the context of customer churn.\nDefined key hypotheses and identified important factors such as pricing, customer service, and energy preferences (clean energy vs. conventional).\nOutlined the data required to investigate customer churn and proposed an approach for analyzing these factors.\n\n\n# 2. Exploratory Data Analysis (EDA)\nAnalyzed historical customer and pricing data, along with churn indicators.\nExplored data types, generated descriptive statistics, and visualized distributions to understand underlying patterns.\nUsed Python (Jupyter Notebook) to perform these analyses, focusing on key attributes affecting churn.\n\n\n# 3. Feature Engineering \u0026 Modeling\nCreated new features to improve the model's predictive power, such as extracting date components and combining columns into meaningful features.\nEvaluated which columns could be removed or combined to improve model performance.\nCombined the provided datasets to create a final dataset for modeling.\n\n\n# 4. 
Modeling and Evaluation\nBuilt and trained a Random Forest classifier using the scikit-learn library to predict customer churn.\nEvaluated the model using performance metrics such as accuracy, precision, and recall.\nJustified the choice of evaluation metrics and summarized the model's performance.\nProvided insights and recommendations on how PowerCo could reduce churn based on the model's predictions.\n\n\n# Repository Structure\n├── data/\n\n│   └── data_for_predictions.csv   # Final dataset for modeling (not uploaded here)\n\n├── notebooks/\n\n│   ├── task1_business_understanding.ipynb\n\n│   ├── task2_exploratory_data_analysis.ipynb\n\n│   ├── task3_feature_engineering.ipynb\n\n│   └── task4_modeling_and_evaluation.ipynb\n\n├── README.md                      # Project README file\n\n└── .gitignore                     # Git ignore rules for data and sensitive files\n\n\n# Technologies Used\n- Python\n- Jupyter Notebooks\n- Pandas, NumPy for data manipulation\n- Matplotlib, Seaborn for data visualization\n- Scikit-learn for machine learning (Random Forest)\n\n# Conclusion\nThis internship gave me hands-on experience with real-world data science problems, from framing business problems to building and evaluating predictive models. The work covered the key stages of the data science workflow, from EDA through feature engineering to modeling.\n\nCheck out the Jupyter notebooks in the repository for more details on the individual tasks.\n\n[BCG X.pdf](https://github.com/user-attachments/files/18366583/BCG.X.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashishsingh789%2Fbcg_virtual_internship","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashishsingh789%2Fbcg_virtual_internship","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashishsingh789%2Fbcg_virtual_internship/lists"}