{"id":22332542,"url":"https://github.com/theoddysey/scikit-pipeline","last_synced_at":"2025-07-11T04:36:56.355Z","repository":{"id":229020322,"uuid":"770049711","full_name":"TheODDYSEY/Scikit-Pipeline","owner":"TheODDYSEY","description":"Bank Customer Churn Prediction Project 💰","archived":false,"fork":false,"pushed_at":"2024-04-27T10:00:32.000Z","size":6611,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-31T08:43:29.977Z","etag":null,"topics":["data-pipelines","jupyter-notebooks","scikitlearn-machine-learning","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TheODDYSEY.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-10T19:24:43.000Z","updated_at":"2024-04-27T10:03:07.000Z","dependencies_parsed_at":"2024-03-21T17:10:23.753Z","dependency_job_id":null,"html_url":"https://github.com/TheODDYSEY/Scikit-Pipeline","commit_stats":null,"previous_names":["theoddysey/scikit-pipeline"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheODDYSEY%2FScikit-Pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheODDYSEY%2FScikit-Pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheODDYSEY%2FScikit-Pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TheODDYSEY%2FScikit-Pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TheODDYSEY","download_url":"https://codeload.github.com/TheODDYSEY/Scikit-Pipeline/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245605908,"owners_count":20643068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipelines","jupyter-notebooks","scikitlearn-machine-learning","visualization"],"created_at":"2024-12-04T04:18:38.174Z","updated_at":"2025-03-26T07:21:44.024Z","avatar_url":"https://github.com/TheODDYSEY.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bank Customer Churn Prediction Project\n\n## Project Overview\n\nThis project aims to predict customer churn in a bank using machine learning techniques. Customer churn, also known as customer attrition, refers to the phenomenon where customers cease doing business with a company. Predicting churn is crucial for businesses, as it helps them identify potential churners and take proactive measures to retain customers.\n\n## Dataset Description\n\nThe dataset \"train.csv\" contains various features related to bank customers. Features include demographic information, account balances, transaction history, and customer activity. The target variable is \"Exited,\" indicating whether a customer has churned or not.\n\n## Installation\n\n1. Clone the project repository.\n2. Install the required dependencies by running:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n## Usage\n\n1. Ensure you have the necessary dataset named \"train.csv\" in the project directory.\n2. Run the project code using:\n    ```bash\n    python churn_prediction.py\n    ```\n\n## Implementation Details\n\n1. **Data Loading and Preprocessing**:\n   - Load the dataset and preprocess it by dropping irrelevant columns, shuffling the data, and splitting into features and target variable.\n   \n2. **Feature Engineering and Preprocessing**:\n   - Scale numerical features and encode categorical features.\n   \n3. **Feature Selection**:\n   - Select the best features using the Chi-square (chi2) statistical test.\n   \n4. **Model Building**:\n   - Build a Random Forest Classifier to predict customer churn.\n\n5. **Pipeline Creation**:\n   - Create pipelines for data preprocessing, feature selection, and model training.\n\n6. **Model Training and Evaluation**:\n   - Fit the complete pipeline to the training data and evaluate model performance on the testing dataset.\n\n## Results\n\nThe trained Random Forest Classifier achieves an accuracy of X% on the testing dataset. Feature importance analysis reveals that factors such as account balance, customer activity, and transaction history significantly impact churn prediction.\n\n## Future Work\n\nPotential areas for improvement or future extensions of the project include:\n- Experimenting with different machine learning models (e.g., Gradient Boosting, Neural Networks).\n- Tuning hyperparameters to optimize model performance.\n- Incorporating additional features such as customer feedback and satisfaction scores.\n\n## Contributing\n\nContributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- The dataset used in this project is sourced from Kaggle.\n- Special thanks to the contributors of scikit-learn and other open-source libraries used in this project.\n\n## Conclusion\n\nThis project demonstrates the application of machine learning techniques for predicting customer churn in a bank. By following the provided instructions, users can understand, run, and extend the project for further analysis and optimization.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheoddysey%2Fscikit-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftheoddysey%2Fscikit-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheoddysey%2Fscikit-pipeline/lists"}