{"id":13574311,"url":"https://github.com/oneapi-src/customer-churn-prediction","last_synced_at":"2025-04-04T14:32:20.808Z","repository":{"id":66145920,"uuid":"574715783","full_name":"oneapi-src/customer-churn-prediction","owner":"oneapi-src","description":"AI Starter Kit for customer churn prediction using Intel® Extension for Scikit-learn*","archived":true,"fork":false,"pushed_at":"2024-02-01T23:56:29.000Z","size":110,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-05T09:44:11.914Z","etag":null,"topics":["machine-learning","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oneapi-src.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-05T23:18:28.000Z","updated_at":"2024-04-08T18:32:55.000Z","dependencies_parsed_at":"2024-02-13T00:50:01.359Z","dependency_job_id":null,"html_url":"https://github.com/oneapi-src/customer-churn-prediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fcustomer-churn-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fcustomer-churn-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fcustomer-churn-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/oneapi-src%2Fcustomer-churn-prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oneapi-src","download_url":"https://codeload.github.com/oneapi-src/customer-churn-prediction/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247194310,"owners_count":20899463,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","scikit-learn"],"created_at":"2024-08-01T15:00:50.177Z","updated_at":"2025-04-04T14:32:20.399Z","avatar_url":"https://github.com/oneapi-src.png","language":"Python","readme":"PROJECT NOT UNDER ACTIVE MANAGEMENT\n\nThis project will no longer be maintained by Intel.\n\nIntel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.  \n\nIntel no longer accepts patches to this project.\n\nIf you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.  
\n\nContact: webadmin@linux.intel.com\n# **Scikit-Learn Customer Churn Prediction**\r\n\u003c!-- Table of Contents --\u003e\r\n## Table of Contents\r\n- [Purpose](#purpose)\r\n- [Reference Solution](#reference-solution)\r\n- [Reference Implementation](#reference-implementation)\r\n- [Intel® Implementation](#optimizing-the-e2e-solution-with-intel®-oneapi)\r\n- [Performance Observations](#performance-observations)\r\n\r\n## Purpose\r\nFor telecommunication companies, it is key to attract new customers and, at the same time, avoid contract terminations (churn) to grow their revenue-generating base. Different reasons trigger customers to terminate their contracts, for example, better prices, more interesting packages, bad service experiences, or changes in customers’ personal situations. Churn analytics provides valuable capabilities to predict customer churn in real time, whenever customer interactions take place, or in batch, using multiple events generated by the customers’ interactions. Organizations can make decisions based on churn prediction to proactively provide suitable compensation offers or take corrective service improvement actions to reduce churn.\r\n\r\nIn this reference kit, we demonstrate a reference implementation for predicting customer churn status, which can be part of an overall customer service improvement system.\r\n\r\n## Reference Solution\r\nBased on previous customer churn history along with service subscription details, an ML model is built to predict whether the customer is going to churn. 
In this reference kit, since this is a classification problem, we explore two different approaches to solving it, as given below, and find the best among them.\r\n- Probabilistic approach using the Logistic Regression algorithm\r\n- Decision tree approach using the Random Forest algorithm\r\n\r\n\u003e The above algorithms have been picked as examples of the appropriate approaches\r\n\r\nWe also focus on the below critical factors for an efficient solution\r\n- Faster model development\r\n- Performance-efficient model inference\r\n\r\n\u003c!-- Key Implementation Details --\u003e\r\n### Key Implementation Details\r\nThis section describes the code base and how to replicate the benchmarking results. The included code demonstrates a complete framework for\r\n1. Setting up a virtual environment for stock Scikit-Learn and Intel® Extension for Scikit-learn*\r\n2. Training a Logistic Regression based Churn Prediction Model for predicting customer churn using the stock version of Scikit-Learn and Intel® Extension for Scikit-learn*\r\n3. Training a Random Forest Classifier based Churn Prediction Model for predicting customer churn using the stock version of Scikit-Learn and Intel® Extension for Scikit-learn*\r\n4. Predicting customer churn from the trained models on new data using Scikit-Learn/Intel® Extension for Scikit-learn*\r\n\r\n## Reference Implementation\r\n**Use case E2E flow**\r\n\r\n![image](assets/e2e_flow.png)\r\n\r\n\r\n**Reference Sources**\r\n\r\nModel Training: https://www.kaggle.com/code/edwingeevarughese/internet-service-churn-analysis\r\n\r\n### Note:\r\n***Please see this data set's applicable license for terms and conditions. 
Intel® Corporation does not own the rights to this data set and does not confer any rights to it.***\r\n\r\n### Repository clone and Anaconda installation\r\n\r\n```\r\ngit clone https://github.com/oneapi-src/customer-churn-prediction\r\ncd customer-churn-prediction\r\n```\r\n**Anaconda Installation**\r\n\r\n\u003e**Note**: This reference kit implementation already provides the necessary conda environment configurations to set up the software requirements. To utilize these environment scripts, first install Anaconda/Miniconda by following the instructions at the following link\u003cbr\u003e[Anaconda installation](https://docs.anaconda.com/anaconda/install/linux/)\r\n\r\n**Dataset Details**\r\n\r\n\u003c!-- Dataset Details --\u003e\r\nThe dataset used in this reference kit is taken from [Kaggle](https://www.kaggle.com/datasets/mehmetsabrikunt/internet-service-churn)\r\n\u003e *Please see this data set's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.*\r\n\r\nEach row in the data set represents a customer and contains the below features of the customer\r\n- `ID` Customer unique identification number\r\n- `is_tv_subscriber` Whether the customer has a TV subscription (1,0)\r\n- `is_movie_package_subscriber` Whether the customer has a cinema movie package subscription (1,0)\r\n- `subscription_age` How many years the customer has used the service\r\n- `bill_avg` Average bill over the last 3 months\r\n- `reamining_contract` How many years remain on the customer's contract; if null, the customer does not have a contract\r\n- `service_failure_count` Number of customer calls to the call center for service failures in the last 3 months\r\n- `download_avg` Internet usage over the last 3 months (in gigabytes)\r\n- `upload_avg` Average upload over the last 3 months (in gigabytes)\r\n- `download_over_limit` Most customers have a download limit; if they reach this limit, they have to pay extra. 
This column contains the number of times the customer went over the limit\r\n- `Churn` Whether the customer churned or not (Yes or No)\r\n\r\n\r\nBased on these features and the previously known churn history, a model is built to predict customer churn for the below scenario\r\n- Predict churn in batch mode (on demand for proactive service offers, to check customer satisfaction and improve services)\r\n\r\nThe dataset consists of structured telecom internet subscriber data for 72275 customers, with 10 features against each customer as described above: 4 categorical features and 6 numerical features. Based on these features, we are required to predict churn = Yes/No.\r\n\r\n\r\n**Dataset Installation**\r\n\r\nOnce the repo is cloned, follow the below commands to install the dataset described previously for your environment.\r\n1. Go to the [Kaggle](https://www.kaggle.com/datasets/mehmetsabrikunt/internet-service-churn) link.\r\n2. Click on the download button (782 kB) to download the dataset.\r\n3. Unzip the data folder to get the .csv file. \r\n4. 
Move the .csv file into the \"data\" folder inside the oneAPI-CustomerChurnPrediction-SciKit repository.\r\n\r\n\r\n**Solution Setup**\r\n\r\nFollow the below conda installation commands to set up the Stock environment along with the necessary packages for this model training and prediction.\r\n\u003e**Note: It is assumed that the present working directory is the root directory of this code repository**\r\n\r\n```\r\nconda env create --file env/stock/stock-churn-prediction.yml\r\n```\r\n*Activate conda environment for stock version*\r\nUse the following command to activate the environment that was created:\r\n```\r\nconda activate stock-churn-prediction\r\n```\r\n**Software Requirements**\r\n| **Package**                | **Version**\r\n| :---                       | :---\r\n| python                     | 3.9.13\r\n| scikit-learn               | 1.1.3\r\n\r\n\r\n**Solution Implementation**\r\n***Hyperparameter tuning***\r\n*Random Forest Classifier Algorithm Parameters Considered*\r\n| **Parameter** | **Description** | **Values**\r\n| :---          | :---            | :---\r\n| **n_estimators** | The number of trees in the forest. | **100, 150**\r\n| **max_leaf_nodes** | Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then unlimited number of leaf nodes. | **15, 30, 45**\r\n| **max_depth** | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. | **None, 4, 5**\r\n\r\n*Logistic Regression Algorithm Parameters Considered*\r\n| **Parameter** | **Description** | **Values**\r\n| :---          | :---            | :---\r\n| **fit_intercept** | Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. 
| **True, False**\r\n\r\n*GridSearchCV*\r\nIt applies the fit method to train and optimize via a cross-validated grid search over a parameter grid.\r\n\r\n*Parameters Considered for GridSearchCV*\r\n| **Parameter** | **Description**\r\n| :-- | :--\r\n| `n_jobs` | Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all available processors.\r\n| `param_grid` | Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.\r\n| `cv` | Determines the cross-validation splitting strategy\r\n\r\n\u003c!-- Benchmarking Details --\u003e\r\n**Benchmarking Details**\r\n\r\nBelow is a summary of the benchmarking performed in this experiment\r\n- One of the important aspects in this enterprise scenario is to improve the MLOps time for developing and deploying new models, due to the ever-increasing size of datasets over time. To address this, the use of Intel® Extension for Scikit-learn* optimized Logistic Regression and Random Forest Classifier algorithms is explored. 
The training and prediction time of models are benchmarked over stock version of Scikit-Learn python package.\r\n\r\n\r\n**Model Building Process**\r\n\r\nThe training.py script **reads and preprocesses the data**, **trains an RFC and LR model**, and inference.py script **predicts on unseen test data** using the trained model, while also reporting on the execution time.\r\n\r\nThe usage of the training benchmark script is as given below\r\n```sh\r\nusage: training.py [-h] [-s DATASIZE] [-hy HYPERPARAMS] [-tr TRAINING] [-b BATCHSIZE] [-i INTEL] [-ts TESTSPLIT] [-model MODELNAME] [-save SAVEMODELDIR]\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  -s DATASIZE, --datasize DATASIZE\r\n                        Dataset size, default is full dataset\r\n  -hy HYPERPARAMS, --hyperparams HYPERPARAMS\r\n                        Enabling Hyperparameter tuning (0/1)\r\n  -tr TRAINING, --training TRAINING\r\n                        Enabling training (0/1)\r\n  -b BATCHSIZE, --batchsize BATCHSIZE\r\n                        Enabling inference (0/1)\r\n  -i INTEL, --intel INTEL\r\n                        Use intel accelerated technologies where available (0/1)\r\n  -ts TESTSPLIT, --testsplit TESTSPLIT\r\n                        Percentage of test split from the total dataset (default is 20) Remaining percentage will be used as Training dataset split (default is 80)\r\n  -model MODELNAME, --modelname MODELNAME\r\n                        Default is 'random_forest_classifier'\r\n  -save SAVEMODELDIR, --savemodeldir SAVEMODELDIR\r\n                        Please specify model path along with model name to save Default is 'models/customerchurn_rfc_joblib'\r\n```\r\n\r\nTo run the script with stock python and stock technologies on full dataset:\r\n```sh\r\n#For Training with Random Forest Classifier default hyperparameters\r\npython src/training.py -model random_forest_classifier -tr 1 -save models/customerchurn_rfc_joblib\r\n#For Training with 
Logistic Regression default hyperparameters\r\npython src/training.py -model logistic_regression -tr 1 -save models/customerchurn_lr_joblib\r\n\r\n\r\n#For Hyperparameter tuning using Random Forest Classifier\r\npython src/training.py -model random_forest_classifier -tr 1 -hy 1 -save models/customerchurn_rfc_joblib\r\n#For Hyperparameter tuning using Logistic Regression\r\npython src/training.py -model logistic_regression -tr 1 -hy 1 -save models/customerchurn_lr_joblib\r\n```\r\n\u003eNote: By supplying the \"-s 10000\" (example for utilizing 10K samples) option, users can execute training and hyperparameter tuning benchmarks on a subset of the dataset. By default, training is performed on the whole dataset.\r\n\r\n\u003eNote: For above default parameters training mode, model will be saved with an extension of \"_default\" to the provided path (-save)\r\n\r\nThe hyperparameter tuning will save the best estimator model for both algorithms under `models` directory with the below file names for every run\r\n`models/customerchurn_rfc_joblib` - Random Forest Classifier model\r\n`models/customerchurn_lr_joblib` - Logistic Regression model\r\n\r\nThe saved model can then be passed to the inference benchmarking script to perform the prediction as given below for the given dataset\r\n\r\n**Running Predictions**\r\nThe usage of the inference benchmark script is as given below\r\n```sh\r\nusage: inference.py [-h] [-s DATASIZE] [-b BATCHSIZE] [-i INTEL] [-model MODELNAME] [-save SAVED_MODEL_DIR] [-ts TESTSPLIT]\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  -s DATASIZE, --datasize DATASIZE\r\n                        Dataset size, default is full dataset\r\n  -b BATCHSIZE, --batchsize BATCHSIZE\r\n                        Enabling inference (0/1)\r\n  -i INTEL, --intel INTEL\r\n                        Use intel accelerated technologies where available (0/1)\r\n  -model MODELNAME, --modelname MODELNAME\r\n                        
Default is 'random_forest_classifier'\r\n  -save SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR\r\n                        Please specify model path along with model name to save Default is 'models/customerchurn_rfc_joblib'\r\n  -ts TESTSPLIT, --testsplit TESTSPLIT\r\n                        Percentage of test split from the total dataset (default is 20) Remaining percentage will be used as Training dataset split (default is 80)\r\n```\r\n\r\n```sh\r\n#For Inferencing using Random Forest Classifier\r\npython src/inference.py -model random_forest_classifier --saved_model_dir models/customerchurn_rfc_joblib -b 3000\r\n#For Inferencing using Logistic Regression\r\npython src/inference.py -model logistic_regression --saved_model_dir models/customerchurn_lr_joblib -b 3000\r\n```\r\n\u003eNote: 20% of datasize is used as batchsize for benchmarking batch prediction (Stock Scikit-learn models used here for inference)\r\n\r\n\u003eNote: The model trained on 72K dataset with best-tuned parameters is used for performing inference across different batch sizes.\r\n\r\n\u003eNote: If the inference batch size is greater than 20% of the total data size, the test dataset accuracy computation will be done on training samples, which is not typical practice but is utilized here to mimic bigger batchsize. Accuracy must be disregarded in this case.\r\n\r\n## Optimizing the E2E solution with Intel® oneAPI\r\n\r\n**Optimized E2E use case flow with Intel® oneAPI components**\r\n\r\n![image](assets/e2e_flow_optimized.png)\r\n\r\n**Optimized Software Components**\r\n\r\n***Intel® oneAPI Extension for Scikit*** Designed for data scientists, Intel® Extension for Scikit-learn* is a seamless way to speed up your Scikit-learn applications for machine learning to solve real-world problems. 
This extension package dynamically patches Scikit-learn estimators to use the Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver, achieving speed-ups for your machine learning algorithms.\r\n\r\n**Software Requirements**\r\n| **Package**                | **Version**\r\n| :---                       | :---\r\n| Intel python               | 3.9.13 - 2022.0.0\r\n| scikit-learn               | 1.1.3\r\n| scikit-learn-intelex       | 2021.7.1\r\n\r\n**Optimized Solution Setup**\r\n\r\nFollow the below conda installation commands to set up the Intel environment along with the necessary packages for this model training and prediction.\r\n\u003e**Note: It is assumed that the present working directory is the root directory of this code repository**\r\n\r\n```sh\r\nconda env create --file env/intel/intel-churn-prediction.yml\r\n```\r\n*Activate conda environment for intel version*\r\nUse the following command to activate the environment that was created:\r\n```sh\r\nconda activate intel-churn-prediction\r\n```\r\n\r\n**Optimized Solution Implementation**\r\n\r\n***Model building process with Intel® Optimization***\r\n\r\nTo run the script with the Intel-optimized Python distribution on the full dataset:\r\n```sh\r\n#For Training with Random Forest Classifier default hyperparameters\r\npython src/training.py -i 1 -model random_forest_classifier -tr 1 -save models/customerchurn_rfc_joblib\r\n#For Training with Logistic Regression default hyperparameters\r\npython src/training.py -i 1 -model logistic_regression -tr 1 -save models/customerchurn_lr_joblib\r\n\r\n\r\n#For Hyperparameter tuning using Random Forest Classifier\r\npython src/training.py -i 1 -model random_forest_classifier -tr 1 -hy 1 -save models/customerchurn_rfc_joblib\r\n#For Hyperparameter tuning using Logistic Regression\r\npython src/training.py -i 1 -model logistic_regression -tr 1 -hy 1 -save models/customerchurn_lr_joblib\r\n```\r\n\u003eNote: By supplying the \"-s 10000\" (example for 
utilizing 10K samples) option, users can execute training and hyperparameter tuning benchmarks on a subset of the dataset. By default, training is performed on the whole dataset.\r\n\r\nThe saved optimized model can then be passed to the benchmarking script to perform the prediction as given below\r\n```sh\r\n#For Inferencing using Random Forest Classifier\r\npython src/inference.py -i 1 -model random_forest_classifier --saved_model_dir models/customerchurn_rfc_joblib -b 3000\r\n#For Inferencing using Logistic Regression\r\npython src/inference.py -i 1 -model logistic_regression --saved_model_dir models/customerchurn_lr_joblib -b 3000\r\n```\r\n\u003eNote: 20% of the data size is used as the batch size for benchmarking batch prediction (Intel® Extension for Scikit-learn* models used here for inference)\r\n\r\n\u003eNote: The model trained on the 72K dataset with best-tuned parameters is used for performing inference across different batch sizes.\r\n\r\n\u003eNote: If the inference batch size is greater than 20% of the total data size, the test dataset accuracy computation will be done on training samples, which is not typical practice but is utilized here to mimic a bigger batch size. 
Accuracy must be disregarded in this case.\r\n\r\n## Performance Observations\r\n### Algorithm: Logistic Regression\r\n***Best Hyperparameter Training***\u003cbr\u003e\r\n![image](assets/best_training_lr.png)\r\n\u003cbr\u003e**Key Takeaways**\u003cbr\u003eIntel® Extension for Scikit-learn* for the Logistic Regression model offers a training time speed-up ranging between 1.68x and 2.10x compared to stock Scikit-learn with tuned hyperparameters on this dataset.\r\n\r\n***Inference Results***\u003cbr\u003e\r\n![image](assets/inference_lr.png)\r\n\u003cbr\u003e**Key Takeaways**\u003cbr\u003e\r\n\r\n**Prediction**\r\n- Intel® Extension for Scikit-learn* offers a batch prediction time speed-up of up to 1.11x compared to stock Scikit-learn with Logistic Regression on the hyperparameter-tuned model.\r\n\u003e No accuracy drop observed\r\n\r\n### Algorithm: Random Forest Classifier\r\n***Best Hyperparameter Training***\u003cbr\u003e\r\n![image](assets/best_training_rfc.png)\r\n\u003cbr\u003e**Key Takeaways**\u003cbr\u003eIntel® Extension for Scikit-learn* for the Random Forest Classifier model offers a training time speed-up of up to 11.05x compared to stock Scikit-learn with tuned hyperparameters on this dataset.\r\n\r\n***Inference Results***\u003cbr\u003e\r\n![image](assets/inference_rfc.png)\r\n\u003cbr\u003e**Key Takeaways**\u003cbr\u003e\r\n\r\n**Prediction**\r\n- Intel® Extension for Scikit-learn* offers a batch prediction time speed-up of up to 9.12x compared to stock Scikit-learn with Random Forest Classifier on the hyperparameter-tuned model.\r\n\r\n#### Conclusion\r\nTo build a customer churn prediction solution at scale, data scientists will need to train models on substantial datasets and run inference more frequently. The ability to accelerate training will allow them to train more frequently and achieve better accuracy. Besides training, faster inference will allow them to run predictions in real-time scenarios as well as more frequently. 
A data scientist will also look at data classification to tag and categorize data so that it can be better understood and analyzed. This task requires a lot of training and retraining, making the job tedious. The ability to do this faster will accelerate the ML pipeline.\r\nConsidering the customer churn prediction problem, the key factor in deciding between the above two algorithms is the recall value of the model in correctly identifying churn. \r\n\r\nThis reference kit implementation provides a performance-optimized guide around customer churn prediction use cases that can be easily scaled across similar use cases.\r\n\r\nThe recall value for incorrectly identified churn predictions is not a major factor, as acting on them will still help reduce churn. For the default dataset size, it has been observed that the Logistic Regression recall value for correctly identifying churn is on the higher side compared to the Random Forest Classifier, making it the better option for this kind of problem.\r\n\r\n### Notices \u0026 Disclaimers\r\nPerformance varies by use, configuration and other factors. Learn more on the [Performance Index site](https://edc.intel.com/content/www/us/en/products/performance/benchmarks/overview/). \r\nPerformance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. \r\nYour costs and results may vary. \r\nIntel technologies may require enabled hardware, software or service activation.\r\n© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.  \r\n\r\nTo the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site those datasets or models are provided by the third party indicated as the content source. 
Intel does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license.\r\n \r\nIntel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.\r\n\r\n\r\n## Appendix\r\n\r\n**Date Testing Performed**: November 2022 \r\n\r\n**Configuration Details and Workload Setup**: Azure D8v5 (Intel® Xeon® Platinum 8370C CPU @ 2.80GHz), 1 Socket, 4 Cores per Socket, 2 Threads per Core, Turbo: On, Total Memory: 32 GB, OS: Ubuntu 20.04, Kernel: Linux 5.15.0-1019-azure. Framework/Toolkit: Intel® oneAPI Extension for Scikit, Python -v3.9.13. Dataset size: 72275 customers and 10 features. Model: RandomForestClassifier and Logistic Regression. Batch size for prediction time benchmark: 20% of train data size. Precision: FP32. 
\r\n\r\n**Testing performed by** Intel Corporation\r\n\r\n**Accuracy Observations**\r\n\r\nAccuracy of RandomForestClassifier is up to 97%.\r\n\r\nAccuracy of Logistic Regression is up to 82%.\r\n\r\n**Experiment Setup**\r\n| Platform                          | Microsoft Azure: Standard_D8_v5 (Ice Lake)\u003cbr\u003eUbuntu 20.04\r\n| :---                              | :---\r\n| Hardware                          | Intel IceLake CPU\r\n| Software                          | Intel® oneAPI AI Analytics Toolkit, scikit-learn\r\n| What you will learn               | Intel oneAPI performance advantage over the stock versions\r\n","funding_links":[],"categories":["Table of Contents"],"sub_categories":["AI - Frameworks and Toolkits"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneapi-src%2Fcustomer-churn-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foneapi-src%2Fcustomer-churn-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneapi-src%2Fcustomer-churn-prediction/lists"}