https://github.com/christabelsakyi/product_purchase_prediction
As a newly hired AI Engineer, your task is to predict customer behavior based on various features such as age, income, and gender. This exercise involves cleaning the data, training a decision tree model, and evaluating the model's performance to understand the key factors influencing customer purchasing decisions.
machine-learning matplotlib-pyplot numpy python seaborn sklearn
- Host: GitHub
- URL: https://github.com/christabelsakyi/product_purchase_prediction
- Owner: christabelsakyi
- Created: 2025-10-03T20:26:07.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-10-05T17:13:56.000Z (about 1 month ago)
- Last Synced: 2025-10-05T18:25:59.913Z (about 1 month ago)
- Topics: machine-learning, matplotlib-pyplot, numpy, python, seaborn, sklearn
- Language: Python
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Product Purchase Prediction
### **Exercise: Building and Optimizing a Decision Tree Classifier for Product Purchase Prediction**
As a newly hired AI Engineer, your task is to predict customer behavior based on various features such as age, income, and gender. This exercise involves cleaning the data, training a decision tree model, and evaluating the model's performance to understand the key factors influencing customer purchasing decisions.
### **Dataset**
You are provided with a dataset that contains customer information and their product purchase behavior.
**Dataset columns:**
* `Age`: The age of the customer.
* `Income`: The income of the customer (in thousands).
* `Gender`: The gender of the customer (Male/Female).
* `Buy_Product`: The target variable indicating whether the customer bought the product (1 for yes, 0 for no).
---
### **Instructions**
#### **1\. Data Preprocessing**
1. **Load the dataset** from the provided CSV file.
2. **Convert categorical variables**: Convert the `Gender` column from categorical values (`Male`, `Female`) to numerical format (`Male = 0`, `Female = 1`).
3. **Normalize the features**: Apply Min-Max scaling to the `Age` and `Income` columns to bring the values between 0 and 1.
4. **Split the dataset** into:
   * Features (`X`): `Age`, `Income`, and `Gender`.
   * Target variable (`y`): `Buy_Product`.
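
The preprocessing steps above can be sketched as follows. This is a minimal example, not a reference solution: the small in-memory DataFrame is a made-up stand-in for the provided CSV (its file name is not given here), and the variable names `df` and `scaler` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in rows; with the real file, load it with pd.read_csv(...) instead.
df = pd.DataFrame({
    "Age": [22, 35, 47, 52, 28, 40],
    "Income": [25, 48, 60, 75, 32, 50],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Buy_Product": [0, 1, 1, 1, 0, 1],
})

# Encode Gender as specified in the exercise: Male = 0, Female = 1.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Min-Max scale Age and Income to [0, 1]. Keep the fitted scaler so that
# new data points can later be transformed with the same min and max.
scaler = MinMaxScaler()
df[["Age", "Income"]] = scaler.fit_transform(df[["Age", "Income"]])

# Separate features and target.
X = df[["Age", "Income", "Gender"]]
y = df["Buy_Product"]
print(X.head())
```

Keeping `scaler` around matters for the prediction step: new customers must be scaled with the minimum and maximum learned here, not refitted on their own values.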
#### **2\. Train a Decision Tree Classifier**
1. **Train a Decision Tree model** using the features (`Age`, `Income`, and `Gender`) to predict `Buy_Product`.
2. **Hyperparameter tuning**: Use Grid Search to optimize the model's hyperparameters such as `max_depth` and `min_samples_split`. Report the best parameters.
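
One way to approach the training and tuning step is scikit-learn's `GridSearchCV`. The sketch below assumes synthetic data in place of the preprocessed dataset, and the candidate values in `param_grid` are illustrative choices, not values prescribed by the exercise.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features:
# columns are Age and Income (already in [0, 1]) and Gender (0/1).
rng = np.random.default_rng(0)
X = np.column_stack([rng.random(60), rng.random(60), rng.integers(0, 2, 60)])
y = (X[:, 1] > 0.5).astype(int)  # made-up labelling rule, for illustration only

# Candidate hyperparameter values (assumed; adjust to the real data).
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_split": [2, 5, 10],
}

# Try every combination with 5-fold cross-validation and keep the best one.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
best_model = search.best_estimator_  # refitted on all of X, y with the best parameters
```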
#### **3\. Make Predictions**
1. **Predict** the target for the following new data points:
   * `Age = 40`, `Income = 50`, `Gender = Male`
   * `Age = 30`, `Income = 45`, `Gender = Female`
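
Because the model is trained on scaled and encoded features, the new data points need the same treatment before prediction. The sketch below assumes a small made-up training set and a fixed `max_depth=3`; only the two new data points come from the exercise.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

gender_codes = {"Male": 0, "Female": 1}
features = ["Age", "Income", "Gender"]

# Made-up training rows standing in for the provided dataset.
train = pd.DataFrame({
    "Age": [22, 35, 47, 52, 28, 40, 31, 58],
    "Income": [25, 48, 60, 75, 32, 50, 41, 80],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Male", "Female"],
    "Buy_Product": [0, 1, 1, 1, 0, 1, 0, 1],
})
train["Gender"] = train["Gender"].map(gender_codes)
scaler = MinMaxScaler()
train[["Age", "Income"]] = scaler.fit_transform(train[["Age", "Income"]])
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(train[features], train["Buy_Product"])

# The two new customers from the exercise, in raw units.
new_points = pd.DataFrame({
    "Age": [40, 30],
    "Income": [50, 45],
    "Gender": ["Male", "Female"],
})
new_points["Gender"] = new_points["Gender"].map(gender_codes)
# transform (not fit_transform): reuse the min and max learned from the training data.
new_points[["Age", "Income"]] = scaler.transform(new_points[["Age", "Income"]])

predictions = model.predict(new_points[features])
print(predictions)  # one 0/1 label per new customer
```

Calling `fit_transform` on the two new rows instead would map them to 0 and 1 regardless of their actual values, so the predictions would be meaningless.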
#### **4\. Evaluate the Model**
1. **Calculate the model's accuracy** on the entire dataset.
2. **Generate the confusion matrix** and **classification report** to evaluate the model’s performance.
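
The evaluation step can be sketched with `sklearn.metrics`. The data and the fixed `max_depth=3` below are assumptions for illustration. Note that accuracy measured on the same data the model was trained on tends to be optimistic; the cross-validation step gives a more realistic estimate.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset (Age, Income, Gender).
rng = np.random.default_rng(0)
X = np.column_stack([rng.random(60), rng.random(60), rng.integers(0, 2, 60)])
y = (X[:, 1] > 0.5).astype(int)  # made-up labelling rule, for illustration only

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# Evaluate on the entire dataset, as the exercise asks.
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class

print(f"Accuracy: {accuracy:.3f}")
print("Confusion matrix:")
print(cm)
print(classification_report(y, y_pred, target_names=["No purchase", "Purchase"]))
```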
#### **5\. Visualize the Decision Tree**
1. **Plot the decision tree** to understand how the model makes predictions. Include feature names and class labels in the visualization.
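
A possible way to draw the tree is scikit-learn's `plot_tree` with Matplotlib. In this sketch the data is synthetic, the class label strings and the output file name `decision_tree.png` are illustrative, and the `Agg` backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in for the preprocessed dataset (Age, Income, Gender).
rng = np.random.default_rng(0)
X = np.column_stack([rng.random(60), rng.random(60), rng.integers(0, 2, 60)])
y = (X[:, 1] > 0.5).astype(int)  # made-up labelling rule, for illustration only

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

fig, ax = plt.subplots(figsize=(10, 6))
annotations = plot_tree(
    model,
    feature_names=["Age", "Income", "Gender"],
    class_names=["No purchase", "Purchase"],  # order follows class labels 0, 1
    filled=True,  # color nodes by majority class
    ax=ax,
)
fig.savefig("decision_tree.png", dpi=150, bbox_inches="tight")
```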
#### **6\. Cross-Validation**
1. Perform **5-fold cross-validation** on the model and report the average accuracy score.
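
The cross-validation step can be sketched with `cross_val_score`. The data and the fixed `max_depth=3` are assumptions; in practice the tuned model from the Grid Search step would be used.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset (Age, Income, Gender).
rng = np.random.default_rng(0)
X = np.column_stack([rng.random(60), rng.random(60), rng.integers(0, 2, 60)])
y = (X[:, 1] > 0.5).astype(int)  # made-up labelling rule, for illustration only

model = DecisionTreeClassifier(max_depth=3, random_state=42)

# cv=5 trains on four folds and scores on the held-out fold, five times.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print(f"Average accuracy: {scores.mean():.3f}")
```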
---
### **Deliverables**
1. **Preprocessed Dataset**: Include the dataset after preprocessing (with normalized features and encoded gender).
2. **Trained Model**: Submit the decision tree model after training and hyperparameter optimization.
3. **Model Evaluation**: Provide the accuracy, confusion matrix, and classification report.
4. **Visualizations**: Include a plot of the decision tree.
5. **Cross-Validation Results**: Report the average accuracy score from 5-fold cross-validation.