{"id":26091654,"url":"https://github.com/phanchenh/predictquantity_pythonproject_pizzadataset","last_synced_at":"2026-05-08T02:21:36.537Z","repository":{"id":277131083,"uuid":"930451730","full_name":"PhanChenh/PredictQuantity_PythonProject_PizzaDataset","owner":"PhanChenh","description":"Sales Quantity Forecasting for Pizza Dataset 2015","archived":false,"fork":false,"pushed_at":"2025-02-20T02:06:07.000Z","size":4109,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-20T03:22:15.380Z","etag":null,"topics":["business-analytics","business-intelligence","data-visualization","decision-tree-regression","gradient-boosting-regressor","insights","lasso-regression","machine-learning","ridge-regression","sales-forecasting","scikitlearn-machine-learning","supervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PhanChenh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-10T16:52:05.000Z","updated_at":"2025-02-20T02:06:11.000Z","dependencies_parsed_at":"2025-02-12T10:26:22.500Z","dependency_job_id":"7a0cf689-91ab-4698-991d-1cf02858ab36","html_url":"https://github.com/PhanChenh/PredictQuantity_PythonProject_PizzaDataset","commit_stats":null,"previous_names":["phanchenh/predictquantity_pythonproject_pizzadataset"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhanChenh%2FPredictQuantity_PythonProjec
t_PizzaDataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhanChenh%2FPredictQuantity_PythonProject_PizzaDataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhanChenh%2FPredictQuantity_PythonProject_PizzaDataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhanChenh%2FPredictQuantity_PythonProject_PizzaDataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PhanChenh","download_url":"https://codeload.github.com/PhanChenh/PredictQuantity_PythonProject_PizzaDataset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242675751,"owners_count":20167595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business-analytics","business-intelligence","data-visualization","decision-tree-regression","gradient-boosting-regressor","insights","lasso-regression","machine-learning","ridge-regression","sales-forecasting","scikitlearn-machine-learning","supervised-learning"],"created_at":"2025-03-09T10:22:39.691Z","updated_at":"2026-05-08T02:21:36.507Z","avatar_url":"https://github.com/PhanChenh.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project Title: Sales Quantity Forecasting for Pizza Dataset 2015\n\n## Table of Contents\n- [Overview](#overview)\n- [Dataset](#dataset)\n- [Objective](#objective)\n- [Analysis Approach](#analysis-approach)\n- [Key Findings](#key-findings)\n- [How to run code](#how-to-run-code)\n- [Technologies 
Used](#technologies-used)\n- [Results \u0026 Visualizations](#results--visualizations)\n- [Recommendation](#recommendation)\n- [Contact](#contact)\n\n## Overview\n\nThis project focuses on forecasting pizza sales quantity using machine learning techniques. By analyzing transaction data, we aim to predict the total quantity sold each week for specific pizzas of different sizes in New Jersey. The ultimate goal is to aggregate these weekly predictions into an annual sales forecast for the following year.\n\n## Dataset\n\nThe analysis is based on [data_pizza.xlsx](data_pizza.xlsx):\n- Time Period Covered: 2015\n- Number of Records: 21,288 rows\n- Number of Features: 10\n- Key Variables: described in [data_dictionary.xlsx](data_dictionary.xlsx)\n\n## Objective\n\nThe objective is to analyze and model the total quantity sold on a weekly basis using categorical features, forecast total sales and quantity for the upcoming year by aggregating weekly predictions, and identify key factors influencing pizza sales to optimize business strategies accordingly.\n\n## Analysis Approach\n1. Exploratory Data Analysis (EDA): Data cleaning, handling missing values, and visualizing trends.\n2. Data Preprocessing and Data Selection: Applying transformations such as log transformations and standardization.\n3. Model Selection and Evaluation: Comparing multiple supervised learning models.\n4. Forecasting Sales and Quantity: Using the best-performing model for predictions.\n5. Insights and Recommendations: Deriving actionable insights for business decisions.\n\nSeveral models can be considered for predicting the quantity sold from categorical features. The choice of model depends on the nature of the data and the specific problem requirements. The following models are suitable candidates:\n\n1. 
Linear Models with Regularization:\n- Lasso Regression (L1 regularization): Helps in feature selection by shrinking the coefficients of less important features to exactly zero.\n- Ridge Regression (L2 regularization): Regularizes model complexity by shrinking coefficients without setting them to zero.\n\nAdvantages:\n- Good for datasets with many categorical variables, especially if one-hot encoding leads to a high-dimensional space.\n- Efficient and interpretable.\n\n2. Decision Tree Regression:\n- Can split on both numerical and categorical data in principle, although scikit-learn's tree implementations require categorical features to be encoded as numbers first.\n- Can capture non-linear relationships and interactions between features.\n\nAdvantages:\n- Intuitive and easy to interpret.\n- No need for feature scaling, and a simple encoding of categorical data is usually sufficient.\n\n3. Random Forest Regression:\n- An ensemble of decision trees, which improves generalization by reducing variance.\n- Works well with encoded categorical features and is relatively robust to overfitting.\n\nAdvantages:\n- High accuracy and can capture complex patterns.\n- Feature importance can be derived, providing insights into the impact of different features.\n\n4. Gradient Boosting Machines (GBM): Build trees sequentially, each one correcting the errors of the previous ones, and work well with encoded categorical features.\n\nAdvantages:\n- Excellent predictive performance.\n- Some implementations (for example LightGBM, CatBoost, and scikit-learn's HistGradientBoostingRegressor) provide built-in handling for categorical features.\n\n5. 
Support Vector Machines (SVM):\n- Can be used with kernels to capture non-linear relationships.\n- Requires preprocessing of categorical data into numerical format (e.g., one-hot encoding).\n\nAdvantages:\n- Effective for both linear and non-linear data.\n- Robust to overfitting in high-dimensional spaces.\n\n## Key Findings\n- The Gradient Boosting Machine (GBM) model provided the best forecasting performance.\n- Total predicted sales and quantity for 2016 are about 6% lower than in 2015.\n- Large and Medium pizza sizes were the most popular.\n- Non-holiday weeks saw significantly higher sales than holiday weeks (about 2.7 times in 2015 and 2.2 times in the 2016 forecast).\n- Seasonal shifts: Highest sales in summer (2015) and winter (predicted 2016).\n- Medium-priced pizzas ($15-$25) dominated sales.\n- Weeks 39, 52, and 53 experienced sales drops in both 2015 and the 2016 forecast.\n\n## How to run code\n\n1. Install Required Libraries: Ensure that the libraries imported in [sales_quantity_predicting.ipynb](sales_quantity_predicting.ipynb), such as pandas, matplotlib, seaborn, and scikit-learn, are installed.\n2. Load the Dataset: Import the dataset by loading [data_pizza.xlsx](data_pizza.xlsx).\n3. Run the Analysis Notebooks: Execute the analysis notebooks in Jupyter to process the data, build and train the model, and visualize the results.\n\n## Technologies Used\n\n- Python: Data analysis and preprocessing were performed using pandas and NumPy. For modeling, several algorithms were explored, including Lasso Regression, Ridge Regression, Decision Tree, Random Forest, Gradient Boosting Machine (GBM), and Support Vector Machines (SVM). Model evaluation and hyperparameter tuning were done using scikit-learn, including GridSearchCV for optimizing model parameters.\n\n- Visualization: Visualizations were created with matplotlib and seaborn to analyze trends and evaluate model performance. 
Feature importance was also visualized to understand the impact of different features on the predictions.\n\n## Results \u0026 Visualizations\n\n### Choosing Dataset and Model\n\n![Screenshot 2025-02-19 145955](https://github.com/user-attachments/assets/f9f04ad2-2fc1-41b9-b267-96e657ba5063)\n\nFigure 1: RMSE on the training and test sets for the linear regression and Lasso regression models on each dataset\n\nFinding: The dataset with log transformations and the dataset with log transformations + standardization perform better.\n\n![Screenshot 2025-02-19 150443](https://github.com/user-attachments/assets/989012b3-c950-4add-9063-daf65179a0e8)\n\nFigure 2: Model Validation - Lasso Regression Model\n\n![Screenshot 2025-02-19 150738](https://github.com/user-attachments/assets/edf0cb56-ede9-4633-9c54-6df4956d08e3)\n\nFigure 3: Model Validation - Ridge Regression Model\n\n![Screenshot 2025-02-19 150841](https://github.com/user-attachments/assets/51679730-1ddc-4e7c-aa09-50e9b240c524)\n\nFigure 4: Model Validation - Decision Tree Regression Model\n\n![Screenshot 2025-02-19 150938](https://github.com/user-attachments/assets/d28fc361-d24e-470e-b022-26fa47f78a6d)\n\nFigure 5: Model Validation - Random Forest Regressor Model\n\n![Screenshot 2025-02-19 151031](https://github.com/user-attachments/assets/fbf5386d-4ed8-444b-bddb-686447b1ccc8)\n\nFigure 6: Model Validation - Gradient Boosting Machine (GBM) Model\n\n![Screenshot 2025-02-19 151128](https://github.com/user-attachments/assets/6aa0b5ca-b776-4a02-976f-85f190be7fe6)\n\nFigure 7: Model Validation - Support Vector Regression (SVR) Model\n\n![Screenshot 2025-02-19 151230](https://github.com/user-attachments/assets/e919daa0-a972-4474-afdf-4198b17fd238)\n\nFigure 8: Model Validation - MCA Model\n\nConclusion: The Gradient Boosting Machine (GBM) is the best model because it has the lowest RMSE on both the training and test sets. 
It indicates a good balance between fitting the training data and generalizing to new, unseen data.\n\n### Result for prediction\n\n2015 total sales: 817.86K\n\n2015 total quantity: 49,574\n\n2016 total sales (predicted): 766.41K\n\n2016 total quantity (predicted): approximately 46,456\n\nThe model predicts that both total sales and total quantity in 2016 will be about 6% lower than in 2015. To understand the reasons, we can compare the 2015 totals with the 2016 predictions by pizza name, by category, and by week.\n\n![image](https://github.com/user-attachments/assets/4067f5c6-7b67-4f04-b76f-5d5635212b77)\nTotal quantity sold by pizza name in 2015\n\n![image](https://github.com/user-attachments/assets/bc9981e4-f7e3-485c-b6f7-72a7a7becddd)\nTotal predicted quantity sold by pizza name in 2016\n\nFinding:\n\nTotal quantity sold by pizza name in 2015:\n- Top 6 pizzas, at around 2,400-2,500 units sold: the classic deluxe pizza \u003e the bbq chicken pizza \u003e the hawaiian pizza \u003e the pepperoni pizza \u003e the thai chicken pizza \u003e the california chicken pizza\n- Bottom pizza, at around 500 units sold: the brie carre pizza.\n- Other pizzas: roughly 1,000-2,000 units sold.\n\nTotal predicted quantity sold by pizza name in 2016:\n- Top 6 pizzas, at around 2,000-2,500 units sold: the pepperoni pizza \u003e the classic deluxe pizza \u003e the california chicken pizza \u003e the thai chicken pizza \u003e the hawaiian pizza \u003e the italian supreme pizza\n- Bottom pizza, at around 500 units sold: the brie carre pizza.\n- Other pizzas: roughly 800-2,000 units sold.\n\n![image](https://github.com/user-attachments/assets/aab5140a-4263-4570-911c-00eabbe2da26)\nVisualize the distribution of categories with target variable with Bar Plots in 2015\n\n![image](https://github.com/user-attachments/assets/c22a9bf0-55f3-4506-99d6-2f3dc33346e3)\nVisualize the distribution of categories with target variable with Bar Plots in 2016\n\nFinding:\n\nVisualize the 
distribution of categories with target variable with Bar Plots in 2015:\n- Quantity sold by size: L \u003e M \u003e S \u003e XL \u003e XXL\n- Non-holiday weeks sold about 2.7 times as much as holiday weeks\n- Quantity sold by season: summer \u003e spring \u003e winter \u003e fall (14K \u003e 13.5K \u003e 11.8K \u003e 10.5K)\n- Quantity sold by price: medium price ($15-$25) \u003e low price (below $15) \u003e high price (above $25) (30K \u003e 15K \u003e 500)\n\nVisualize the distribution of categories with target variable with Bar Plots in 2016:\n- Quantity sold by size: L \u003e M \u003e S \u003e XL \u003e XXL\n- Non-holiday weeks sold about 2.2 times as much as holiday weeks\n- Quantity sold by season: winter \u003e spring \u003e fall \u003e summer (14K \u003e 12K \u003e 11.5K \u003e 9.8K)\n- Quantity sold by price: medium price ($15-$25) \u003e low price (below $15) \u003e high price (above $25) (30K \u003e 14K \u003e 500)\n\n![image](https://github.com/user-attachments/assets/b4d4c0b5-5cb7-450f-b968-c313a9b93ecb)\nTotal Quantity Sold vs. Week of Year (2015)\n\n![image](https://github.com/user-attachments/assets/2aa6ba5d-5e02-49c9-806e-00c3d61399e1)\nPredicted Quantity Sold vs. Week of Year (2016)\n\nFinding:\n\nTotal Quantity Sold vs. Week of Year (2015)\n- Sales start low, at around 600 units in Week 1.\n- From Week 2 to Week 38, sales increase and stabilize between 900 and 1,050 units.\n- Week 39 sees a drop to around 650, but sales recover in Week 40, stabilizing again around 900–1,050 until Week 47.\n- Week 48 peaks at 1,200, followed by a decline in the next few weeks, stabilizing around 950 until Week 51.\n- Week 52 drops to 650, and Week 53 declines further to around 400.\n\nPredicted Quantity Sold vs. 
Week of Year (2016)\n- The year starts higher than 2015, at around 700 in Week 1.\n- From Week 2 to Week 13, sales increase and stabilize around 1,000 units.\n- Week 14 sees a drop, with sales stabilizing around 900 until Week 26.\n- Week 27 marks another decline, with sales remaining between 700 and 780 until Week 38.\n- Week 39 drops to 600, but Week 40 sees a recovery to 800, followed by a gradual increase until Week 48, reaching 1,000 units.\n- Week 49 experiences a slight drop to 950, followed by a gradual decline to 900 in Week 51.\n- Week 52 drops to 700, and Week 53 declines further to 500.\n\nThe plot below shows the feature importances.\n![image](https://github.com/user-attachments/assets/669a791e-fb53-40f8-9482-ca70ce4a6d2d)\nFeature importance from GBM model\n\nThe features with the most significant impact are price and week_of_year.\n\n## Recommendation\n- Focus promotions on high-performing pizzas to drive even more sales.\n- Since the 2016 forecast shows the highest sales in winter, consider launching winter-themed promotions (e.g., family bundles, seasonal flavors).\n- Identify why summer sales are predicted to decline and adjust summer marketing efforts accordingly.\n- Consider revising the pricing strategy for high-priced pizzas; premium toppings or meal combos could perhaps justify the price.\n\n## Contact\n\n📧 Email: phanchenh99@gmail.com\n\n🔗 [LinkedIn](https://www.linkedin.com/in/phan-chenh-6a7ba127a/) | [Portfolio](https://henh-phan-chenh.vercel.app/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphanchenh%2Fpredictquantity_pythonproject_pizzadataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphanchenh%2Fpredictquantity_pythonproject_pizzadataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphanchenh%2Fpredictquantity_pythonproject_pizzadataset/lists"}