https://github.com/lucashomuniz/project-10

Optimizing Sales Forecast Accuracy: Exploratory Analysis and Insights
data-analysis data-munging data-visualization dax-languague exploratory-data-analysis language-r power-bi sales-forecast statistics-modules

# ✅ PROJECT-10

In this project, **ACME S.A.**, a personal care company that produces **shampoo**, **soap**, and **toothpaste**, faced supply shortages attributed to demand spikes caused by promotional activities. These promotions were not incorporated into the **demand planning process**, leaving production unprepared for the increased volumes. To validate this hypothesis and address the issue, a **data-driven analysis** was conducted.

The approach analyzes **sales data** to evaluate the impact of promotions on demand and applies five **predictive models** to forecast future sales: **Linear Regression**, **Random Forest**, **ElasticNet**, **K-Nearest Neighbors (KNN)**, and **Gradient Boosting Machines (GBM)**. Model performance is compared to determine the most accurate forecasting method. Additionally, the analysis estimates the proportion of sales driven by **promotions** versus **regular purchasing patterns**. The resulting insights enable **ACME S.A.** to improve its demand planning process and better manage fluctuations caused by promotional activities, supporting a more reliable supply chain.

Keywords: Python Language, Data Analysis, Machine Learning, Regression Model, Supervised Learning, Linear Regression, Random Forest, KNN, Elastic Net, GBM, Customer Segmentation, Predictive Modeling.

# ✅ PROCESS

> Question 1: What is the average impact of promotions on total sales, and is this impact consistent across all brands and customers?

The analysis began with a comprehensive evaluation of the **sales dataset**, focusing on the influence of **promotional periods** on sales behavior over time. Key variables such as **sales volume**, **invoice dates**, and the **duration of promotional events** were compared to identify patterns between promotions and sales activity. Notably, all **380 unique dates** in the **Invoice Dataset** (**74,815** records) coincided with promotional periods, indicating a potential reliance on promotions to drive sales. Further analysis explored how factors like **product basecode**, **customer hierarchy**, and **promotion codes** influenced sales outcomes.

A significant variation in sales volume was observed across different **product-customer combinations**, suggesting that some customers respond more strongly to specific promotions. For instance, for product **HU0138**, customer **HU000105** participated in **822 promotions**, resulting in a sales volume of **10,578 tons**, while customer **HU000106** engaged in **1,548 promotions**, achieving a higher volume of **17,501 tons**. Conversely, customer **HU000109**, who did not participate in any promotions for the same product, generated a significantly lower sales volume of just **27 tons**. This highlights the variability in **promotion effectiveness** among different customer groups and products.
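The comparison above boils down to grouping invoice rows by product-customer combination. A minimal sketch of that grouping follows; the row layout is an assumption (the dataset schema is not shown here), and the individual rows are hypothetical, split arbitrarily so that the totals match the figures quoted above.

```python
from collections import defaultdict

# Hypothetical invoice rows: (product, customer, on_promotion, volume_tons).
# The layout is assumed and the per-row volumes are illustrative only;
# just the per-combination totals correspond to the figures in the text.
rows = [
    ("HU0138", "HU000105", True, 6000.0),
    ("HU0138", "HU000105", True, 4578.0),
    ("HU0138", "HU000106", True, 17501.0),
    ("HU0138", "HU000109", False, 27.0),
]

def volume_by_combination(rows):
    """Sum volume and count promotional rows per (product, customer) pair."""
    totals = defaultdict(lambda: {"volume": 0.0, "promo_rows": 0})
    for product, customer, on_promo, volume in rows:
        entry = totals[(product, customer)]
        entry["volume"] += volume
        entry["promo_rows"] += int(on_promo)
    return dict(totals)

summary = volume_by_combination(rows)
for (product, customer), stats in sorted(summary.items()):
    print(product, customer, stats["promo_rows"], stats["volume"])
```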

Analysis revealed that the **average sales volume during promotions** is approximately **13.50 tons per product-customer combination**, compared to **12.27 tons** without promotions. This represents a **10% average increase** in sales volume attributable to promotions. While promotions positively impact sales, the effect is moderate rather than dramatic.
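The uplift figure is simple arithmetic on the two averages. As a quick check, using the rounded averages quoted above (the function name is illustrative):

```python
def promotion_uplift(avg_promo, avg_no_promo):
    """Return (absolute difference, relative uplift) of promo vs. non-promo averages."""
    diff = avg_promo - avg_no_promo
    return diff, diff / avg_no_promo

# Rounded averages from the text, in tons per product-customer combination.
diff, uplift = promotion_uplift(13.50, 12.27)
print(f"{diff:.2f} tons, {uplift:.1%}")
```

With these rounded inputs the uplift comes out at roughly 10%; small differences from figures computed on unrounded averages are expected.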

Finally, it was identified that **97% of the sales volume** was generated by combinations associated with promotions, emphasizing a strong dependence on promotions for the majority of sales. Only **3% of the sales volume** came from combinations without promotions, indicating that a small portion of the product portfolio or customer base operates independently of promotional activities.

> Question 2: Given the approved promotions for the last quarter of 2018, how many tons of products will be sold during this period?

To estimate the **sales volume** for Q4 2018 considering approved **promotions**, the process was divided into several steps. First, **approved promotions** within Q4 2018 were filtered, resulting in **4,230 promotions**. These promotions were mapped to identify distinct **product-customer combinations**, yielding **615 combinations**. Next, historical sales data from Q4 2017 was analyzed as a basis for forecasting. By applying the same **product-customer combinations** identified for 2018, it was found that **266 combinations** had corresponding sales records in 2017, with a total sales volume of **72,864 tons**.
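The filtering and matching steps can be sketched as below. All records, field orders, and names are hypothetical, and "within Q4 2018" is interpreted here as any overlap with the quarter, which is an assumption about how the original filter was defined.

```python
from datetime import date

Q4_2018 = (date(2018, 10, 1), date(2018, 12, 31))

# Hypothetical promotion records: (product, customer, start, end, approved).
promotions = [
    ("P1", "C1", date(2018, 10, 5), date(2018, 10, 20), True),
    ("P1", "C1", date(2018, 11, 1), date(2018, 11, 15), True),
    ("P2", "C1", date(2018, 12, 1), date(2018, 12, 10), False),  # not approved
    ("P3", "C2", date(2018, 9, 1), date(2018, 9, 15), True),     # outside Q4
]

# Hypothetical Q4 2017 sales totals in tons per (product, customer).
sales_q4_2017 = {("P1", "C1"): 120.0, ("P3", "C2"): 40.0}

def overlaps(start, end, window):
    """True if the interval [start, end] intersects the window."""
    return start <= window[1] and end >= window[0]

# Step 1: keep approved promotions that touch Q4 2018.
approved = [p for p in promotions if p[4] and overlaps(p[2], p[3], Q4_2018)]

# Step 2: reduce them to distinct product-customer combinations.
combos = {(p[0], p[1]) for p in approved}

# Step 3: look up those combinations in the prior-year quarter.
matched = {c: sales_q4_2017[c] for c in combos if c in sales_q4_2017}
baseline_volume = sum(matched.values())

print(len(approved), len(combos), len(matched), baseline_volume)
```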

Using this historical data, a **predictive model** was built to estimate sales for Q4 2018. Five algorithms were tested: **Linear Regression**, **Random Forest**, **ElasticNet**, **K-Nearest Neighbors (KNN)**, and **Gradient Boosting Machines (GBM)**. Performance metrics including **Mean Absolute Error (MAE)**, **Root Mean Square Error (RMSE)**, and **R² (coefficient of determination)** were calculated to evaluate model accuracy. The **GBM model** achieved the best performance and predicted a total sales volume of **72,942 tons** for Q4 2018, with **MAE = 65.55 tons**, **RMSE = 100.02 tons**, and **R² = 0.5967**. In other words, the model explains approximately **60% of the variance** in the data.
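For reference, the three metrics can be written out in plain Python. This is a minimal sketch of the standard definitions, not the project's own evaluation code; the model names and values in the example are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out volumes (tons) and predictions from two candidate models.
y_true = [100.0, 200.0, 300.0]
predictions = {
    "model_a": [110.0, 190.0, 310.0],
    "model_b": [150.0, 200.0, 250.0],
}

scores = {name: rmse(y_true, y_pred) for name, y_pred in predictions.items()}
best = min(scores, key=scores.get)  # lowest RMSE wins
print(best, scores[best])
```

Selecting by RMSE is one reasonable choice; ranking by MAE or R² may order the candidates differently, so it helps to report all three as done above.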

In summary, the process, from filtering promotions to applying predictive modeling, produced a Q4 2018 sales forecast from a model that explains roughly 60% of the variance in the historical data, giving demand planning a data-driven estimate to work from.

![Screenshot 2024-09-13 at 10 40 42](https://github.com/user-attachments/assets/cd80d8e4-b34b-4f64-8238-f732ec1a3c61)

> Question 3: In the Q4 2018 forecast, how many tons of sales will be driven by promotions versus usual sales behavior?

To answer the third question, the analysis builds upon findings from the previous questions. In the second question, the total forecasted sales volume for Q4 2018 was estimated at **72,942 tons**, based on combinations of products and customers with active promotions. However, this does not imply that 100% of these sales are due to promotions. Even during promotional periods, a portion of sales stems from **usual buying behavior**, representing sales that would occur without promotions. The challenge is to distinguish between sales driven by **promotions** and those from **usual behavior**.

From the first question, the **average sales volume during promotions** was calculated as **1.24 tons higher** than the average sales without promotions, reflecting a **10.09% increase** in sales when promotions are active. This percentage represents the **average sales impact of promotions**, not the exact proportion of sales attributed to them. Using this percentage, the total forecasted sales volume for Q4 2018 can be divided into two components: **Usual Behavior**, sales independent of promotions, and **Additional Sales**, the increase due to promotions.

Given the forecasted total of **72,942 tons**, and knowing that promotions historically increase sales by **10.09%**, the sales that would have occurred without promotions (usual behavior) are estimated at **66,263 tons**. Subtracting this from the total, **6,679 tons** represent the additional sales attributed to promotions. This breakdown distinguishes the sales volume driven by **promotions** from the volume resulting from **usual buying behavior**, offering actionable insights into the impact of promotional activities on total sales.
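The arithmetic behind this split can be sketched as follows, assuming the uplift applies multiplicatively to the baseline (total = baseline × (1 + uplift)). That assumption and the function name are illustrative, not taken from the project code.

```python
def split_forecast(total, uplift):
    """Split a forecast into baseline and promotion-driven volume.

    Assumes total = baseline * (1 + uplift), so the baseline is the
    volume that would have been sold with no promotional uplift.
    """
    baseline = total / (1.0 + uplift)
    incremental = total - baseline
    return baseline, incremental

# Forecast total (tons) and average uplift quoted in the text.
baseline, incremental = split_forecast(72942.0, 0.1009)
print(f"baseline: {baseline:.0f} tons, incremental: {incremental:.0f} tons")
```

With the rounded 10.09% uplift, this gives a baseline of about 66,257 tons and about 6,685 tons of promotion-driven volume, within a few tons of the 66,263 and 6,679 figures reported above; the gap is consistent with rounding in the uplift.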

# ✅ CONCLUSION

In conclusion, this project analyzed the impact of **promotions** on **sales** and developed **predictive models** to forecast demand for the last quarter of 2018. Using machine learning algorithms such as **Gradient Boosting Machines (GBM)**, **Random Forest**, and **ElasticNet**, the analysis estimated the **sales volume**, taking into account both the effects of **promotions** and **usual purchasing behavior**. The findings provided valuable insights into **sales fluctuations** caused by promotions, supporting the enhancement of the company’s **demand planning process**.

To further optimize the **predictive models**, several strategies can be adopted. **Feature engineering** can uncover additional relevant variables, while **cross-validation** gives a more reliable estimate of how well the models generalize and helps detect overfitting. Fine-tuning model **hyperparameters**, applying **ensembling techniques**, and implementing **data normalization** (which matters most for distance-based and regularized models such as KNN and ElasticNet) can further improve model performance and robustness. By adopting these approaches, the company can achieve more precise **demand forecasts**, better preparation for **sales fluctuations**, and a more efficient **planning process** overall.
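As an illustration of the cross-validation idea, the sketch below builds k contiguous folds and scores a trivial baseline (predict the training mean) on each held-out fold. The function names and sample values are hypothetical; a real model would replace the mean predictor, and for time-ordered sales data a split that respects chronology may be more appropriate.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-shuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def cross_val_mae(y, k):
    """Average held-out MAE of a predict-the-training-mean baseline."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        mean_train = sum(y[i] for i in train_idx) / len(train_idx)
        fold_mae = sum(abs(y[i] - mean_train) for i in test_idx) / len(test_idx)
        scores.append(fold_mae)
    return sum(scores) / len(scores)

# Hypothetical sales volumes in tons.
volumes = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
print(cross_val_mae(volumes, k=3))
```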