https://github.com/nadirrezaou/demand-forecasting-with-random-forest
Forecasting product demand using Random Forest and sales data preprocessing.
data-science demand-forecasting machine-learning python random-forest regression sales-prediction sklearn
- Host: GitHub
- URL: https://github.com/nadirrezaou/demand-forecasting-with-random-forest
- Owner: nadirrezaou
- Created: 2025-08-18T09:50:44.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-18T10:18:19.000Z (10 months ago)
- Last Synced: 2025-08-18T11:37:47.551Z (10 months ago)
- Topics: data-science, demand-forecasting, machine-learning, python, random-forest, regression, sales-prediction, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 1.95 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Demand Forecasting with Random Forest
Demand forecasting is the process of estimating **future customer demand** over a specific period by analyzing historical sales data and related features.
Traditionally, organizations use statistical forecasting methods such as **ARIMA, SARIMA, and Moving Averages**.
However, these methods often require significant domain expertise and manual tuning.
With the rise of **Machine Learning**, new approaches have emerged that learn patterns directly from data and can often produce more accurate forecasts.
---
## Table of Contents
- [Goal](#goal)
- [Data](#data)
- [Workflow](#workflow)
- [Result](#result)
- [Required Packages](#required-packages)
---
## Goal
The goal of this project is to explore the use of **machine learning models**, specifically the **Random Forest Regressor**, for predicting product demand.
Unlike traditional approaches, machine learning models can:
- Handle **large datasets**
- Capture **complex relationships**
- Process **categorical features** with minimal manual intervention (here, a single one-hot encoding step)
By building and tuning a Random Forest model, we aim to improve the accuracy of demand forecasts and reduce prediction errors.
---
## Data
The dataset consists of daily sales records with the following fields:
- `record_ID` – Unique record identifier
- `week` – Date of the record (split into `day`, `month`, `year` during preprocessing)
- `store_id` – Store identifier
- `sku_id` – Product identifier
- `total_price` – Final price after discounts
- `base_price` – Original price
- `is_featured_sku` – Whether the product was featured
- `is_display_sku` – Whether the product was displayed
- `units_sold` – **Target variable** (number of units sold)
---
## Workflow
### Data Preprocessing
- Split `week` column into `day`, `month`, `year`
- Handle missing values
- Remove outliers (records in the top 1% of sales)
- Drop irrelevant features (`record_ID`)
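The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data, the `%d/%m/%y` date format, the choice to drop (rather than impute) missing rows, and the helper name `preprocess` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Small synthetic frame using the README's column names (all values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "record_ID": np.arange(1, n + 1),
    # Assumed date format: day/month/two-digit year.
    "week": pd.date_range("2011-01-17", periods=n, freq="7D").strftime("%d/%m/%y"),
    "store_id": rng.choice([1, 2, 3], size=n),
    "sku_id": rng.choice([101, 102, 103], size=n),
    "total_price": rng.uniform(80, 250, size=n).round(2),
    "base_price": rng.uniform(100, 250, size=n).round(2),
    "is_featured_sku": rng.integers(0, 2, size=n),
    "is_display_sku": rng.integers(0, 2, size=n),
    "units_sold": rng.poisson(40, size=n),
})
df.loc[5, "total_price"] = np.nan  # inject one missing value for the demo


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the four preprocessing bullets."""
    out = frame.copy()
    # Split the date column into day / month / year parts.
    dates = pd.to_datetime(out["week"], format="%d/%m/%y")
    out["day"] = dates.dt.day
    out["month"] = dates.dt.month
    out["year"] = dates.dt.year
    # Drop the raw date and the identifier that carries no signal.
    out = out.drop(columns=["week", "record_ID"])
    # Handle missing values: here, simply drop incomplete rows.
    out = out.dropna()
    # Remove outliers: keep rows at or below the 99th percentile of units_sold.
    cutoff = out["units_sold"].quantile(0.99)
    out = out[out["units_sold"] <= cutoff]
    return out.reset_index(drop=True)


clean = preprocess(df)
print(clean.shape)
```

Dropping rows is only one way to handle missing values; imputing a median price would keep more data at the cost of some bias.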
### Feature Engineering
- One-hot encode categorical variables (`store_id`, `sku_id`)
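One way to do the encoding is `pandas.get_dummies`, sketched below on a toy frame. The ID values are invented, and the notebook may use a different encoder (for example scikit-learn's `OneHotEncoder`).

```python
import pandas as pd

# Toy frame; the store and SKU identifiers are hypothetical.
df = pd.DataFrame({
    "store_id": [1, 2, 1, 3],
    "sku_id": [101, 102, 102, 101],
    "total_price": [99.0, 120.5, 110.0, 95.0],
    "units_sold": [20, 35, 28, 18],
})

# Treat the ID columns as categories rather than numbers: each distinct
# value becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["store_id", "sku_id"], dtype=int)
print(encoded.columns.tolist())
```

With many stores and SKUs this produces a wide, sparse feature matrix, which is worth keeping in mind for training time.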
### Regression Modeling
- Split dataset into training and testing sets
- Train a **Random Forest Regressor**
- Evaluate performance with **R² score** and **RMSE**
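The modeling steps above might look like the following with scikit-learn. The data here is synthetic with a made-up demand formula, and the 80/20 split, `n_estimators=100`, and the random seeds are illustrative assumptions rather than the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target (invented relationship, for illustration only).
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "total_price": rng.uniform(80, 250, n),
    "base_price": rng.uniform(100, 250, n),
    "is_featured_sku": rng.integers(0, 2, n),
    "is_display_sku": rng.integers(0, 2, n),
})
y = 60 - 0.15 * X["total_price"] + 12 * X["is_featured_sku"] + rng.normal(0, 3, n)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# R² is the share of variance explained; RMSE is in the target's own units.
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

A random split is the simplest choice; for time-ordered sales data, a split by date is often a more realistic test of forecasting ability.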
### Hyperparameter Tuning
- Use **GridSearchCV** to optimize parameters:
  - `n_estimators` (number of trees in the forest)
  - `min_samples_split` (minimum number of samples required to split an internal node)
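A grid search over these two parameters can be sketched as below. The candidate values, the 3-fold cross-validation, and the RMSE-based scoring are assumptions chosen for the example; the README does not list the grid that was actually searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (made up for the demo).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Hypothetical grid: 2 x 3 = 6 parameter combinations.
param_grid = {
    "n_estimators": [50, 100],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"cross-validated RMSE: {-search.best_score_:.3f}")
```

By default `GridSearchCV` refits the best configuration on the full training data, so `search.best_estimator_` can be used directly for prediction.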
### Visualization
- Plot **predicted vs actual sales**
- Explore feature distributions and sales patterns
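A predicted-versus-actual plot can be drawn with matplotlib along these lines. The arrays are synthetic stand-ins for real test-set values and model predictions, and the styling choices are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in values: "actual" sales and noisy "predictions" (both made up).
rng = np.random.default_rng(1)
y_true = rng.poisson(40, size=100).astype(float)
y_pred = y_true + rng.normal(0, 5, size=100)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(y_true, y_pred, alpha=0.6)

# Diagonal reference line: points on it are predicted exactly.
lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--", color="gray", label="perfect prediction")

ax.set_xlabel("Actual units_sold")
ax.set_ylabel("Predicted units_sold")
ax.set_title("Predicted vs actual sales")
ax.legend()
fig.tight_layout()
```

Points far from the dashed diagonal are the records the model misjudges most; in a notebook the figure renders inline, and `fig.savefig(...)` can write it to disk.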
---
## Result
- The model successfully predicts demand with a reasonable **R² score** and reduced **RMSE** compared to baseline.
- After **hyperparameter tuning**, the model achieves even better accuracy.
- Future improvements may include advanced models such as **XGBoost** or **Neural Networks**.
---
## Required Packages
```txt
numpy
pandas
scikit-learn
matplotlib
```