https://github.com/walidalsafadi/store-sales-ts-forecasting
Use machine learning to predict grocery sales
data-visualization eda feature-engineering model-optimization time-series-forecasting xgboost
- Host: GitHub
- URL: https://github.com/walidalsafadi/store-sales-ts-forecasting
- Owner: WalidAlsafadi
- License: apache-2.0
- Created: 2024-09-26T13:39:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-24T11:16:41.000Z (over 1 year ago)
- Last Synced: 2025-04-01T23:52:07.618Z (about 1 year ago)
- Topics: data-visualization, eda, feature-engineering, model-optimization, time-series-forecasting, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 833 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# **Sales Forecasting Using Time Series Analysis with XGBoost**
## **Project Overview**
This project aims to forecast sales for thousands of product families sold at Favorita stores using historical data. The analysis is performed using XGBoost, a powerful machine learning algorithm. Through feature engineering and model tuning, the project provides both short-term and long-term sales predictions, with a focus on generating a 30-day forecast.
## **Project Workflow**
1. **Data Collection and Preprocessing**:
- Used historical sales data provided by Favorita stores.
- Processed the data to handle missing values and outliers.
- Performed feature engineering to create lag features and rolling statistics.
- Applied one-hot encoding for categorical variables and scaling for numerical variables.
2. **Exploratory Data Analysis (EDA)**:
- Visualized sales trends over time.
- Analyzed seasonal effects, promotions, and other external factors influencing sales.
3. **Feature Engineering**:
- Created lag features and rolling windows to capture historical patterns.
- Engineered calendar features such as day of the week and month, along with promotion indicators.
4. **Model Building**:
- Implemented **XGBoost** for time series forecasting.
- Fine-tuned the model using RMSE and RMSLE as evaluation metrics.
- Generated a 30-day sales forecast.
5. **Model Evaluation**:
- Used RMSE and RMSLE to assess the model's accuracy.
- Compared the model's predictions against actual sales data for validation.
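The lag, rolling, and calendar features described in steps 1 and 3 can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names (`date`, `store_nbr`, `family`, `sales`), the lag lengths, and the window size are assumptions chosen for the example. The rolling mean is computed on values shifted by one day so that each row only sees sales from earlier days.

```python
import pandas as pd


def add_lag_features(df, lags=(1, 7), window=7):
    """Add per-series lag, rolling-mean, and calendar columns.

    Assumes a long-format table with hypothetical columns
    date, store_nbr, family, and sales.
    """
    df = df.sort_values(["store_nbr", "family", "date"]).copy()
    grouped = df.groupby(["store_nbr", "family"])["sales"]

    # Lag features: sales observed `lag` days earlier in the same series.
    for lag in lags:
        df[f"sales_lag_{lag}"] = grouped.shift(lag)

    # Rolling mean over the previous `window` days. The shift(1) keeps the
    # current day's sales out of its own feature (avoids target leakage).
    df[f"sales_roll_mean_{window}"] = grouped.transform(
        lambda s: s.shift(1).rolling(window).mean()
    )

    # Calendar features.
    df["day_of_week"] = df["date"].dt.dayofweek
    df["month"] = df["date"].dt.month
    return df


# Tiny synthetic series (made-up numbers, for illustration only).
demo = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "store_nbr": 1,
    "family": "GROCERY I",
    "sales": [float(v) for v in range(10, 20)],
})
features = add_lag_features(demo, lags=(1, 7), window=3)
```

The first rows of each series contain `NaN` in the lag and rolling columns because no earlier history exists. XGBoost can handle missing values natively, so whether to drop or keep those rows is a modeling choice.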
## **Key Features**
- **XGBoost Model**: Applied an advanced machine learning approach for time series forecasting.
- **Feature Engineering**: Integrated lag and rolling statistics to enhance forecast accuracy.
- **Data Visualization**: Plots and charts to illustrate key trends and model performance.
- **30-Day Sales Forecast**: Focused on predicting future sales, with results analyzed for potential improvements.
## **Challenges & Improvements**
- **Current Challenge**: The 30-day forecast results show a linear pattern, lacking the nuanced shape of actual sales data.
- **Future Work**: Exploring model stacking and other ensemble methods, as well as dedicated time-series models (e.g., ARIMA, LSTM), to improve forecast accuracy.
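The README does not say how the 30-day forecast is produced. One common approach with lag-based models is recursive forecasting, where each prediction is fed back in as a lag for the next day. The sketch below shows that loop under stated assumptions: `recursive_forecast` and `mean_of_lags` are hypothetical names, and `mean_of_lags` is only a stand-in for a trained model's one-step prediction. With an averaging predictor like this one, the forecast settles toward a flat value, which is one plausible but unverified reason a multi-step forecast can lose the shape of the real series.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    horizon: int = 30,
    n_lags: int = 7,
) -> List[float]:
    """Forecast `horizon` steps ahead, reusing predictions as future lags."""
    window = list(history[-n_lags:])
    forecast = []
    for _ in range(horizon):
        y_hat = predict_next(window)
        forecast.append(y_hat)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [y_hat]
    return forecast


def mean_of_lags(lags: List[float]) -> float:
    """Hypothetical stand-in for a trained model's one-step prediction."""
    return sum(lags) / len(lags)


# Made-up history, for illustration only.
history = [10.0, 20.0, 30.0]
forecast = recursive_forecast(history, mean_of_lags, horizon=30, n_lags=3)
```

In practice `predict_next` would wrap the fitted model and rebuild the full feature row (lags, rolling statistics, calendar fields) at each step.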
## **Results**
- Achieved an RMSLE of 0.409.
- Significant improvement in prediction accuracy through feature engineering.
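For reference, RMSLE (root mean squared logarithmic error) is the square root of the mean squared difference between `log(1 + prediction)` and `log(1 + actual)`. The sketch below is a plain-Python version for illustration, not the notebook's code. Clipping negative predictions to zero is an assumption made here so the logarithm stays defined.

```python
import math
from typing import Sequence


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    squared_log_errors = []
    for actual, predicted in zip(y_true, y_pred):
        # Assumption: clip negative predictions to zero so log1p is defined.
        predicted = max(predicted, 0.0)
        diff = math.log1p(predicted) - math.log1p(actual)
        squared_log_errors.append(diff * diff)

    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up values, for illustration only.
perfect = rmsle([0.0, 9.0, 99.0], [0.0, 9.0, 99.0])
one_log_unit = rmsle([0.0], [math.e - 1.0])
```

Because the error is measured on a log scale, it penalizes relative rather than absolute differences. As a rough reading, an RMSLE of 0.409 corresponds to a typical ratio of about exp(0.409) ≈ 1.5 between `1 + prediction` and `1 + actual`.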
### **Check it out on Kaggle**
You can also explore the project on Kaggle: https://www.kaggle.com/code/walidkw/store-sales-ts-forecasting-xgboost