Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/walidalsafadi/store-sales-ts-forecasting
Use machine learning to predict grocery sales
data-visualization eda feature-engineering model-optimization time-series-forecasting xgboost
- Host: GitHub
- URL: https://github.com/walidalsafadi/store-sales-ts-forecasting
- Owner: WalidAlsafadi
- License: apache-2.0
- Created: 2024-09-26T13:39:03.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T11:16:41.000Z (4 months ago)
- Last Synced: 2024-12-04T14:31:10.656Z (3 months ago)
- Topics: data-visualization, eda, feature-engineering, model-optimization, time-series-forecasting, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 833 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# **Sales Forecasting Using Time Series Analysis with XGBoost**
## **Project Overview** 🎯
This project forecasts sales for thousands of product families sold at Favorita stores using historical data. The analysis uses XGBoost, a gradient-boosted decision tree library. Through feature engineering and model tuning, the project provides both short-term and long-term sales predictions, with a focus on generating a 30-day forecast.

## **Project Workflow**
1. **Data Collection and Preprocessing**:
- Used historical sales data provided by Favorita stores.
- Processed the data to handle missing values and outliers.
- Performed feature engineering to create lag features and rolling statistics.
- Applied one-hot encoding for categorical variables and scaling for numerical variables.
2. **Exploratory Data Analysis (EDA)**:
- Visualized sales trends over time.
- Analyzed seasonal effects, promotions, and other external factors influencing sales.
3. **Feature Engineering**:
- Created lag features and rolling windows to capture historical patterns.
- Engineered calendar-related features such as day of the week, month, and promotions.
4. **Model Building**:
- Implemented **XGBoost** for time series forecasting.
- Fine-tuned the model using RMSE and RMSLE as evaluation metrics.
- Generated a 30-day sales forecast.
5. **Model Evaluation**:
- Used RMSE and RMSLE to assess the model's accuracy.
- Compared the model's predictions against actual sales data for validation.

## **Key Features** 🛠️
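As one way to picture the lag, rolling-window, and calendar features named in the workflow, here is a minimal sketch in plain Python. It is not taken from the notebook: the function name `build_features`, the lag set `(1, 7)`, the 7-day window, and the toy sales series are all assumptions made for illustration. In a notebook this would more likely be written with pandas (`shift` and `rolling`), but the idea is the same.

```python
from datetime import date, timedelta
from statistics import mean


def build_features(dates, sales, lags=(1, 7), window=7):
    """Return one feature dict per day that has enough history.

    Every feature for day i is computed only from days before i,
    so no information from the target day leaks into its features.
    """
    history = max(max(lags), window)
    rows = []
    for i in range(history, len(sales)):
        row = {
            "date": dates[i],
            "day_of_week": dates[i].weekday(),  # Monday = 0, Sunday = 6
            "month": dates[i].month,
            "target": sales[i],
        }
        # Lag features: the sales value observed `lag` days earlier.
        for lag in lags:
            row[f"lag_{lag}"] = sales[i - lag]
        # Rolling mean over the `window` days preceding day i.
        row[f"rolling_mean_{window}"] = mean(sales[i - window:i])
        rows.append(row)
    return rows


# Hypothetical toy series: ten days of sales for a single product family.
start = date(2017, 1, 1)
dates = [start + timedelta(days=d) for d in range(10)]
sales = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]

features = build_features(dates, sales)
```

The first seven days are dropped because they lack a full week of history. Each remaining row could then be passed to a tree-based regressor as one training example.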
- **XGBoost Model**: Uses gradient-boosted decision trees for time series forecasting.
- **Feature Engineering**: Integrated lag and rolling statistics to enhance forecast accuracy.
- **Data Visualization**: Plots and charts to illustrate key trends and model performance.
- **30-Day Sales Forecast**: Focused on predicting future sales, with results analyzed for potential improvements.

## **Challenges & Improvements** 🚧
- **Current Challenge**: The 30-day forecast results show a linear pattern, lacking the nuanced shape of actual sales data.
- **Future Work**: Exploring ensemble methods such as model stacking, and dedicated time-series models (e.g., ARIMA, LSTM), to improve forecast accuracy.

## **Results**
- Achieved an RMSLE of 0.409.
- Feature engineering significantly improved prediction accuracy.

### **Check it out on Kaggle**
You can also explore the project on Kaggle: https://www.kaggle.com/code/walidkw/store-sales-ts-forecasting-xgboost
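
For reference, the RMSE and RMSLE metrics mentioned in the workflow and results can be sketched as below. This assumes the standard definitions; the notebook's own implementation is not shown in this README, so the function names and the sample values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(actual, predicted):
    """Root mean squared logarithmic error.

    Uses log1p(x) = log(1 + x), so days with zero sales are valid inputs.
    Values must be non-negative.
    """
    log_actual = [math.log1p(a) for a in actual]
    log_predicted = [math.log1p(p) for p in predicted]
    return rmse(log_actual, log_predicted)


# Hypothetical values, not results from the project.
actual_sales = [0, 10, 100]
predicted_sales = [0, 12, 90]

print(f"RMSE:  {rmse(actual_sales, predicted_sales):.3f}")
print(f"RMSLE: {rmsle(actual_sales, predicted_sales):.3f}")
```

Because RMSLE compares values on a log scale, it penalizes relative rather than absolute errors, which suits data where sales volumes differ widely across product families.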