https://github.com/peeti-sriwongsanguan/mlops-timeseries-walmart
An end-to-end machine learning (MLOps) project using the Walmart dataset
docker mac macine-learning mlops-workflow silicon time-series-analysis walmart
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/peeti-sriwongsanguan/mlops-timeseries-walmart
- Owner: peeti-sriwongsanguan
- Created: 2024-08-25T17:12:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-08-27T02:30:14.000Z (8 months ago)
- Last Synced: 2025-02-06T01:28:58.897Z (3 months ago)
- Topics: docker, mac, macine-learning, mlops-workflow, silicon, time-series-analysis, walmart
- Language: Python
- Homepage:
- Size: 850 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# End-to-End MLOps Workflows
## Walmart Sales Forecasting: XGBoost vs CNN
This is a simple end-to-end MLOps project that takes data from [Walmart stores](https://www.kaggle.com/datasets/ujjwalchowdhury/walmartcleaned) (special thanks to Ujjwal Chowdhury for the cleaned dataset) and runs it through a machine learning pipeline covering training, model tracking, and experimentation with Docker. For the models, I compare XGBoost and a CNN for time series forecasting. The dataset also includes economic indicators such as the Consumer Price Index (CPI) and the unemployment rate.

In this project, I also want to try a package that is new to me: Optuna. Optuna is an open-source hyperparameter optimization framework developed by Preferred Networks, Inc. It provides a flexible and efficient platform for tuning machine learning models, letting users search for the best set of hyperparameters automatically.
## Key findings
- XGBoost performs best overall, with the most accurate and consistent predictions.
- Time-based features (like rolling means) and department are crucial for predictions.
- Some features at the bottom of the feature-importance ranking (e.g., MarkDown2, DayOfWeek) have very little impact on predictions.

## Now let's dive into the analysis
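
As an illustration of the rolling-mean features mentioned in the key findings, here is a small sketch in plain Python. The function name `lagged_rolling_mean` and the sample numbers are hypothetical, and it assumes the feature should only use past observations so that the current week's sales do not leak into their own predictor. The repository may compute its features differently.

```python
from collections import deque
from typing import List, Optional


def lagged_rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Mean of the previous `window` observations, excluding the current one.

    Positions without enough history get None instead of a partial mean.
    """
    history = deque(maxlen=window)  # holds at most the last `window` values
    features: List[Optional[float]] = []
    for value in values:
        if len(history) == window:
            features.append(sum(history) / window)
        else:
            features.append(None)
        history.append(value)  # only now does the current value become "past"
    return features


# Made-up weekly sales for a single store and department.
weekly_sales = [10.0, 20.0, 30.0, 40.0, 50.0]
print(lagged_rolling_mean(weekly_sales, window=2))
# → [None, None, 15.0, 25.0, 35.0]
```

In practice a feature like this would be computed separately for each store and department, so that one series does not bleed into another.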
### Environment
- Machine: MacBook with M3 Pro chip (MPS device used for GPU acceleration)
- Python version: 3.9

## Project Structure
```
walmart-sales-forecast/
│
├── .github/
│   └── workflows/
│       └── ci.yml
│
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── plot_image.py
│   └── utils.py
│
├── image/
├── data/
│
├── tests/
│   └── test_data_preprocessing.py
│
├── .gitignore
├── conftest.py
├── main.py
├── Dockerfile
├── Pipfile
├── Pipfile.lock
├── setup.cfg
└── README.md
```

### Setup
#### 1. Clone the repository:
```
git clone https://github.com/peeti-sriwongsanguan/mlops-timeseries-walmart.git
cd mlops-timeseries-walmart
```

#### 2. Install Pipenv if you haven't already:
```
pip install pipenv
```

#### 3. Install dependencies and create a virtual environment:
```
pipenv install --dev
```

#### 4. Activate the virtual environment:
```
pipenv shell
```

#### 5. Download the Walmart dataset and place it in the data folder
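
Before running the pipeline, it can help to confirm that the dataset is where the code expects it. The following is a minimal sketch, not part of the repository: the helper name `find_csv_files` is hypothetical, and it assumes the dataset is a CSV file placed directly inside `data/`. The actual file name the project expects is not stated here, so the check accepts any `.csv` file.

```python
from pathlib import Path
from typing import List


def find_csv_files(data_dir: str = "data") -> List[Path]:
    """Return the CSV files found directly inside `data_dir`, sorted by name.

    Raises FileNotFoundError if the folder is missing or holds no CSV file.
    """
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Data folder not found: {folder}")
    csv_files = sorted(folder.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"No CSV file found in: {folder}")
    return csv_files


if __name__ == "__main__":
    try:
        for path in find_csv_files("data"):
            print("Found dataset file:", path)
    except FileNotFoundError as error:
        print(error)
```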
### Docker
#### To run the project in a Docker container:
1. Build the Docker image:
```
docker build -t walmart-sales-forecast .
```
2.1. Run the container so that it also creates and saves images in the image folder.
This command mounts the image directory from your current working directory to /app/image in the container.
```
docker run -v "$(pwd)/image:/app/image" walmart-sales-forecast
```
2.2. Otherwise, use this command to run the container:
```
docker run walmart-sales-forecast
```