# Case Study 16: Rossmann Store Sales Forecasting

**22BDS0114 · GSVARDHIN**

End-to-end retail-sales forecasting on the **Rossmann Store Sales** dataset
(Kaggle): EDA, decomposition, stationarity testing, five forecasting models,
walk-forward evaluation, and an interactive Streamlit UI.

---

## Why this case study

The original case-study dataset, AirPassengers (144 monthly points), is a
textbook toy. Real retail forecasting is *messier*: multiple stores, daily
granularity, promotions, holidays, heterogeneous store types, and zero-sales
days when stores are closed. Rossmann has all of that, with **1,115 stores**
and up to **942 daily observations** per store (1,017,209 rows in total, ~1M).
That lets us compare classical and ML models on a realistic problem.

---

## Dataset

| Detail | Value |
|---|---|
| Source | [Rossmann Store Sales (Kaggle)](https://www.kaggle.com/c/rossmann-store-sales) |
| Files used | `train.csv` (daily sales), `store.csv` (store metadata) |
| Stores | 1,115 |
| Rows (store-days) | 1,017,209 |
| Period | 2013-01-01 → 2015-07-31 (942 days) |
| Target | `Sales` (€) |
| Exogenous features | `Promo`, `SchoolHoliday`, `StateHoliday`, `Promo2`, `CompetitionDistance`, `StoreType`, `Assortment` |

**The CSVs are not committed.** Download them from Kaggle and place `train.csv`,
`store.csv` and (optionally) `test.csv` in the project root.

---

## Architecture

```
predictive-casestudy/
├── main.py              # CLI: runs full headless pipeline, saves PNG/CSV to outputs/
├── app.py               # Streamlit multi-page UI
├── src/
│   ├── data.py          # Load, merge, feature engineering (cached as parquet)
│   ├── eda.py           # Plotly figures + ADF stationarity test
│   └── models.py        # Naive, Holt-Winters, SARIMA, Prophet, XGBoost + walk-forward CV
├── pyproject.toml
├── train.csv  store.csv  test.csv      # (gitignored; download from Kaggle)
├── cache/                              # parquet cache of engineered data
└── outputs/                            # generated charts + leaderboard CSVs
```

---

## Models compared

| Model | Type | Notes |
|---|---|---|
| **Naive (seasonal-7)** | Baseline | `y_t = y_{t-7}`; the bar every other model must beat |
| **Holt-Winters** | Classical ETS | Triple exponential smoothing with weekly seasonality |
| **SARIMA(1,1,1)(1,1,1,7)** | Classical | Seasonal ARIMA on daily data (weekly period of 7) |
| **Prophet** | Decomposable | Trend + weekly + yearly seasonality, with `Promo` and `SchoolHoliday` regressors |
| **XGBoost** | ML | Lags 1/7/14/28, rolling statistics, calendar and promo features; recursive multi-step forecast |

All models share a common `fit(train) → predict(future_index)` interface in
[src/models.py](src/models.py), so adding a new model takes a single class.

### Metrics
MAE · RMSE · MAPE · **RMSPE** (the official metric of the Rossmann Kaggle competition).
Walk-forward evaluation across `n` folds is available via
`models.walk_forward_eval()`.

---

## How to run

```bash
# Install deps (uv handles venv automatically)
uv sync

# Headless analysis on the network-wide aggregate (writes to ./outputs/)
uv run main.py

# Or analyse a single store:
uv run main.py 1 28        # store_id=1, horizon=28 days

# Interactive Streamlit UI
uv run streamlit run app.py
```

The Streamlit app has six pages (Overview, EDA, Decomposition & Stationarity,
Forecasting, Model Comparison, Future Forecast) and a sidebar for switching
between **Network-wide** and **Single-store** scope.

---

## Limitations

- Holt-Winters and SARIMA assume regularly sampled data and can be slow to fit on
  long daily series, so we cap the number of CV folds and the forecast horizon.
- XGBoost's recursive forecasting compounds errors over long horizons; beyond
  30 days it loses to Holt-Winters on the network-wide series.
- Exogenous regressors (`Promo`, `SchoolHoliday`) must be known in advance for
  out-of-sample prediction; for future dates we substitute their day-of-week medians.
- We do not model the 2014 store closures for refurbishment; in the per-store
  path, the raw zero-sales rows are filtered out by `only_open=True`.
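The *Architecture* section describes `src/data.py` only as "load, merge, feature engineering". Here is a minimal sketch of the load-and-merge step. It assumes pandas and the Kaggle column names (`Store`, `Date`, `Sales`, `Open`). `load_merged`, `store_series` and `network_series` are hypothetical names, and `only_open=True` is assumed to mean "drop rows with `Open == 0`". Parquet caching is left out.

```python
import pandas as pd


def load_merged(train: pd.DataFrame, store: pd.DataFrame) -> pd.DataFrame:
    """Attach per-store metadata (StoreType, Assortment, ...) to every daily row."""
    # Each store appears once in store.csv, so the join must be many-to-one.
    df = train.merge(store, on="Store", how="left", validate="many_to_one")
    df["Date"] = pd.to_datetime(df["Date"])
    return df.sort_values(["Store", "Date"]).reset_index(drop=True)


def store_series(df: pd.DataFrame, store_id: int, only_open: bool = True) -> pd.Series:
    """Daily sales for one store (assumption: only_open drops rows with Open == 0)."""
    rows = df[df["Store"] == store_id]
    if only_open:
        rows = rows[rows["Open"] == 1]
    return rows.set_index("Date")["Sales"]


def network_series(df: pd.DataFrame) -> pd.Series:
    """Network-wide aggregate: total sales across all stores for each date."""
    return df.groupby("Date")["Sales"].sum()
```

On the real files this would be called as `load_merged(pd.read_csv("train.csv", dtype={"StateHoliday": str}), pd.read_csv("store.csv"))`. Reading `StateHoliday` as a string avoids pandas' mixed-type warning for that column.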
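The shared `fit(train) → predict(future_index)` interface and the **Naive (seasonal-7)** baseline can be sketched in plain Python. This is not the code in `src/models.py`. The `Forecaster` protocol and the `SeasonalNaive` class are hypothetical, and `predict` here uses only the length of `future_index`. For step h, the forecast is `y_{T+h-7k}` with `k = ceil(h/7)`, which amounts to repeating the last observed week.

```python
from typing import Protocol, Sequence


class Forecaster(Protocol):
    """Hypothetical shape of the shared fit/predict interface."""

    def fit(self, train: Sequence[float]) -> "Forecaster": ...

    def predict(self, future_index: Sequence) -> list[float]: ...


class SeasonalNaive:
    """Seasonal-naive baseline: each forecast repeats the value one season earlier."""

    def __init__(self, season: int = 7) -> None:
        self.season = season
        self._last_season: list[float] = []

    def fit(self, train: Sequence[float]) -> "SeasonalNaive":
        if len(train) < self.season:
            raise ValueError("need at least one full season of history")
        # Only the most recent season is needed to forecast.
        self._last_season = list(train[-self.season:])
        return self

    def predict(self, future_index: Sequence) -> list[float]:
        # Step i (0-based) reuses the value observed a whole number of seasons earlier.
        return [self._last_season[i % self.season] for i in range(len(future_index))]
```

For example, `SeasonalNaive().fit(history).predict(range(28))` gives a 28-day forecast. Any class with the same two methods can be swapped in.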
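For the **Holt-Winters** row, the additive triple-exponential-smoothing recursions are short enough to write out. This sketch makes two simplifying assumptions:

- The smoothing constants (`alpha`, `beta`, `gamma`) are fixed, illustrative values. A library implementation would estimate them by optimisation, and nothing here claims to match the repo's settings.
- Initialisation is a simple first-two-seasons scheme.

```python
from typing import Sequence


def holt_winters_additive(
    y: Sequence[float],
    horizon: int,
    season: int = 7,
    alpha: float = 0.3,
    beta: float = 0.05,
    gamma: float = 0.2,
) -> list[float]:
    """Additive Holt-Winters with fixed (not fitted) smoothing parameters."""
    if len(y) < 2 * season:
        raise ValueError("need at least two full seasons to initialise")
    first = sum(y[:season]) / season
    second = sum(y[season:2 * season]) / season
    level = first
    trend = (second - first) / season
    # Initial seasonal offsets: deviations of the first season from its mean.
    seasonal = [y[i] - first for i in range(season)]

    for t in range(season, len(y)):
        s_old = seasonal[t % season]  # offset last updated one season ago
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season] = gamma * (y[t] - level) + (1 - gamma) * s_old

    n = len(y)
    # h-step forecast: level + h * trend + latest seasonal offset for that position.
    return [level + h * trend + seasonal[(n + h - 1) % season] for h in range(1, horizon + 1)]
```

Multiplicative seasonality, which uses seasonal factors instead of offsets, is the usual alternative when weekly swings scale with the sales level.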
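The **XGBoost** row combines lag features with a recursive forecast. Because each prediction is fed back as an input, errors compound over long horizons (see *Limitations*). This is a model-agnostic sketch:

- `make_features` and `recursive_forecast` are hypothetical names.
- `predict_one` stands in for a fitted regressor, for example a wrapper around an XGBoost model's `predict`.
- The calendar and promo features are omitted.

Every feature for day t uses only values before t, so the target never leaks into its own inputs.

```python
from statistics import mean
from typing import Callable, Sequence

LAGS = (1, 7, 14, 28)


def make_features(history: Sequence[float], t: int) -> dict[str, float]:
    """Features for predicting y[t] using only values strictly before t."""
    if t < max(LAGS):
        raise ValueError(f"need at least {max(LAGS)} past values")
    feats = {f"lag_{k}": history[t - k] for k in LAGS}
    # Rolling mean over the previous 7 days (t-7 .. t-1), excluding y[t] itself.
    feats["roll_mean_7"] = mean(history[t - 7:t])
    return feats


def recursive_forecast(
    train: Sequence[float],
    predict_one: Callable[[dict[str, float]], float],
    horizon: int,
) -> list[float]:
    """Multi-step forecast where each prediction is appended and reused as a lag."""
    history = list(train)
    preds: list[float] = []
    for _ in range(horizon):
        yhat = predict_one(make_features(history, len(history)))
        preds.append(yhat)
        history.append(yhat)  # any error in yhat propagates into later lags
    return preds
```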
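The four metrics under *Metrics* can be sketched in plain Python. RMSPE is `sqrt(mean(((y - yhat) / y) ** 2))`. Percentage errors are undefined when the actual is zero, and the Rossmann competition ignores zero-sales store-days in scoring, so `mape` and `rmspe` skip those points. Results are fractions (0.1 means 10%), and the function names are assumptions rather than the repo's API.

```python
import math
from typing import Sequence


def _pairs(y: Sequence[float], yhat: Sequence[float]) -> list[tuple[float, float]]:
    if len(y) != len(yhat) or len(y) == 0:
        raise ValueError("y and yhat must be non-empty and of equal length")
    return list(zip(y, yhat))


def _nonzero_pairs(y: Sequence[float], yhat: Sequence[float]) -> list[tuple[float, float]]:
    # Percentage errors are undefined for zero actuals, so those days are dropped.
    pairs = [(a, b) for a, b in _pairs(y, yhat) if a != 0]
    if not pairs:
        raise ValueError("every actual value is zero")
    return pairs


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    pairs = _pairs(y, yhat)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    pairs = _pairs(y, yhat)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def mape(y: Sequence[float], yhat: Sequence[float]) -> float:
    pairs = _nonzero_pairs(y, yhat)
    return sum(abs((a - b) / a) for a, b in pairs) / len(pairs)


def rmspe(y: Sequence[float], yhat: Sequence[float]) -> float:
    pairs = _nonzero_pairs(y, yhat)
    return math.sqrt(sum(((a - b) / a) ** 2 for a, b in pairs) / len(pairs))
```

RMSPE squares relative rather than absolute errors. As a result, a 10% miss on a small store counts as much as a 10% miss on a large one.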
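The signature of `models.walk_forward_eval()` is not documented in this README. What follows is therefore a hypothetical standalone version of expanding-window walk-forward evaluation:

- Folds have non-overlapping test windows of length `horizon`, placed at the end of the series.
- Forecasting goes through a `forecast(train, horizon)` callable.
- Scoring accepts any `metric(y, yhat)`.

```python
from typing import Callable, Sequence

Forecast = Callable[[Sequence[float], int], list[float]]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def walk_forward(
    series: Sequence[float],
    forecast: Forecast,
    metric: Metric,
    n_folds: int = 3,
    horizon: int = 28,
) -> list[float]:
    """Expanding window: fold k trains on everything before its own test window."""
    first_cut = len(series) - n_folds * horizon
    if first_cut <= 0:
        raise ValueError("series too short for the requested folds and horizon")
    scores = []
    for k in range(n_folds):
        cut = first_cut + k * horizon
        train, test = series[:cut], series[cut:cut + horizon]
        scores.append(metric(test, forecast(train, horizon)))
    return scores
```

Each fold sees strictly more history than the one before it, which mimics re-fitting the model as new data arrives.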
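The *Limitations* section substitutes day-of-week medians for future `Promo` and `SchoolHoliday` values. Below is a minimal stdlib sketch with hypothetical helper names. For a 0/1 regressor the median is 0 or 1, or 0.5 when a weekday's history is evenly split.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Sequence


def dow_medians(dates: Sequence[date], values: Sequence[float]) -> dict[int, float]:
    """Median of a regressor for each weekday (0 = Monday ... 6 = Sunday)."""
    by_dow: dict[int, list[float]] = defaultdict(list)
    for d, v in zip(dates, values):
        by_dow[d.weekday()].append(v)
    return {dow: median(vs) for dow, vs in by_dow.items()}


def fill_future(future_dates: Sequence[date], medians: dict[int, float]) -> list[float]:
    """Stand-in values for regressors that are unknown beyond the training data."""
    return [medians[d.weekday()] for d in future_dates]
```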