{"id":26372711,"url":"https://github.com/bits-bytes-nn/mofc-demand-forecast","last_synced_at":"2025-03-17T01:18:58.661Z","repository":{"id":201268610,"uuid":"392550746","full_name":"bits-bytes-nn/mofc-demand-forecast","owner":"bits-bytes-nn","description":"Time Series Forecasting for the M5 Competition ","archived":false,"fork":false,"pushed_at":"2021-10-29T07:35:15.000Z","size":3231,"stargazers_count":41,"open_issues_count":0,"forks_count":10,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-19T12:28:21.385Z","etag":null,"topics":["altair","deepar","demand-forecasting","gluonts","hyperopt","kats","lightgbm","prophet","time-series-analysis","tsfresh","vector-autoregression"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bits-bytes-nn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-08-04T04:43:32.000Z","updated_at":"2024-09-26T08:52:30.000Z","dependencies_parsed_at":null,"dependency_job_id":"b6d957a6-7291-43ae-a0b4-549b9110a6c8","html_url":"https://github.com/bits-bytes-nn/mofc-demand-forecast","commit_stats":null,"previous_names":["aldente0630/mofc-demand-forecast","bits-bytes-nn/mofc-demand-forecast"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-bytes-nn%2Fmofc-demand-forecast","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-bytes-nn%2Fmofc-demand-forecast/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-bytes-nn%2Fmofc-demand-forecast/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-bytes-nn%2Fmofc-demand-forecast/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bits-bytes-nn","download_url":"https://codeload.github.com/bits-bytes-nn/mofc-demand-forecast/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243955783,"owners_count":20374373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["altair","deepar","demand-forecasting","gluonts","hyperopt","kats","lightgbm","prophet","time-series-analysis","tsfresh","vector-autoregression"],"created_at":"2025-03-17T01:18:58.044Z","updated_at":"2025-03-17T01:18:58.646Z","avatar_url":"https://github.com/bits-bytes-nn.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MOFC Demand Forecasting with Time Series Analysis \n### Goals\n* Compare the accuracy of various time series forecasting algorithms such as *Prophet*, *DeepAR*, *VAR*, *DeepVAR*, and *[LightGBM](https://papers.nips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf)*\n* (Optional) Use `tsfresh` for automated feature engineering of time series data.\n\n### Requirements\n* The dataset can be downloaded from [this Kaggle competition](https://www.kaggle.com/c/m5-forecasting-accuracy).\n* In addition to the [Anaconda](https://www.anaconda.com) libraries, you need to install `altair`, `vega_datasets`, `category_encoders`, `mxnet`, `gluonts`, `kats`, `lightgbm`, `hyperopt` and `pandarallel`.\n  * `kats` requires Python 3.7 or higher.\n\n## 
Competition, Datasets and Evaluation\n* [The M5 Competition](https://mofc.unic.ac.cy/m5-competition) aims to forecast daily sales for the next 28 days, based on the sales of the last 1,941 days, for 30,490 IDs (item-store combinations) at Walmart.\n* The data includes (i) time series of daily sales quantity by ID, (ii) sales prices, and (iii) holiday and event information.\n* Forecasts are evaluated with the *Weighted Root Mean Squared Scaled Error* (WRMSSE). A detailed explanation is given in the M5 Participants Guide, and an implementation is at [this link](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834).\n* Hyperparameters were tuned on a randomly selected 0.1% of IDs, and 1% were used to measure test set performance.\n\n## Algorithms\n### Kats: Prophet\n* *Prophet* can incorporate forward-looking related time series into the model, so additional features were created from holiday and event information.\n* Since a separate *Prophet* model has to be fitted for each ID, I used `pandarallel` instead of the plain `apply` function of the `pandas` `DataFrame` to parallelize the fitting.\n* *Prophet* hyperparameters were tuned through 3-fold CV using the *Bayesian Optimization* module built into the `Kats` library. Here, *[Tweedie](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_tweedie_deviance.html)* was used as the loss function. 
Below is the hyperparameter tuning result.\n  \n|seasonality_prior_scale|changepoint_prior_scale|changepoint_range|n_changepoints|holidays_prior_scale|seasonality_mode|\n|:---:|:---:|:---:|:---:|:---:|:---:|\n|0.01|0.046|0.93|5|100.00|multiplicative|\n\n* The figures below show the actual sales (black dots), the point predictions and confidence intervals (blue lines and bands), and the test period (red dotted lines).\n  \n![Forecasting](./img/prophet.svg)\n\n### Kats: VAR\n* Since *VAR* is a multivariate time series model, fitting more IDs simultaneously improves performance, but the memory requirement also grows rapidly with the number of IDs.\n  \n![Forecasting](./img/var.svg)\n\n### GluonTS: DeepAR\n* *DeepAR* can incorporate metadata and forward-looking related time series into the model, so additional features were created from sales prices and holiday and event information. Dynamic categorical variables were encoded numerically through [Feature Hashing](https://alex.smola.org/papers/2009/Weinbergeretal09.pdf).\n* The probability distribution of the output is a very important hyperparameter; here it is set to the *Negative Binomial* distribution.\n\n![Forecasting](./img/deepar.svg)\n\n### GluonTS: DeepVAR\n* For *DeepVAR*, a multivariate model, the choice of output probability distribution is limited (i.e. *Multivariate Gaussian* distribution), which leads to a decrease in performance.\n  \n![Forecasting](./img/deepvar.svg)\n\n### LightGBM\n* I used `tsfresh` to convert time series into structured data features, which consumes a lot of computational resources even with minimal settings.\n* A *LightGBM* *Tweedie* regression model was fitted. Hyperparameters were tuned via 3-fold CV using the *Bayesian Optimization* function of the `hyperopt` library. 
The following is the hyperparameter tuning result.\n  \n|boosting|learning_rate|num_iterations|num_leaves|min_data_in_leaf|min_sum_hessian_in_leaf|bagging_fraction|bagging_freq|feature_fraction|extra_trees|lambda_l1|lambda_l2|path_smooth|max_bin|\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n|gbdt|0.01773|522|11|33|0.0008|0.5297|4|0.5407|False|2.9114|0.2127|217.3879|1023|\n  \n* The sales forecast for day D+1 was fed back through feature engineering to predict the sales volume for day D+2, and 28-day test set performance was measured by repeating this recursive process.\n\n![Forecasting](./img/lgb.svg)\n\n## Algorithms Performance Summary\n|Algorithm|WRMSSE|sMAPE|MAE|MASE|RMSE|\n|:---:|:---:|:---:|:---:|:---:|:---:|\n|DeepAR|0.7513|1.4200|0.8795|0.9269|1.1614|\n|LightGBM|1.0701|1.4429|0.8922|0.9394|1.1978|\n|Prophet|1.0820|1.4174|1.1014|1.0269|1.4410|\n|VAR|1.2876|2.3818|1.5545|1.6871|1.9502|\n|Naive Method|1.3430|1.5074|1.3730|1.1077|1.7440|\n|Mean Method|1.5984|1.4616|1.1997|1.0708|1.5352|\n|DeepVAR|4.6933|4.6847|1.9201|1.3683|2.3195|\n\nAs a result, *DeepAR* was selected and its predictions were submitted to Kaggle, achieving a WRMSSE of 0.8112 on the private leaderboard.\n\n### References\n* [Taylor SJ, Letham B. 2017. Forecasting at scale. *PeerJ Preprints* 5:e3190v2](https://peerj.com/preprints/3190.pdf)\n* [Prophet: Forecasting at Scale](https://research.fb.com/blog/2017/02/prophet-forecasting-at-scale)\n* [Stock, James, H., Mark W. Watson. 2001. Vector Autoregressions. *Journal of Economic Perspectives*, 15 (4): 101-115.](https://www.princeton.edu/~mwatson/papers/Stock_Watson_JEP_2001.pdf)\n* [David Salinas, Valentin Flunkert, Jan Gasthaus, Tim Januschowski. 2020. 
DeepAR: Probabilistic forecasting with autoregressive recurrent networks, *International Journal of Forecasting*, 36 (3): 1181-1191.](https://arxiv.org/pdf/1704.04110.pdf)\n* [David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico,\nJan Gasthaus. 2019. High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes. *In Advances in Neural Information Processing Systems*. 6827–6837.](https://arxiv.org/pdf/1910.03002.pdf)\n* [Kats - One Stop Shop for Time Series Analysis in Python](https://facebookresearch.github.io/Kats/)\n* [GluonTS - Probabilistic Time Series Modeling](https://ts.gluon.ai/index.html)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbits-bytes-nn%2Fmofc-demand-forecast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbits-bytes-nn%2Fmofc-demand-forecast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbits-bytes-nn%2Fmofc-demand-forecast/lists"}