{"id":31831263,"url":"https://github.com/raphaelsty/m5-forecasting-accuracy","last_synced_at":"2025-10-11T21:48:55.703Z","repository":{"id":43047621,"uuid":"251434093","full_name":"raphaelsty/M5-Forecasting-Accuracy","owner":"raphaelsty","description":"Deploying machine learning easily","archived":false,"fork":false,"pushed_at":"2023-05-01T21:23:49.000Z","size":1027,"stargazers_count":4,"open_issues_count":2,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-01T15:27:50.781Z","etag":null,"topics":["deployment","digitalocean","flask","kaggle","kaggle-solution","machine-learning","online-learning"],"latest_commit_sha":null,"homepage":"https://raphaelsty.github.io","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raphaelsty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-30T21:37:31.000Z","updated_at":"2023-01-21T13:29:02.000Z","dependencies_parsed_at":"2022-09-18T12:02:48.889Z","dependency_job_id":null,"html_url":"https://github.com/raphaelsty/M5-Forecasting-Accuracy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/raphaelsty/M5-Forecasting-Accuracy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2FM5-Forecasting-Accuracy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2FM5-Forecasting-Accuracy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2FM5-Forecasting-Accuracy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2F
M5-Forecasting-Accuracy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/raphaelsty","download_url":"https://codeload.github.com/raphaelsty/M5-Forecasting-Accuracy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raphaelsty%2FM5-Forecasting-Accuracy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279008824,"owners_count":26084518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deployment","digitalocean","flask","kaggle","kaggle-solution","machine-learning","online-learning"],"created_at":"2025-10-11T21:48:51.986Z","updated_at":"2025-10-11T21:48:55.686Z","avatar_url":"https://github.com/raphaelsty.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"**Hi,**\n\nDeploying and maintaining machine learning algorithms in a **production** environment is not an easy task. The **drift** of data over time tends to degrade the performance of the algorithms because the models are static. Data scientists **re-train models from scratch** to update them. This task is tedious and monopolizes highly qualified human resources. \n\n**I would like to present a solution to these problems**. I will use online learning and the open-source **[Creme](https://github.com/creme-ml/creme)** library (I am a core developer of Creme) to overcome the difficulties of deploying a machine learning model in production. \n\nI will illustrate my point with data from the **[M5-Forecasting-Accuracy Kaggle competition](https://www.kaggle.com/c/m5-forecasting-accuracy/)** which is well suited to the use case of Creme. **The objective of the M5-Forecasting-Accuracy competition is to estimate the daily sales of 30490 products for the next 28 days.**\n\nMy goal is not to develop a competitive model, but to show the simplicity of an online learning model for an event-based dataset such as M5-Forecasting-Accuracy.\n\nFirst of all, I would like to share with you the deployment process I follow to deploy a machine learning algorithm such as LightGBM or scikit-learn models for a task similar to the M5-Forecasting-Accuracy competition.\n\nI will then compare the deployment of batch learning algorithms to the deployment of online learning algorithms. To do so, I will use the Creme and Chantilly libraries. I'll walk you through the entire process and deploy my [API](http://159.89.43.61:8080) to predict the targets of the Kaggle competition M5-Forecasting-Accuracy. \n\n[Max Halford](https://maxhalford.github.io) is the main developer of Creme and the one who initiated the project. He wrote a blog post about it **[here](https://towardsdatascience.com/machine-learning-for-streaming-data-with-creme-dacf5fb469df)**. This is a good introduction to the philosophy of online learning, and especially to the philosophy of Creme. Feel free to have a look at it if you are interested in the subject. \n\n![](static/creme.svg)\n\n### Model deployment when fitting data in one go:\n\nDeploying a model that learns by batch requires a well-oiled organization. I describe here the process I followed to deploy this kind of algorithm in production. **I would like to point out that we have all had different experiences with deploying algorithms in production.** You may not agree with all of the points I'm making.\n\n**I distinguish two main steps in the organization of the project when deploying a machine learning algorithm in production:**\n\n- The **prototyping phase** is dedicated to the selection of the algorithm and the selection of the features to solve the problem.\n\n\n- The **engineering phase** is dedicated to the creation of robust machine learning systems. It aims at deploying the model in production and allows re-training the model on a regular basis.\n\n\n#### Prototyping:\n\nThe first thing to do during the prototyping phase is to define a method for evaluating the quality of the model. **Which objective do you want to optimize?** Then you have to define a validation process. Usually this is cross-validation. After defining the validation process, the whole point is to find the most suitable model with carefully selected hyperparameters, without forgetting the feature engineering stage, which is the key to most problems. \n\nThe prototyping step is difficult and exciting. We rely on our expertise in the field concerned, our creativity and our scientific culture.\n\n#### Engineering:\n\nI find it convenient to deploy the product sales prediction algorithm behind an API. The API is a practical solution to allow the largest number of users and software systems to query the trained model. \n\nDuring the engineering phase I distinguish two modules. The first one is dedicated to the training of the model and its serialization. I call the first module **Offline**. The second one is dedicated to the behavior of the model in the production environment. I call this second module **Online**. I call it online because I host this second module in the cloud.\n\nThere is a lot of engineering work to ensure consistency between the offline training part and the online inference part. Any transformations that have been applied to the data during training must be applied to the data during the inference phase. This requires the development of code that is different from the training phase, but which produces the same results.\n\n\nThe development phase should lead to the creation of the following modules:\n\n**Offline:**\n\n- **module 1:** Script dedicated to the calculation of features for the model training. The feature computation should be vectorized to speed up the process.\n\n\n- **module 2:** Script for training and evaluating the model. The training of the model is based on the features computed by module 1.\n\n\n- **module 3:** Script dedicated to the serialization of the model. It is important to redefine the model prediction method before serializing the model. Libraries like Scikit-Learn are not designed to make fast predictions for a single observation. You can find more information [here](https://maxhalford.github.io/blog/speeding-up-sklearn-single-predictions/). The [sklearn-onnx](https://github.com/onnx/sklearn-onnx) library is an interesting solution to this problem. I have already used [treelite](https://github.com/dmlc/treelite), which is a suitable alternative for LightGBM.\n\n![](static/offline.png)\n\n**Online:**\n\n- **module 4**: Script for calculating features for the production environment. Usually the predictions in the production environment are made via the call of an API. The feature calculation should not be vectorized because it is performed for a single observation when calling the API. As a result, the source code of module 4 differs from the source code of module 1.\n\n**Deployment:**\n\n- **Module 5**: API definition. When a call is made, the API must load the serialized model, calculate the features using module 4 and make a prediction. The model could also be loaded into memory at API startup.\n\n![](static/online.png)\n\n**Tests:**\n\n- It is strongly recommended to write unit tests for both the offline feature computation and the online feature computation. Non-regression tests that ensure the offline model produces the same results as the online model are necessary too.\n\n**After deploying an algorithm in production, you will need to re-train the model regularly and maintain the architecture. Deploying a machine learning algorithm that learns by batch is tedious. It's a long-term project that requires a lot of rigor. Such a project represents a significant technical debt and monopolizes highly qualified human resources on a daily basis.**\n\n### Model deployment with Creme and Chantilly\n\nCreme is an online learning library. Creme makes it possible to train machine learning models on data streams. \n\nEach Creme model has a ``fit_one`` method. **The ``fit_one`` method updates the model when a new observation is available** for training. As with neural networks, there is no need to re-train the model from scratch when new observations come in.\n\nCreme is not a suitable solution for Kaggle. Learning in batch allows the model to converge faster. **I won't choose Creme to get a medal on Kaggle. However, in everyday life, Creme is a viable and flexible solution for modeling a complex problem**.\n\nI will now show how to deploy in production a Creme algorithm trained to predict the target of the M5-Forecasting-Accuracy competition. I'll use the library [Chantilly](https://github.com/creme-ml/chantilly) to deploy my solution in production. Chantilly is a library under development that allows you to easily deploy the models from Creme in production.\n\n**Here is what the data from the M5-Forecasting-Accuracy Kaggle competition looks like after some pre-processing:**\n\n| id | date | y |\n|:------------------:|:----------:|:-:|\n| HOBBIES\\_1\\_001\\_CA\\_1 | 2016-04-25 | 1 |\n| HOBBIES\\_1\\_001\\_CA\\_2 | 2016-04-25 | 0 |\n| HOBBIES\\_1\\_001\\_CA\\_3 | 2016-04-25 | 3 |\n\nThe field ``id`` is composed of the product identifier ``HOBBIES_1_001`` and the store identifier ``CA_1``. The variable to be predicted is the variable ``y``. My API will use the fields ``id`` and ``date`` to make predictions.\n\n#### Prototyping\n\nAs usual, during the prototyping phase, I define the validation process and the measures used to evaluate the quality of the models I develop. Online learning makes it possible to do **progressive validation**, which is the online counterpart of cross-validation: each observation is first used to evaluate the model and only then to train it. Progressive validation takes into account the temporality of the problem. For reasons of simplicity, I choose to use the MAE metric to evaluate the quality of my model.\n\nAfter a few tries on my side, **I choose to train a ``KNeighborsRegressor`` and a ``LinearRegression`` per product and per store** to predict the number of sales. That represents **30490 * 2 models**. I will choose the best of the two models for each of the products thanks to the validation score. 
\n\n#### Engineering\n\nInstall creme:\n\n\n```bash\npip install creme\n```\n\nI import the packages that I need to train my models:\n\n```python\nimport copy\nimport collections \nimport datetime\nimport random\nimport tqdm\n```\n\n```python\nfrom creme import compose\nfrom creme import feature_extraction\nfrom creme import linear_model\nfrom creme import metrics\nfrom creme import neighbors\nfrom creme import optim\nfrom creme import preprocessing\nfrom creme import stats\nfrom creme import stream\n```\n\nI use this first function to parse the date and extract the day of the week (``wday``, where 0 is Monday).\n\n```python\ndef extract_date(x):\n    \"\"\"Extract features from the date.\"\"\"\n    import datetime\n    if not isinstance(x['date'], datetime.datetime):\n        x['date'] = datetime.datetime.strptime(x['date'], '%Y-%m-%d')\n    x['wday'] = x['date'].weekday()\n    return x\n```\n\n``get_metadata`` allows you to extract the identifier of the product and the store where the product is sold.\n\n```python\ndef get_metadata(x):\n    \"\"\"Split the id into the item identifier and the store identifier.\"\"\"\n    key = x['id'].split('_')\n    x['item_id'] = f'{key[0]}_{key[1]}_{key[2]}'\n    x['store_id'] = f'{key[3]}_{key[4]}'\n    return x\n```\n\nBelow I define the feature extraction pipeline. I use ``feature_extraction.TargetAgg`` to calculate the features on the target variable. I calculate many rolling averages with various window sizes. The rolling averages are grouped either by product (``item_id``) or by day of the week (``wday``).  
\n\n\n```python\nextract_features = compose.TransformerUnion(\n    \n    compose.Select('wday'),\n    \n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(1)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(2)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(3)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(4)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(5)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(6)),\n    feature_extraction.TargetAgg(by=['item_id'], how=stats.RollingMean(7)),\n\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(1)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(2)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(3)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(4)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(5)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(6)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(7)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(8)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(9)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(10)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(11)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(12)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(13)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(14)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(15)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(16)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(17)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(18)),\n    
feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(19)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(20)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(25)),\n    feature_extraction.TargetAgg(by=['wday'], how=stats.RollingMean(30)),\n)\n```\n\nI will train two models per product and per store, which represents **30490 * 2 models**. The first model is a ``KNeighborsRegressor``. The second is a linear model. I noticed that these two models are complementary. I will select the best of the two models for each product as the model I will deploy in production.\n\nThe code below allows me to declare my two pipelines, the first one dedicated to KNN and the second one to the linear model.\n\n```python\n# Init pipeline dedicated to KNN\nknn = (\n    compose.FuncTransformer(get_metadata) |\n    compose.FuncTransformer(extract_date) |\n    extract_features |\n    preprocessing.StandardScaler() |\n    neighbors.KNeighborsRegressor(window_size=30, n_neighbors=15, p=2)\n)\n\n\n# Init pipeline dedicated to linear model\nlm = (\n    compose.FuncTransformer(get_metadata) |\n    compose.FuncTransformer(extract_date) |\n    extract_features |\n    preprocessing.MinMaxScaler() |\n    linear_model.LinearRegression(optimizer=optim.SGD(0.005), intercept_lr=0.001)\n)\n```\n\nThe piece of code below creates a copy of both pipelines for all products in a dictionary.\n\n```python\nlist_model = []\n\nX_y = stream.iter_csv('./data/sample_submission.csv', target_name='F8')\n\nfor x, y in tqdm.tqdm(X_y, position=0):\n    \n    item_id = '_'.join(x['id'].split('_')[:3])\n\n    if item_id not in list_model:\n\n        list_model.append(item_id)\n        \ndict_knn = {item_id: copy.deepcopy(knn) for item_id in tqdm.tqdm(list_model, position=0)}\n\ndict_lm  = {item_id: copy.deepcopy(lm) for item_id in tqdm.tqdm(list_model, position=0)}\n```\n\nI do a warm-up of all the models from a subset of the training set. 
To do this pre-training, I selected the last two months of the training set and saved it in csv format. I use Creme's ``stream.iter_csv`` function to iterate over the training dataset. The pipeline below consumes very little RAM because the observations are loaded into memory one at a time.\n\n```python\nrandom.seed(42)\n\nparams = dict(\n    target_name = 'y',\n    converters  = {'y': int, 'id': str}, \n    parse_dates = {'date': '%Y-%m-%d'}\n)\n\n# Init streaming csv reader\nX_y = stream.iter_csv('./data/train.csv', **params)\n\nbar = tqdm.tqdm(X_y, position = 0)\n\n# Init online metrics:\nmetric_knn = collections.defaultdict(lambda: metrics.MAE())\n\nmetric_lm  = collections.defaultdict(lambda: metrics.MAE())\n\nmae = metrics.MAE()\n\nfor i, (x, y) in enumerate(bar):\n    \n    # Extract item id\n    item_id  = '_'.join(x['id'].split('_')[:3])\n    \n    # KNN\n    \n    # Evaluate performance of KNN\n    y_pred_knn = dict_knn[f'{item_id}'].predict_one(x)\n    \n    # Update metric of KNN\n    metric_knn[f'{item_id}'].update(y, y_pred_knn)\n    \n    # Fit KNN\n    dict_knn[f'{item_id}'].fit_one(x=x, y=y)\n    \n    # Linear Model\n    \n    # Evaluate performance of linear model\n    y_pred_lm  = dict_lm[f'{item_id}'].predict_one(x)\n    \n    # Update metric of linear model\n    metric_lm[f'{item_id}'].update(y, y_pred_lm)\n    \n    # Store MAE of the linear model during training\n    mae.update(y, y_pred_lm)\n    \n    # Fit linear model\n    dict_lm[f'{item_id}'].fit_one(x=x, y=y)\n        \n    if i % 300 == 0:\n        \n        bar.set_description(f'MAE, Linear Model: {mae.get():4f}')\n```\n\nI select the best model among the knn and the linear model for the 30490 products and save my models:\n\n```python\nmodels = {}\n\nfor item_id in tqdm.tqdm(metric_knn.keys()):\n    \n    # Progressive validation MAE of each model (lower is better)\n    score_knn = metric_knn[item_id].get()\n    \n    score_lm  = metric_lm[item_id].get()\n    \n    if score_knn \u003c score_lm:\n        models[item_id] = dict_knn[item_id]\n        \n    else:\n        models[item_id] = dict_lm[item_id]\n```\n\nSave selected models:\n\n```python\nimport dill\n\nwith open('models.dill', 'wb') as file:\n    dill.dump(models, file)\n```\n\n#### Deployment of the model:\n\n**Now that all the models are pre-trained, I will be able to deploy the pipelines behind an API in a production environment. I will use the [Chantilly](https://github.com/creme-ml/chantilly) library to do so.**\n\n**[Chantilly](https://github.com/creme-ml/chantilly) is a project that aims to ease the training of Creme models once they are deployed. Chantilly is a minimalist API based on the Flask framework.** Chantilly makes it possible to make predictions, train models and measure model performance in real time. It also gives access to a dashboard.\n\n\nI chose to deploy my API with [Digital Ocean](https://www.digitalocean.com). To deploy my API, I followed these steps:\n\n\n- I selected the server on Digital Ocean with the smallest configuration\n\n\n- I followed the tutorial to initialize my server and firewall [here](https://www.digitalocean.com/community/tutorials/initial-server-setup-with-ubuntu-16-04)\n\n\n- I followed the tutorial to install Anaconda on my server [here](https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-16-04)\n\n- I created the environment dedicated to python ``conda create --name kaggle python=3.7.1``\n\n\n- I updated the package list using the following command ``sudo apt update``\n\n\n- I installed git ``sudo apt install git``\n\n\n- I cloned my git repository, which contains the folders dedicated to the dashboard of chantilly (static and templates folders) ``git clone https://github.com/raphaelsty/M5-Forecasting-Accuracy.git``\n\n\n- I installed Chantilly ``pip install chantilly``\n\n\n- I installed Waitress to start the API on my server: ``pip install waitress``\n\n\n- I allowed incoming traffic on port 8080 to be able to request my API ``sudo ufw allow 8080``\n\n\n- I went to the repository M5-Forecasting-Accuracy I cloned and ran the following command to start my API with 30 threads:\n``waitress-serve --threads 30 --call 'chantilly:create_app'``.\n\n\nThat's it.\n\nI initialize my API with flavor regression (see Chantilly tutorial):\n\n```python\nimport requests\nurl = 'http://159.89.43.61:8080'\n```\n\n```python\nrequests.post(f'{url}/api/init', json={'flavor': 'regression'})\n```\n\nAfter initializing the flavor of my API, I upload all the models I've pre-trained. Each model has a name. This name is the name of the product. I use dill to serialize each model before uploading it to my API.\n\n```python\nfor model_name, model in tqdm.tqdm(models.items(), position=0):\n    r = requests.post(f'{url}/api/model/{model_name}', data=dill.dumps(model))\n```\n\nAll the models are now deployed in production and available to make predictions. The models can also be updated on a daily basis. That's it.\n\n![](static/online_learning.png)\n\n**As you may have noticed, the philosophy of online learning reduces the complexity of deploying a machine learning algorithm in production. Moreover, to update the model, we only have to make calls to the API. We don't need to re-train the model from scratch.**\n\nTo maintain my models on a daily basis, I recommend setting up a script that queries the database storing each day's sales. This script would perform 30490 queries every day to update all the models.\n\n#### Make a prediction by calling the API:\n\n```python\nr = requests.post(f'{url}/api/predict', json={\n    'id': 1,\n    'model': 'HOBBIES_1_001_CA_1',\n    'features': {'date': '2016-05-23', 'id': 'HOBBIES_1_001_CA_1'}\n})\n```\n\n#### Update models with new data:\n\n```python\nr = requests.post(f'{url}/api/learn', json={\n    'id': 1,\n    'model': 'HOBBIES_1_001_CA_1',\n    'ground_truth': 1,\n})\n```\n\n#### Chantilly dashboard\n\nYou can consult my dashboard [here](http://159.89.43.61:8080) which is updated in real time. Chantilly allows me to visualize the performance of my models live as new data is sent.\n\n![](static/dashboard.png)\n\nFeel free to visit the [Chantilly](https://github.com/creme-ml/chantilly) GitHub repository for more details on the API features.\n\n\n#### Kaggle\n\nThe M5-Forecasting-Accuracy Kaggle competition uses the weighted root mean squared scaled error (WRMSSE) to measure model performance. My models gave me a score of ``0.88113``. The maintainability and interpretability of my solution take precedence over its competitiveness.\n\n--\n\nThank you for reading. \n\nRaphaël Sourty.\n\nraphael.sourty@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphaelsty%2Fm5-forecasting-accuracy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraphaelsty%2Fm5-forecasting-accuracy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraphaelsty%2Fm5-forecasting-accuracy/lists"}