{"id":17260047,"url":"https://github.com/paulescu/incremental-ml-training-and-serving","last_synced_at":"2025-03-27T00:13:03.289Z","repository":{"id":253474160,"uuid":"843604627","full_name":"Paulescu/incremental-ml-training-and-serving","owner":"Paulescu","description":"Incremental ML learning in the real-world","archived":false,"fork":false,"pushed_at":"2025-03-03T20:23:51.000Z","size":252,"stargazers_count":64,"open_issues_count":1,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-17T12:09:06.676Z","etag":null,"topics":["ml","online-ml","python","redpanda","stream"],"latest_commit_sha":null,"homepage":"https://www.realworldml.net/courses","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Paulescu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-16T22:31:37.000Z","updated_at":"2025-03-16T14:02:33.000Z","dependencies_parsed_at":"2025-01-09T16:53:51.528Z","dependency_job_id":"37cdff02-15fd-44cf-888c-8da0d439361d","html_url":"https://github.com/Paulescu/incremental-ml-training-and-serving","commit_stats":{"total_commits":2,"total_committers":1,"mean_commits":2.0,"dds":0.0,"last_synced_commit":"e30320708695d60b0e3e8c04be569b6fc30dc260"},"previous_names":["paulescu/incremental-ml-training-and-serving"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulescu%2Fincremental-ml-training-and-serving","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulescu%2Fincremental-ml-
training-and-serving/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulescu%2Fincremental-ml-training-and-serving/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Paulescu%2Fincremental-ml-training-and-serving/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Paulescu","download_url":"https://codeload.github.com/Paulescu/incremental-ml-training-and-serving/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245755678,"owners_count":20667027,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ml","online-ml","python","redpanda","stream"],"created_at":"2024-10-15T07:47:03.400Z","updated_at":"2025-03-27T00:13:03.267Z","avatar_url":"https://github.com/Paulescu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003eLet's design a real-time ML system with incremental re-training ⚡\u003c/h1\u003e\n\u003c/div\u003e\n\n#### Table of contents\n* [The problem](#the-problem)\n* [Solution](#solution)\n* [Run the whole thing in 5 minutes](#run-the-whole-thing-in-5-minutes)\n* [Wanna learn more real-world ML?](#wanna-learn-more-real-world-ml)\n\n\n## The problem\n\nML models are pattern finding machines, that try to capture the relationship between\n\n- a set of inputs available at prediction time (aka features), and\n- a metric you want to predict (aka target)\n\nFor most real-world problems these patterns between the features and the target are not 
static, but change over time. So, if you don’t re-train your ML models, their accuracy degrades over time. This is commonly known as concept drift.\n\nNow, the speed at which patterns change, and your model degrades, depends on the particular phenomenon you are modelling.\n\n\u003e **For example 💁**  \n\u003e If you are trying to predict rainfall, re-training your ML model daily is good enough. Rainfall patterns obey the laws of physics, and these do not change too much from one day to the next. \n\nOn the other hand, if you are trying to predict short-term crypto prices, where patterns between\n\n- available market data (aka features), and\n- future asset prices (aka target)\n\nare short-lived, you must re-train your ML model very frequently. Ideally, in real time.\n\nA similar situation happens when you want to build a real-time recommender system, like [TikTok’s famous monolith](https://arxiv.org/pdf/2209.07663), where user preferences change in the blink of an eye, and your ML model needs to be refreshed as often as possible.\n\nSo now the question is\n\n\u003e How do you design an ML system that continuously re-trains the ML model that is serving the predictions ❓\n\nIn this repo you can find a source code implementation.\n\n\n## Run the whole thing in 5 minutes\n\n1. Install all project dependencies inside an isolated virtual env, using Python Poetry\n    ```\n    $ make install\n    ```\n\n2. Start the feature pipelines with\n    ```\n    $ make producers\n    ```\n\n3. Start the training pipeline with\n    ```\n    $ make training\n    ```\n\n4. 
Start the inference pipeline with\n    ```\n    $ make predict\n    ```\n\n## Wanna learn more real-world ML?\n\nJoin more than 18k builders subscribed to the **Real-World ML Newsletter**.\n\nEvery Saturday morning.\n\nFor **FREE**\n\n### [👉🏽 Subscribe for FREE](https://paulabartabajo.substack.com/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulescu%2Fincremental-ml-training-and-serving","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaulescu%2Fincremental-ml-training-and-serving","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulescu%2Fincremental-ml-training-and-serving/lists"}