# Flink Mleap

An example of serving an ML model in Flink using MLeap.

## Getting started

Before you can serve an ML model, you need to create one and export it with MLeap.
For an example of how to do this, see https://getindata.com/blog/online-ml-model-serving-using-mleap/

In `./example/src/main/resources` you can find two random forest models:

- `mleap-example-1`
- `mleap-example-2`

Both models take **one double as input** and return **one double as output**.

## Bundle loader
The models are exported to files, so we need a way to load them into Flink.
For this we use the `com.getindata.mleap.BundleLoader` abstraction and its `FileBundleLoader` implementation in the `lib` module,
which loads a model stored on the local filesystem. A loader for any other (for example, cloud) storage can be written against the same abstraction;
we also provide a `GCSBundleLoader` implementation that loads a bundle from Google Cloud Storage.

## Streaming API
First, we use MLeap with the Streaming (DataStream) API. The code is in the `com.getindata.datastream` package,
in the `FlinkDatastreamWithMleap` app of the `example` module.
To demonstrate it, the app creates a stream of three random numbers and processes it with `MleapProcessFunction` (from the `lib` module),
which makes a prediction for each element. The MLeap prediction itself is made in the `MleapMapFunction.map` method.

## SQL
Using MLeap with the SQL API is shown in the `com.getindata.sql` package of the `example` module.
The `FlinkSqlWithMleap` app contains an end-to-end example. To test it, a `datagen` table with random features is created.
Predictions are made by a UDF, `MLeapUDF` (in the `lib` module), which calls MLeap in its `eval` method. The UDF is deliberately generic:
its input arguments and output are cast to the proper types based on the MLeap model schema, so this single UDF can serve different
models in the same job. To make registering UDFs easier, we implemented `MLeapUDFRegistry`, which registers UDFs based on
a configuration file.
For testing purposes we register two UDFs:

- `Predict`
- `Predictv2`

To show that it works, a simple query is executed:
`SELECT Predict(feature1) as prediction, Predictv2(feature1) as prediction2 FROM Features`
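The Bundle loader section describes one loader abstraction with local-file and Google Cloud Storage implementations. The stdlib-only Scala script below is a minimal sketch of that pattern, not the `lib` module's code: `ModelBytesLoader`, `LocalFileLoader`, `InMemoryLoader`, `schemeOf` and the scheme-based `loadModel` dispatch are hypothetical names, and the sketch returns raw bundle bytes instead of a deserialized MLeap model.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical stand-in for the repository's `BundleLoader`: job code depends only
// on this trait, so the storage backend can change without touching prediction code.
// It returns raw bundle bytes; turning them into an MLeap model is out of scope here.
trait ModelBytesLoader {
  def load(location: String): Array[Byte]
}

// Local-filesystem loader, similar in spirit to `FileBundleLoader`.
// Accepts plain paths as well as `file://` URIs.
final class LocalFileLoader extends ModelBytesLoader {
  override def load(location: String): Array[Byte] =
    Files.readAllBytes(Paths.get(location.stripPrefix("file://")))
}

// In-memory loader, handy in tests. A cloud-storage loader (like `GCSBundleLoader`)
// would implement the same trait using that storage's client library.
final class InMemoryLoader(blobs: Map[String, Array[Byte]]) extends ModelBytesLoader {
  override def load(location: String): Array[Byte] =
    blobs.getOrElse(location, throw new NoSuchElementException(s"no model at $location"))
}

// Returns the URI scheme of a location, treating plain paths as "file".
def schemeOf(location: String): String = {
  val idx = location.indexOf("://")
  if (idx < 0) "file" else location.substring(0, idx)
}

// Hypothetical convention: pick a loader by scheme, then load the bundle bytes.
def loadModel(location: String, loaders: Map[String, ModelBytesLoader]): Array[Byte] = {
  val scheme = schemeOf(location)
  val loader = loaders.getOrElse(
    scheme,
    throw new IllegalArgumentException(s"no loader registered for scheme '$scheme'"))
  loader.load(location)
}
```

Dispatching on the scheme keeps the choice of storage in configuration (a model path such as `gs://…` versus a local path) rather than in job code; a location whose scheme has no registered loader fails fast with an `IllegalArgumentException`.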
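In the Streaming API example, the point worth illustrating is that the model is loaded once per task and then applied to every element. The Scala script below sketches this without depending on Flink: it assumes `MleapMapFunction` follows the `open()`/`map()` lifecycle of Flink's `RichMapFunction`, and `DoubleModel`, `PredictionMapper` and `runLocally` are hypothetical names. A stub model that doubles its input stands in for a loaded MLeap transformer.

```scala
import scala.util.Random

// One double in, one double out, matching the example random-forest bundles.
trait DoubleModel extends Serializable {
  def predict(feature: Double): Double
}

// Hypothetical stand-in for `MleapMapFunction`, mimicking a RichMapFunction's lifecycle:
// open() runs once per parallel task before any records arrive, so the expensive
// model load happens once per task rather than once per record.
final class PredictionMapper(createModel: () => DoubleModel) extends Serializable {
  // Empty until open(), so the function can be serialized without the model.
  private var model: Option[DoubleModel] = None

  def open(): Unit =
    model = Some(createModel())

  def map(feature: Double): Double =
    model
      .getOrElse(throw new IllegalStateException("open() must be called before map()"))
      .predict(feature)
}

// Local simulation of the job: open the mapper once, then map every element.
def runLocally(features: Seq[Double], mapper: PredictionMapper): Seq[Double] = {
  mapper.open()
  features.map(mapper.map)
}

// Like the example app, feed three random numbers through the mapper.
val randomFeatures: Seq[Double] = Seq.fill(3)(Random.nextDouble())
val predictions = runLocally(randomFeatures, new PredictionMapper(() => new DoubleModel {
  def predict(feature: Double): Double = feature * 2 // stub in place of an MLeap model
}))
```

Passing a factory (`() => DoubleModel`) instead of a model instance is the design choice that matters here: only the small factory has to be serialized with the function, and each task builds its own model in `open()`.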
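The SQL section combines two ideas: a generic UDF whose argument and result types come from the model schema, and a registry that creates UDFs from a configuration file. The stdlib-only Scala script below sketches both under stated assumptions; the `FieldType` schema model, the `udfName = modelLocation` config format, and the names `castTo`, `GenericModelUdf`, `parseRegistryConfig` and `buildRegistry` are hypothetical, not the repository's `MLeapUDF`/`MLeapUDFRegistry` code. In a real job the UDF would typically extend Flink's `ScalarFunction`, and the registry would register each instance with the table environment (for example via `TableEnvironment.createTemporarySystemFunction`).

```scala
// Hypothetical, simplified stand-in for an MLeap model schema:
// the declared type of each input column and of the output.
sealed trait FieldType
case object DoubleField extends FieldType
case object FloatField extends FieldType
case object IntField extends FieldType
case object StringField extends FieldType

final case class ModelSchema(inputs: Seq[FieldType], output: FieldType)

// Casts a raw SQL value to the type declared in the model schema.
def castTo(value: Any, target: FieldType): Any = (value, target) match {
  case (n: Number, DoubleField) => n.doubleValue()
  case (n: Number, FloatField)  => n.floatValue()
  case (n: Number, IntField)    => n.intValue()
  case (s: String, DoubleField) => s.toDouble
  case (v, StringField)         => v.toString
  case (v, t) => throw new IllegalArgumentException(s"cannot cast $v to $t")
}

// Hypothetical generic UDF: one class serves any model, because `eval` adapts the
// arguments and the result to whatever the schema declares.
final class GenericModelUdf(schema: ModelSchema, predict: Seq[Any] => Any) {
  def eval(args: Any*): Any = {
    require(args.length == schema.inputs.length,
      s"expected ${schema.inputs.length} argument(s), got ${args.length}")
    val typedArgs = args.zip(schema.inputs).map { case (a, t) => castTo(a, t) }
    castTo(predict(typedArgs), schema.output)
  }
}

// Hypothetical config format: one `udfName = modelLocation` entry per line; '#' starts a comment.
def parseRegistryConfig(text: String): Map[String, String] =
  text.split("\\r?\\n").iterator
    .map(_.trim)
    .filter(line => line.nonEmpty && !line.startsWith("#"))
    .map { line =>
      line.split("=", 2).map(_.trim) match {
        case Array(name, location) if name.nonEmpty && location.nonEmpty => name -> location
        case _ => throw new IllegalArgumentException(s"malformed registry entry: $line")
      }
    }
    .toMap

// Builds one UDF per config entry. `loadSchemaAndModel` is a placeholder for reading
// the bundle at `location` and extracting its schema and prediction function.
def buildRegistry(
    config: Map[String, String],
    loadSchemaAndModel: String => (ModelSchema, Seq[Any] => Any)): Map[String, GenericModelUdf] =
  config.map { case (name, location) =>
    val (schema, predict) = loadSchemaAndModel(location)
    name -> new GenericModelUdf(schema, predict)
  }
```

With a config listing `Predict` and `Predictv2`, the same `GenericModelUdf` class backs both names; only the schema and model behind each name differ, which is what lets one UDF implementation serve several models in a single query.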