{"id":18941535,"url":"https://github.com/mchmarny/automodel","last_synced_at":"2026-05-01T02:34:48.488Z","repository":{"id":77051239,"uuid":"186165096","full_name":"mchmarny/automodel","owner":"mchmarny","description":"BigQuery automatic model rebuild based on r2 score deviation","archived":false,"fork":false,"pushed_at":"2020-03-18T22:03:35.000Z","size":6976,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-25T10:41:16.955Z","etag":null,"topics":["bigquery","gcp","iot","ml","model"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mchmarny.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-11T17:55:45.000Z","updated_at":"2020-03-18T22:03:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"6f976b59-0f43-470e-8933-c3dee972c1d6","html_url":"https://github.com/mchmarny/automodel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mchmarny/automodel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fautomodel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fautomodel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fautomodel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fautomodel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/mchmarny","download_url":"https://codeload.github.com/mchmarny/automodel/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fautomodel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32483406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","gcp","iot","ml","model"],"created_at":"2024-11-08T12:28:26.905Z","updated_at":"2026-05-01T02:34:48.465Z","avatar_url":"https://github.com/mchmarny.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# automodel\n\nBigQuery automatic model rebuild based on r2 score deviation\n\nTo demo automodel usage in IOT use-cases we will need a stream of events on our PubSub topic. For this we will use `iot-event-maker`. 
The complete walkthrough of the IOT Core registry and device setup is outlined here: https://github.com/mchmarny/iot-event-maker\n\n## Setup\n\nFor the purposes of the `automodel` demo, however, we are simply going to run the one-time setup command and, assuming everything works, then run the event generation command, which sends the mocked-up events to our topic.\n\n```shell\nbin/setup\n```\n\nThis should result in output similar to this:\n\n```shell\nSetting up PubSub...\nCreated topic [projects/PROJECT_ID/topics/automodel-event].\nCreated topic [projects/PROJECT_ID/topics/automodel-event-state].\nSetting up BigQuery...\nDataset 'PROJECT_ID:automodel' successfully created.\nCreated s9-demo.automodel.event\nSetting up Dataflow...\nname: automodel-bq-pump\nprojectId: PROJECT_ID\ntype: JOB_TYPE_STREAMING\nSetting up IOT Core...\nGenerating a 2048 bit RSA private key\nwriting new private key to 'device1-private.pem'\nCreated registry [automodel-reg].\nCreated device [automodel-device-1].\n```\n\n## Run\n\nTo stream mocked events to the new IOT Core device, run the `eventmaker` utility. Besides references to the IOT Core resources we created in setup, there are a few demo-specific parameters worth explaining:\n\n| arg      | description                                                                                                                                                                                                                      |\n| -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| `src`    | Unique name of the device from which you are sending events. This is used to identify the specific sender of events in case you are running multiple clients. 
For this demo we will use something simple like `automodel-client` |\n| `metric` | Name of the metric to generate that will be used as `label` in the sent event (e.g. `utilization`)                                                                                                                               |\n| `range`  | Range of the random data points that will be generated for the above-defined metric (e.g. `0.01-2.00` which means floats between `0.01` and `2.00`)                                                                              |\n| `freq`   | Frequency at which these events will be sent to IoT Core (e.g. `2s` which means every 2 sec.)                                                                                                                                    |\n\nTo execute `eventmaker`, run the following command:\n\n```shell\nbin/eventmaker --project=${GCP_PROJECT} --region=us-central1 --registry=automodel-reg \\\n\t\t--device=automodel-device-1 --ca=root-ca.pem --key=device1-private.pem \\\n\t\t--src=automodel-client --freq=2s --metric=utilization --range=0.01-2.00\n```\n\nAfter a few lines of configuration output, you should see `eventmaker` posting to IOT Core:\n\n```shell\n2019/05/12 16:38:39 Publishing: {\"source_id\":\"comp-client\",\"event_id\":\"eid-4d29eab6-a11d-4313-a514-19750f339c3c\",\"event_ts\":\"2019-05-12T23:38:39.352486Z\",\"label\":\"comp-stats\",\"mem_used\":45.30436197916667,\"cpu_used\":20.5,\"load_1\":3.45,\"load_5\":2.84,\"load_15\":2.6,\"random_metric\":1.086788711467383}\n```\n\nThe JSON payload in each one of these events looks something like this:\n\n```json\n{\n  \"source_id\":\"comp-client\",\n  \"event_id\":\"eid-4d29eab6-a11d-4313-a514-19750f339c3c\",\n  \"event_ts\":\"2019-05-12T23:38:39.352486Z\",\n  \"label\":\"comp-stats\",\n  \"mem_used\":45.30436197916667,\n  \"cpu_used\":20.5,\n  \"load_1\":3.45,\n  \"load_5\":2.84,\n  \"load_15\":2.6,\n  \"random_metric\":1.086788711467383\n}\n```\n\nAnd the BigQuery schema 
looks like this:\n\n| Field name    | Type      | Mode     |\n|---------------|-----------|----------|\n| source_id     | STRING    | REQUIRED |\n| event_id      | STRING    | REQUIRED |\n| event_ts      | TIMESTAMP | REQUIRED |\n| label         | STRING    | REQUIRED |\n| mem_used      | FLOAT     | REQUIRED |\n| cpu_used      | FLOAT     | REQUIRED |\n| load_1        | FLOAT     | REQUIRED |\n| load_5        | FLOAT     | REQUIRED |\n| load_15       | FLOAT     | REQUIRED |\n| random_metric | FLOAT     | REQUIRED |\n\n## Model\n\n### Create Model\n\nIn the BigQuery query window, run:\n\n```sql\n#standardSQL\nCREATE OR REPLACE MODEL automodel.utilization_model\nOPTIONS\n  (model_type='linear_reg', input_label_cols=['load_15']) AS\nSELECT\n  label,\n  cpu_used,\n  mem_used,\n  load_1,\n  load_5,\n  load_15,\n  random_metric,\n  case when load_1 \u003e load_5 then 1 else 0 end as load_increasing\nFROM automodel.event\nWHERE RAND() \u003c 0.05\n```\n\nwhich results in:\n\n```shell\nThis statement created a new model named automodel.utilization_model\n```\n\n\n## Evaluate model\n\nBigQuery model evaluation provides insight into the quality of the model. 
First we are going to create a table to store the model evaluation data:\n\n```sql\nCREATE OR REPLACE TABLE automodel.utilization_model_eval (\n  eval_ts TIMESTAMP NOT NULL,\n  mean_absolute_error FLOAT64 NOT NULL,\n  mean_squared_error FLOAT64 NOT NULL,\n  mean_squared_log_error FLOAT64 NOT NULL,\n  median_absolute_error FLOAT64 NOT NULL,\n  r2_score FLOAT64 NOT NULL,\n  explained_variance FLOAT64 NOT NULL\n)\n```\n\nThen we can create a BigQuery scheduled query:\n\n\n```sql\n#standardSQL\nINSERT automodel.utilization_model_eval (\n   eval_ts,\n   mean_absolute_error,\n   mean_squared_error,\n   mean_squared_log_error,\n   median_absolute_error,\n   r2_score,\n   explained_variance\n) WITH T AS (\n  SELECT * FROM ML.EVALUATE(MODEL automodel.utilization_model,(\n        SELECT\n          label,\n          cpu_used,\n          mem_used,\n          load_1,\n          load_5,\n          load_15,\n          random_metric,\n          case when load_1 \u003e load_5 then 1 else 0 end as load_increasing\n        FROM automodel.event\n  ))\n)\nSELECT\n  CURRENT_TIMESTAMP(),\n  mean_absolute_error,\n  mean_squared_error,\n  mean_squared_log_error,\n  median_absolute_error,\n  r2_score,\n  explained_variance\nFROM T\n```\n\nwhich results in:\n\n```shell\nmean_absolute_error\t mean_squared_error\t mean_squared_log_error\t median_absolute_error\tr2_score            explained_variance\n3.2502161606238453   227.0738450661901   0.008387276788977339    0.12880176496196327    0.9990422574648288  0.999079865551752\n```\n\n\u003e The R2 score is a statistical measure that determines if the linear regression predictions approximate the actual data. 0 indicates that the model explains none of the variability of the response data around the mean. 
1 indicates that the model explains all the variability of the response data around the mean.\n\n## Use your model to predict utilization\n\n\n```shell\nbin/eventmaker --project=${GCP_PROJECT} --region=us-central1 --registry=automodel-reg \\\n\t\t--device=automodel-device-1 --ca=root-ca.pem --key=device1-private.pem \\\n\t\t--src=automodel-client --freq=2s --metric=utilization --range=0.01-100.00\n```\n\n```sql\n#standardSQL\nSELECT\n  label,\n  MIN(predicted_load_15) as min_predicted_load,\n  MAX(predicted_load_15) as max_predicted_load\nFROM\n  ML.PREDICT(MODEL automodel.utilization_model, (\n    SELECT\n      label,\n      cpu_used,\n      mem_used,\n      load_1,\n      load_5,\n      load_15,\n      random_metric,\n      case when load_1 \u003e load_5 then 1 else 0 end as load_increasing\n    FROM automodel.event\n  ))\nGROUP BY label\n```\n\n\n## Cleanup\n\nTo delete all the resources created on GCP during the `event` portion of this demo (topic, registry, and devices) run:\n\n```shell\nmake cleanup\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchmarny%2Fautomodel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmchmarny%2Fautomodel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchmarny%2Fautomodel/lists"}