{"id":13637920,"url":"https://github.com/mchmarny/preprocessd","last_synced_at":"2025-04-15T20:31:59.924Z","repository":{"id":77051897,"uuid":"190498093","full_name":"mchmarny/preprocessd","owner":"mchmarny","description":"Simple example showing how to use Cloud Run to pre-process raw events from PubSub and publish them to new topic. ","archived":false,"fork":false,"pushed_at":"2023-02-25T03:40:59.000Z","size":5892,"stargazers_count":5,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-29T02:04:40.748Z","etag":null,"topics":["cloudrun","events","gcp","go","processing","pubsub"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mchmarny.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-06-06T02:08:45.000Z","updated_at":"2021-04-26T14:19:46.000Z","dependencies_parsed_at":"2024-01-14T09:15:06.015Z","dependency_job_id":"4dd67024-1b21-4fcd-9d55-18b4e4cc0749","html_url":"https://github.com/mchmarny/preprocessd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fpreprocessd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fpreprocessd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fpreprocessd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mchmarny%2Fpreprocessd/manifests","owner_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/owners/mchmarny","download_url":"https://codeload.github.com/mchmarny/preprocessd/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249148006,"owners_count":21220459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloudrun","events","gcp","go","processing","pubsub"],"created_at":"2024-08-02T01:00:36.905Z","updated_at":"2025-04-15T20:31:55.448Z","avatar_url":"https://github.com/mchmarny.png","language":"Go","readme":"# preprocessd\n\nSimple example showing how to use Cloud Run to pre-process events before persisting them to the backing store (e.g. BigQuery). This is a common use-case where the raw data (e.g. submitted through a REST API) needs to be pre-processed (e.g. decorated with additional attributes, classified, or simply validated) before saving.\n\nCloud Run is a great platform to build these kinds of ingestion or pre-processing services:\n\n* Write each one of the pre-processing steps in the most appropriate (or favorite) development language\n* Bring your own runtime (or even a specific version of that runtime) along with custom libraries\n* Dynamically scale up and down with your PubSub event load\n* Scale to 0, and don't pay anything, when there is nothing to process\n* Use granular access control with service account and policy bindings\n\n## Event Source\n\nIn this example we will use synthetic events on a PubSub topic generated by the [pubsub-event-maker](https://github.com/mchmarny/pubsub-event-maker) utility. 
We will use it to mock synthetic `utilization` data from `3` devices and publish it to Cloud PubSub on the `eventmaker` topic in your project. The PubSub payload looks something like this:\n\n```json\n{\n    \"source_id\": \"device-1\",\n    \"event_id\": \"eid-b6569857-232c-4e6f-bd51-cda4e81f3e1f\",\n    \"event_ts\": \"2019-06-05T11:39:50.403778Z\",\n    \"label\": \"utilization\",\n    \"mem_used\": 34.47265625,\n    \"cpu_used\": 6.5,\n    \"load_1\": 1.55,\n    \"load_5\": 2.25,\n    \"load_15\": 2.49,\n    \"random_metric\": 94.05090880450125\n}\n```\n\nThe instructions on how to configure `pubsub-event-maker` to start sending these events are [here](https://github.com/mchmarny/pubsub-event-maker).\n\n## Prerequisites\n\n### GCP Project and gcloud SDK\n\nIf you don't have one already, start by creating a new project and configuring the [Google Cloud SDK](https://cloud.google.com/sdk/docs/). Similarly, if you have not done so already, you will have to [set up Cloud Run](https://cloud.google.com/run/docs/setup).\n\n\n## Setup\n\n### Build Container Image\n\nCloud Run runs container images. 
To build one we are going to use the included [Dockerfile](./Dockerfile) and submit the build job to Cloud Build using the [bin/image](./bin/image) script.\n\n\u003e Note, you should review each one of the provided scripts for the complete content of these commands\n\n```shell\nbin/image\n```\n\n\u003e If this is the first time you use the build service, you may be prompted to enable the build API\n\n### Service Account and IAM Policies\n\nIn this example we are going to follow the [principle of least privilege](https://searchsecurity.techtarget.com/definition/principle-of-least-privilege-POLP) (POLP) to ensure our Cloud Run service has only the necessary rights and nothing more:\n\n* `run.invoker` - required to execute the Cloud Run service\n* `pubsub.editor` - required to create and publish to Cloud PubSub\n* `logging.logWriter` - required for Stackdriver logging\n* `cloudtrace.agent` - required for Stackdriver tracing\n* `monitoring.metricWriter` - required to write custom metrics to Stackdriver\n\nTo do that we will create a GCP service account and assign the necessary IAM policies and roles using the [bin/account](./bin/account) script:\n\n```shell\nbin/account\n```\n\n### Cloud Run Service\n\nOnce you have configured the GCP accounts, you can use the [bin/service](./bin/service) script to deploy a new Cloud Run service, set it to run under that account, and prevent unauthenticated access:\n\n```shell\nbin/service\n```\n\n## PubSub Subscription\n\nTo enable PubSub to send topic data to the Cloud Run service we will need to create a PubSub topic subscription and configure it to \"push\" events to the Cloud Run service we deployed above.\n\n```shell\nbin/pubsub\n```\n\n## Log\n\nYou can see the raw data and all the application log entries made by the service in Cloud Run service logs.\n\n\u003cimg src=\"images/log.png\" alt=\"Cloud Run Log\"\u003e\n\n## Saving Results\n\nThe process of saving resulting data from this service will depend on your target (the place where you want to save the 
data). GCP has a number of existing connectors and templates so, in most cases, you do not even have to write any code. Here is an example of a Dataflow template that streams PubSub topic data to BigQuery:\n\n```shell\ngcloud dataflow jobs run JOB_NAME \\\n    --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \\\n    --parameters \\\ninputTopic=projects/YOUR_PROJECT_ID/topics/YOUR_TOPIC_NAME,\\\noutputTableSpec=YOUR_PROJECT_ID:YOUR_DATASET.YOUR_TABLE_NAME\n```\n\nThis approach will automatically deal with back-pressure, retries, and monitoring, and is not subject to the batch insert quota limits.\n\n\n## Cleanup\n\nTo clean up all resources created by this sample, execute the [bin/cleanup](bin/cleanup) script.\n\n```shell\nbin/cleanup\n```\n\n## Disclaimer\n\nThis is my personal project and it does not represent my employer. I take no responsibility for issues caused by this code. I do my best to ensure that everything works, but if something goes wrong, my apologies is all you will get.\n\n\n","funding_links":[],"categories":["Tutorials"],"sub_categories":["Async and events"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchmarny%2Fpreprocessd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmchmarny%2Fpreprocessd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmchmarny%2Fpreprocessd/lists"}