https://github.com/mchmarny/preprocessd
Simple example showing how to use Cloud Run to pre-process raw events from PubSub and publish them to new topic.
cloudrun events gcp go processing pubsub
- Host: GitHub
- URL: https://github.com/mchmarny/preprocessd
- Owner: mchmarny
- License: apache-2.0
- Created: 2019-06-06T02:08:45.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-02-25T03:40:59.000Z (almost 2 years ago)
- Last Synced: 2024-08-03T01:11:59.287Z (6 months ago)
- Topics: cloudrun, events, gcp, go, processing, pubsub
- Language: Go
- Homepage:
- Size: 5.62 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-cloud-run - Use Cloud Run to pre-process raw events from PubSub and publish them to new topic
README
# preprocessd
Simple example showing how to use Cloud Run to pre-process events before persisting them to the backing store (e.g. BigQuery). This is a common use case where the raw data (e.g. submitted through a REST API) needs to be pre-processed (e.g. decorated with additional attributes, classified, or simply validated) before saving.
Cloud Run is a great platform for building these kinds of ingestion or pre-processing services:
* Write each one of the pre-processing steps in the most appropriate (or favorite) development language
* Bring your own runtime (or even specific version of that runtime) along with custom libraries
* Dynamically scale up and down with your PubSub event load
* Scale to 0, and don't pay anything, when there is nothing to process
* Use granular access control with service account and policy bindings

## Event Source
In this example we will use synthetic events published to a PubSub topic by the [pubsub-event-maker](https://github.com/mchmarny/pubsub-event-maker) utility. We will use it to mock `utilization` data from `3` devices and publish it to Cloud PubSub on the `eventmaker` topic in your project. The PubSub payload looks something like this:
```json
{
  "source_id": "device-1",
  "event_id": "eid-b6569857-232c-4e6f-bd51-cda4e81f3e1f",
  "event_ts": "2019-06-05T11:39:50.403778Z",
  "label": "utilization",
  "mem_used": 34.47265625,
  "cpu_used": 6.5,
  "load_1": 1.55,
  "load_5": 2.25,
  "load_15": 2.49,
  "random_metric": 94.05090880450125
}
```

The instructions on how to configure `pubsub-event-maker` to start sending these events are [here](https://github.com/mchmarny/pubsub-event-maker).
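
To make the "simply validated" kind of pre-processing concrete, here is a minimal Go sketch that decodes the payload above and rejects events missing their key fields. The `Event` type and `ParseEvent` function are hypothetical names chosen for illustration; they are assumptions, not the types used in this repository.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// Event mirrors the fields of the sample payload.
// The type name and field names are illustrative assumptions.
type Event struct {
	SourceID     string    `json:"source_id"`
	EventID      string    `json:"event_id"`
	EventTime    time.Time `json:"event_ts"` // RFC 3339 timestamp
	Label        string    `json:"label"`
	MemUsed      float64   `json:"mem_used"`
	CPUUsed      float64   `json:"cpu_used"`
	Load1        float64   `json:"load_1"`
	Load5        float64   `json:"load_5"`
	Load15       float64   `json:"load_15"`
	RandomMetric float64   `json:"random_metric"`
}

// ParseEvent decodes a raw payload and applies basic validation.
func ParseEvent(data []byte) (*Event, error) {
	var e Event
	if err := json.Unmarshal(data, &e); err != nil {
		return nil, fmt.Errorf("invalid event JSON: %w", err)
	}
	if e.SourceID == "" || e.EventID == "" {
		return nil, errors.New("event is missing source_id or event_id")
	}
	if e.EventTime.IsZero() {
		return nil, errors.New("event is missing event_ts")
	}
	return &e, nil
}

func main() {
	sample := []byte(`{
		"source_id": "device-1",
		"event_id": "eid-b6569857-232c-4e6f-bd51-cda4e81f3e1f",
		"event_ts": "2019-06-05T11:39:50.403778Z",
		"label": "utilization",
		"mem_used": 34.47265625,
		"cpu_used": 6.5,
		"load_1": 1.55,
		"load_5": 2.25,
		"load_15": 2.49,
		"random_metric": 94.05090880450125
	}`)

	e, err := ParseEvent(sample)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("%s %s cpu=%.1f at %s\n",
		e.SourceID, e.Label, e.CPUUsed, e.EventTime.Format(time.RFC3339))
}
```

Decoding into a typed struct lets `encoding/json` check the timestamp format for free, so a malformed `event_ts` is rejected at parse time rather than downstream.
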
## Prerequisites
### GCP Project and gcloud SDK
If you don't have one already, start by creating a new project and configuring the [Google Cloud SDK](https://cloud.google.com/sdk/docs/). Similarly, if you have not done so already, you will need to [set up Cloud Run](https://cloud.google.com/run/docs/setup).
## Setup
### Build Container Image
Cloud Run runs container images. To build one we are going to use the included [Dockerfile](./Dockerfile) and submit the build job to Cloud Build using the [bin/image](./bin/image) script.
> Note: you should review each of the provided scripts to see the complete content of these commands
```shell
bin/image
```

> If this is the first time you are using the build service, you may be prompted to enable the build API
### Service Account and IAM Policies
In this example we are going to follow the [principle of least privilege](https://searchsecurity.techtarget.com/definition/principle-of-least-privilege-POLP) (POLP) to ensure our Cloud Run service has only the necessary rights and nothing more:
* `run.invoker` - required to execute Cloud Run service
* `pubsub.editor` - required to create and publish to Cloud PubSub
* `logging.logWriter` - required for Stackdriver logging
* `cloudtrace.agent` - required for Stackdriver tracing
* `monitoring.metricWriter` - required to write custom metrics to Stackdriver

To do that we will create a GCP service account and assign the necessary IAM policies and roles using the [bin/account](./bin/account) script:
```shell
bin/account
```

### Cloud Run Service
Once you have configured the GCP service account, you can deploy a new Cloud Run service, set it to run under that account, and prevent unauthenticated access using the [bin/service](./bin/service) script:
```shell
bin/service
```

## PubSub Subscription
To enable PubSub to send topic data to the Cloud Run service, we will need to create a PubSub topic subscription and configure it to "push" events to the Cloud Run service we deployed above:
```shell
bin/pubsub
```

## Log
You can see the raw data and all the application log entries made by the service in Cloud Run service logs.
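
For context on where those log entries come from: a push subscription delivers each message as an HTTP POST whose JSON body wraps the base64-encoded payload in a `message` object. The sketch below shows one way a handler could unwrap that envelope and log the raw data. The `pushEnvelope`, `decodePush`, and `pushHandler` names are hypothetical, and this is not the handler shipped in this repository.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// pushEnvelope models the body of a Pub/Sub push request.
// encoding/json decodes the base64 "data" string into the []byte field.
type pushEnvelope struct {
	Message struct {
		Data       []byte            `json:"data"`
		Attributes map[string]string `json:"attributes"`
		MessageID  string            `json:"messageId"`
	} `json:"message"`
	Subscription string `json:"subscription"`
}

// decodePush extracts the raw payload and message ID from a push body.
func decodePush(body []byte) ([]byte, string, error) {
	var env pushEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, "", fmt.Errorf("invalid push envelope: %w", err)
	}
	if len(env.Message.Data) == 0 {
		return nil, "", errors.New("push message has no data")
	}
	return env.Message.Data, env.Message.MessageID, nil
}

// pushHandler logs the raw event; log output from a Cloud Run container
// is what appears in the service logs.
func pushHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	data, id, err := decodePush(body)
	if err != nil {
		// A non-success status asks Pub/Sub to redeliver the message.
		log.Printf("rejecting push request: %v", err)
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	log.Printf("message %s raw data: %s", id, data)
	w.WriteHeader(http.StatusNoContent) // success status acknowledges the message
}

func main() {
	// Exercise the handler locally with a sample envelope.
	// json.Marshal base64-encodes the []byte Data field.
	var env pushEnvelope
	env.Message.Data = []byte(`{"source_id":"device-1"}`)
	env.Message.MessageID = "1"
	body, err := json.Marshal(env)
	if err != nil {
		log.Fatal(err)
	}
	req := httptest.NewRequest(http.MethodPost, "/", bytes.NewReader(body))
	rec := httptest.NewRecorder()
	pushHandler(rec, req)
	fmt.Println("status:", rec.Code)
}
```

In a deployed service the handler would instead be registered with `http.HandleFunc` and served on the port given in the `PORT` environment variable, which Cloud Run sets for the container.
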
## Saving Results
The process of saving the resulting data from this service will depend on your target (the place where you want to save the data). GCP has a number of existing connectors and templates so, in most cases, you do not even have to write any code. Here is an example of a Dataflow template that streams PubSub topic data to BigQuery:
```shell
gcloud dataflow jobs run JOB_NAME \
--gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
--parameters \
inputTopic=projects/YOUR_PROJECT_ID/topics/YOUR_TOPIC_NAME,\
outputTableSpec=YOUR_PROJECT_ID:YOUR_DATASET.YOUR_TABLE_NAME
```

This approach automatically handles back-pressure, retries, and monitoring, and is not subject to the batch insert quota limits.
## Cleanup
To clean up all resources created by this sample, execute the [bin/cleanup](bin/cleanup) script.
```shell
bin/cleanup
```

## Disclaimer
This is my personal project and it does not represent my employer. I take no responsibility for issues caused by this code. I do my best to ensure that everything works, but if something goes wrong, my apologies are all you will get.