{"id":24087953,"url":"https://github.com/tinybirdco/data-rise-gcp","last_synced_at":"2025-06-29T08:36:49.566Z","repository":{"id":176552448,"uuid":"656633913","full_name":"tinybirdco/data-rise-gcp","owner":"tinybirdco","description":"Repo for the GCP-Tinybird workshop","archived":false,"fork":false,"pushed_at":"2023-06-26T21:55:41.000Z","size":20,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-27T05:24:49.934Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-21T10:28:19.000Z","updated_at":"2023-06-23T14:00:14.000Z","dependencies_parsed_at":"2023-06-28T14:45:31.569Z","dependency_job_id":null,"html_url":"https://github.com/tinybirdco/data-rise-gcp","commit_stats":null,"previous_names":["tinybirdco/data-rise-gcp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-rise-gcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-rise-gcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-rise-gcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-rise-gcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.git
hub.com/tinybirdco/data-rise-gcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-rise-gcp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259033827,"owners_count":22795769,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:27.906Z","updated_at":"2025-06-10T08:05:21.248Z","avatar_url":"https://github.com/tinybirdco.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# data-rise-gcp\n\nRepo for the GCP-Tinybird workshop\n\n## Part 1: Creating the first API Endpoint\n\nIn this first part of the workshop, the plan is to ingest dimensional data from BigQuery, historical data from Google Cloud Storage (GCS), and real-time data from Pub/Sub.\n\nThen, combining the 3 sources, we will create a dynamic API Endpoint.\n\n### Create your account and your first Workspace\n\nGo to [https://ui.tinybird.co/signup](https://ui.tinybird.co/signup) to log in or sign up, and create a new Workspace.\n\nChoose your region and keep the default empty Workspace option; we will not be using any Starter Kits in this workshop.\n\n### Ingest dimensional data from BigQuery\n\nUpload the [products CSV](./aux/products.csv) file to BigQuery.\n\nIn your Tinybird Workspace, create a new BigQuery Data Source from that BigQuery table by following the [BigQuery connector documentation](https://www.tinybird.co/docs/ingest/bigquery).\n\n### Ingest from GCS\n\nLet's do this step first, because the backfill may take a bit longer.\n\nFirst, let's create a 
Data Source from this small [Parquet file](./aux/ecom_events.parquet) that we provide as a sample. Just drag and drop the file into the Tinybird UI. You can adjust the data types, the Sorting Key, and so on.\n\nNow that the Data Source is created, we will ingest some bigger Parquet files from GCS.\n\nCopy your admin token (or a new one with append rights to your newly created _ecom_events_ Data Source), edit the [backfill_gcs.sh](./aux/backfill_gcs.sh) script, and run it.\n\nNote: for private files, or to ingest every time new files land in the bucket, you can follow the [Ingest from GCS guide](https://www.tinybird.co/docs/guides/ingest-from-google-gcs).\n\n### Send data from Pub/Sub\n\nFollow the steps in the [Ingest from Pub/Sub guide](https://www.tinybird.co/docs/guides/ingest-from-google-pubsub).\n\nNote: do not use the guide's sample script; use [this one](./aux/pub_sub_demo.py) instead, editing lines 8 and 9 with your project ID and topic ID.\n\n```python\n# Replace the placeholders with your GCP project ID and Pub/Sub topic ID\nproject_id = \"\u003cYOUR_PROJECT\u003e\"\ntopic_id = \"\u003cYOUR_TOPIC_ID\u003e\"\n```\n\nNote 2: do not create a Materialized View to decode the messages yet; we will decode them at query time.\n\n### Create an API Endpoint\n\nLet's create a Pipe with several nodes:\n\n  1. A first node that decodes the messages from Pub/Sub. You'll need `base64Decode()` and the `JSONExtract` functions, as shown in the [example](https://www.tinybird.co/docs/guides/ingest-from-google-pubsub.html#step-4-decode-message-data).\n  1. A second node that queries the previous (decoded) node and keeps only the _sale_ events for products in the _long sleeve_ category.\n  1. A third node that applies the same filter to the historical data, keeping only today's sales.\n  1. A fourth node that combines nodes 2 and 3 with `UNION ALL` and aggregates the result (a `count()` is fine) to get the number of sales.\n  1. 
Enrich the ranking to show the product _name_ instead of its _id_, plus the _total_revenue_ (price * units sold).\n\nThen create an API Endpoint from that final node.\n\n### Make it dynamic\n\nMake the Endpoint accept query parameters using the templating language. See the [query parameters documentation](https://www.tinybird.co/docs/query-parameters) for the syntax.\n\nFor example, make the `category` and `event` type filters dynamic, and document them so that our frontend colleagues know which values they can pass.\n\n## Part 2: Some optimizations with Materialized Views\n\n### Create a Materialized View to decode Pub/Sub messages\n\nWith [Materialized Views](https://www.tinybird.co/docs/concepts/materialized-views) we can persist the result of a Pipe in a Data Source, so the decoding happens once at ingestion time instead of on every query.\nChoose the Sorting Key and data types wisely. Recommended reads after the workshop: [Best Practices for faster SQL](https://www.tinybird.co/docs/guides/best-practices-for-faster-sql) and [Thinking in Tinybird](https://www.tinybird.co/blog-posts/thinking-in-tinybird).\n\nCompare the processed data (using [Service Data Sources](https://www.tinybird.co/docs/monitoring/service-datasources) such as `tinybird.pipe_stats_rt`) to see the difference between querying the MV and decoding at query time.\n\n### Create a MV to aggregate by time (hour, day…)\n\nAggregatingMergeTrees 101. 
Check [this guide](https://www.tinybird.co/docs/guides/master-materialized-views.html#doing-aggregations-the-right-way-with-materialized-views) to learn about the `State` and `Merge` modifiers.\n\nNote that if you create the MV from the UI, Tinybird will add the `State` modifier for you, but you will still need to use `Merge` and `GROUP BY` at query time.\n\nCreate a MV that aggregates the sales, views, or carts per product and per hour or day (tip: `toStartOfHour()` and `toDate()` are your allies here).\n\nCompare the same queries run against the raw data and against the aggregated MV.\n\n## Part 3: Data as Code with data projects and CLI\n\n### Download the CLI and check the Data project\n\nYou have already seen some resources (Data Sources and Pipes) in text format in the docs. Now let's download the [CLI](https://www.tinybird.co/docs/cli) and start working with it.\n\n```bash\n# Authenticate the CLI with your Workspace admin token\ntb auth\n\n# Initialize the data project folder structure\ntb init\n\n# Check which Workspace the CLI is pointing to\ntb workspace current\n\n# Pull the Workspace's Data Sources and Pipes into the local folders\ntb pull --auto\n```\n\nEdit a Pipe that ends in an Endpoint and push it back to the Workspace with `tb push`.\n\n### Push some resources to feed a dashboard\n\nCheck out the branch called _chart-branch_ and copy its content (/pipes, /datasources, and /endpoints) into your data project.\n\n```bash\ngit checkout chart-branch\n# Copy the folder contents; the trailing /. avoids nesting (e.g. pipes/pipes) when the target folders already exist\ncp -r ./data-project/pipes/. ./pipes\ncp -r ./data-project/datasources/. ./datasources\ncp -r ./data-project/endpoints/. ./endpoints\n```\n\nPush the resources.\n\n```bash\ntb push pipes/events_by_*.pipe --push-deps --populate\ntb push endpoints/api_*.pipe\n```\n\nGet your _dashboard_ token, go to this [webpage](https://ecommerce-svelte-tremor-dashboard.vercel.app/), and paste it into the Token input. Then you can start playing with the filters, hours, and so on.\n\nNote that we are assuming the GCS Data Source is called `prods`, and some types may not match. 
To see the full demo working, check this [repo](https://github.com/tinybirdco/ecommerce-svelte).\n\n## Extra: what we left out of the workshop\n\n- [Apigee](https://www.tinybird.co/docs/publish/api-gateways.html#google-cloud-apigee)\n- [Kafka connector](https://www.tinybird.co/docs/ingest/kafka)\n- [Snowflake connector](https://www.tinybird.co/docs/ingest/snowflake), very similar to the BigQuery one.\n- [Tokens](https://www.tinybird.co/docs/concepts/auth-tokens)\n- [Multitenancy](https://www.tinybird.co/blog-posts/multi-tenant-saas-options), [sharing Data Sources between Workspaces](https://www.tinybird.co/blog-posts/new-feature-sharing-data-sources-across-workspaces).\n- [Copy Pipes](https://www.tinybird.co/docs/publish/copy-pipes.html)\n- [Time Series](https://www.tinybird.co/blog-posts/announcing-time-series)\n- Visualizing in [Grafana](https://www.tinybird.co/docs/guides/consume-api-endpoints-in-grafana) or sending data to [Datadog](https://www.tinybird.co/blog-posts/how-to-monitor-tinybird-using-datadog-with-vector-dev) using vector.dev\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fdata-rise-gcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Fdata-rise-gcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fdata-rise-gcp/lists"}