{"id":16991998,"url":"https://github.com/jolares/example-gcp-dataform","last_synced_at":"2025-04-15T15:58:47.201Z","repository":{"id":121097961,"uuid":"555689712","full_name":"jolares/example-gcp-dataform","owner":"jolares","description":"Example end-to-end ELT data pipeline using GCP Dataform.","archived":false,"fork":false,"pushed_at":"2023-08-09T23:35:39.000Z","size":131,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-15T15:58:41.354Z","etag":null,"topics":["bigquery","dataform","etl-pipeline"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jolares.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-10-22T04:57:34.000Z","updated_at":"2024-12-06T08:40:26.000Z","dependencies_parsed_at":"2023-08-10T00:41:32.546Z","dependency_job_id":"6ea35068-9e82-48a7-b5cb-603e793b0d16","html_url":"https://github.com/jolares/example-gcp-dataform","commit_stats":null,"previous_names":["jolares/example-elt-pipeline-with-dataform","jolares/example-gcp-dataform"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolares%2Fexample-gcp-dataform","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolares%2Fexample-gcp-dataform/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolares%2Fexample-gcp-dataform/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolares%2Fexample-gcp-dataform/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jolares","download_url":"https://codeload.github.com/jolares/example-gcp-dataform/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249105473,"owners_count":21213534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","dataform","etl-pipeline"],"created_at":"2024-10-14T03:28:01.877Z","updated_at":"2025-04-15T15:58:47.181Z","avatar_url":"https://github.com/jolares.png","language":"JavaScript","readme":"# stackoverflow-data-reports\n\n\n- `data-reports-etl`: A project built with BigQuery Dataform that transforms Stackoverflow\n  public raw data into reporting tables in a BigQuery data warehouse.\n\n- `ETL`: A project built with BigQuery Dataform that transforms Stackoverflow\n  public raw data into reporting tables in a BigQuery data warehouse.\n\n\n## Local Development Environment Setup\n\n\n- Clone the project workspace repository:\n  `git clone https://github.com/jolares/stackoverflow-ai.git`\n\n- Install project workspace node dependencies:\n  `npm install`\n\n\n#### (Optional) Install Recommended VSCode Extensions\n\n\n- `dataform.dataform`: provides syntax highlighting, compilation, and intellisense\n  for Dataform and SQLX projects. 
Refer to the [extension site](https://marketplace.visualstudio.com/items?itemName=dataform.dataform).\n\n\n\n---\n\n\n\n## GCP Project Configuration\n\n\n### Setup Secrets\n\n\n#### Create a Secret Token with Dataform Service Account Permissions\n\n\nA secret token is created for the GCP _Dataform Service Account_ to interact with\nDataform resources.\n\n- Enable the GCP Secret Manager API for the GCP project.\n\n- Open the GCP Secret Manager console and create a new secret token with any\n  secure value of your preference.\n\n  - This project names the secret token `GCP_BIGQUERY_DATAFORM_SA_TOKEN`.\n\n- After the secret is created, edit the secret's _permissions_ and\n  _grant access_ to the Dataform service account: add the service\n  account as a new principal for the secret, and assign it the role\n  `Secret Manager Secret Accessor`.\n\n\n#### Assign BigQuery Permissions to the Dataform Service Account\n\n\n- Open the [Google Cloud IAM Admin console](https://console.cloud.google.com/iam-admin)\n\n- Edit the _Dataform Service Account_ created by Google and add the _Role_\n  `BigQuery Admin` to it.\n  \u003e Note: if you do not see the service account in the list of principals displayed\n  \u003e on the Permissions page, you probably need to check the\n  \u003e option labeled `Include Google-provided role grants`.\n\n\n\n---\n\n\n\n## Project Folder Structure\n\n\n```\nstackoverflow-ai/\n├── ...\n├── definitions/\n│   ├── reporting/\n│   ├── sources/\n│   └── testing/\n├── environments.json\n├── schedules.json\n└── package.json\n```\n\n### Dataform Dependencies\n\n\n### Dataform Configuration File\n\n\n```json\n// dataform.json file\n{\n    // (Required) Set this value to the GCP BigQuery dataset name (the Dataset ID without\n    // the GCP Project ID 
prefix)\n    \"defaultSchema\": \"{GCP_BIGQUERY_DATASET_NAME}\",\n    // (Required)\n    \"assertionSchema\": \"dataform_assertions\",\n    // (Required)\n    \"warehouse\": \"bigquery\",\n    // (Required) Set this value to the GCP Project ID\n    \"defaultDatabase\": \"{GCP_PROJECT_ID}\",\n    // (Optional) Set this value to the BigQuery dataset location (e.g. us-central1, US)\n    \"defaultLocation\": \"US\"\n}\n```\n\n\n### Dataform Environments File\n\n\n```json\n{\n  \"environments\": [\n    {\n      \"name\": \"production\",\n      \"configOverride\": {},\n      // The git repository branch, or commit SHA, that triggers the workflow\n      // run using this environment (e.g. master, main, release, develop)\n      \"gitRef\": \"master\"\n    },\n    {\n      \"name\": \"development\",\n      \"configOverride\": {},\n      \"gitRef\": \"master\"\n    },\n    {\n      \"name\": \"testing\",\n      \"configOverride\": {},\n      \"gitRef\": \"master\"\n    },\n\n    // ... Other environments can be added here\n  ]\n}\n```\n\n\n### Dataform Schedules File\n\n\n```json\n{\n  \"schedules\": [\n    {\n      \"name\": \"daily\",\n      \"options\": {\n        \"includeDependencies\": false,\n        \"fullRefresh\": false,\n        \"tags\": [\n          \"daily\"\n        ]\n      },\n      \"cron\": \"00 09 * * *\",\n      \"notification\": {\n        \"onSuccess\": false,\n        \"onFailure\": false\n      },\n      \"notifications\": [\n        {\n          \"events\": [\n            \"failure\"\n          ],\n          \"channels\": [\n            \"email jo\"\n          ]\n        }\n      ]\n    }\n  ],\n  \"notificationChannels\": [\n    {\n      \"name\": \"email jo\",\n      \"email\": {\n        \"to\": [\n          \"jolares@gatech.edu\"\n        ]\n      }\n    }\n  
]\n}\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolares%2Fexample-gcp-dataform","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjolares%2Fexample-gcp-dataform","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolares%2Fexample-gcp-dataform/lists"}