https://github.com/lightspeed/bigquery-uniview
Create BigQuery views that unify sets of tables with the same prefix and different versions.
- Host: GitHub
- URL: https://github.com/lightspeed/bigquery-uniview
- Owner: lightspeed
- License: apache-2.0
- Created: 2020-07-29T17:06:35.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-29T17:17:38.000Z (almost 6 years ago)
- Last Synced: 2025-02-17T02:25:01.630Z (over 1 year ago)
- Language: Python
- Size: 6.84 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# BigQuery UniView
Create BigQuery views that unify sets of tables with the same prefix and
different versions. Each view returns the union of all records from the
different table versions and casts columns to a common data type.
This tool was written to be deployed as a Cloud Function. The function is
triggered by a Pub/Sub topic fed by our BigQuery sink. The dispatched
messages contain information about the tables created by our process,
for example:
```
{"dataset":"...","table":{"name":"customer","version":"5b6ffe5fb"}}
```
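A message like this reaches the function base64-encoded in the `data` field of the Pub/Sub event, which is how Pub/Sub-triggered background (1st gen) Python Cloud Functions receive their payload. The sketch below decodes it; it assumes the payload always has the shape shown above, `parse_table_event` is a hypothetical helper, and the entry-point name `handler` is taken from the `--entry-point handler` flag in the deploy command.

```python
import base64
import json
from typing import Any, Dict, Tuple


def parse_table_event(event: Dict[str, Any]) -> Tuple[str, str, str]:
    """Decode a Pub/Sub event and return (dataset, table name, version)."""
    # The message body arrives base64-encoded in event["data"].
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    table = payload["table"]
    return payload["dataset"], table["name"], table["version"]


def handler(event: Dict[str, Any], context: Any) -> None:
    """Background function entry point (name matches --entry-point handler)."""
    dataset, name, version = parse_table_event(event)
    # A real implementation would now look up every version of `name` in
    # `dataset` and (re)create the unified view; that step is omitted here.
    print(f"table created: {dataset}.{name} (version {version})")


# Local usage with a hypothetical dataset name "raw":
msg = {"dataset": "raw", "table": {"name": "customer", "version": "5b6ffe5fb"}}
event = {"data": base64.b64encode(json.dumps(msg).encode("utf-8")).decode("ascii")}
handler(event, None)
```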
## Rules
Columns whose data type differs between table versions are cast to a common
type. The specific cast depends on the conflicting data types. The basic idea
is to widen the target data type, going from a more specific type to a more
generic one, which minimizes information loss. STRING is therefore our most
generic type, since it can represent any other BQ type.
| Conflicting types      | Cast to   |
|------------------------|-----------|
| DATETIME and TIMESTAMP | TIMESTAMP |
| Any other combination  | STRING    |
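The rules in the table can be expressed as a small resolution function. This is a minimal sketch, not the tool's implementation: it assumes each table version's schema is available as a plain `{column: type}` dict, and `resolve_type` and `resolve_schema` are hypothetical names.

```python
from typing import Dict, Iterable, List


def resolve_type(types: Iterable[str]) -> str:
    """Pick one target type for a column observed with `types` across versions."""
    distinct = set(types)
    if len(distinct) == 1:
        return distinct.pop()  # no conflict: keep the original type
    if distinct == {"DATETIME", "TIMESTAMP"}:
        return "TIMESTAMP"
    return "STRING"  # any other conflict falls back to the most generic type


def resolve_schema(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-version {column: type} schemas into one target type per column."""
    seen: Dict[str, List[str]] = {}
    for schema in schemas:
        for column, bq_type in schema.items():
            seen.setdefault(column, []).append(bq_type)
    return {column: resolve_type(types) for column, types in seen.items()}


v1 = {"id": "INT64", "created_at": "DATETIME", "note": "STRING"}
v2 = {"id": "STRING", "created_at": "TIMESTAMP", "note": "STRING"}
print(resolve_schema([v1, v2]))
# {'id': 'STRING', 'created_at': 'TIMESTAMP', 'note': 'STRING'}
```

Each resolved type then becomes the target of a `CAST(column AS type)` in every SELECT of the view, so all versions expose the same schema.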
## Variables
The `PROJECT_ID` environment variable must be set to a valid Google Cloud project ID.
The function's service account must have read access to the source dataset and
write access to the output dataset.
## Deployment
We deploy to GCP with the following command:
```
# Assumes the shell variable PROJECT_ID holds your Google Cloud project ID.
gcloud functions deploy bigquery-uniview --project "$PROJECT_ID" \
  --runtime python37 --retry --entry-point handler --source ./ \
  --trigger-topic bigquery-table-create \
  --set-env-vars PROJECT_ID="$PROJECT_ID" \
  --service-account "bigquery-uniview@${PROJECT_ID}.iam.gserviceaccount.com"
```