{"id":16643632,"url":"https://github.com/takegue/bqmake","last_synced_at":"2025-10-30T12:30:23.425Z","repository":{"id":63278484,"uuid":"524636888","full_name":"takegue/bqmake","owner":"takegue","description":"BigQuery Powered Data Build Suite.","archived":false,"fork":false,"pushed_at":"2024-10-01T13:57:41.000Z","size":578,"stargazers_count":4,"open_issues_count":8,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-02T08:31:47.511Z","etag":null,"topics":["bigquery","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/takegue.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-08-14T10:22:01.000Z","updated_at":"2024-01-12T18:35:09.000Z","dependencies_parsed_at":"2023-10-31T23:28:30.316Z","dependency_job_id":"867b980e-8846-4113-9803-967a976ca892","html_url":"https://github.com/takegue/bqmake","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/takegue%2Fbqmake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/takegue%2Fbqmake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/takegue%2Fbqmake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/takegue%2Fbqmake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/takegue","download_url":"https://codeload.github.com/takegue/bqmake/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://git
hub.com","kind":"github","repositories_count":238961200,"owners_count":19559439,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","sql"],"created_at":"2024-10-12T08:09:11.073Z","updated_at":"2025-10-30T12:30:18.076Z","avatar_url":"https://github.com/takegue.png","language":null,"readme":"bqmake: BigQuery Powered Data Build Suite.\n===\n\n`bqmake` provides BigQuery routines that help you build typical data models.\\\nAll routines are designed to be idempotent and have a smart data update mechanism.\\\nThis frees you from awkward DAG workflow management.\n\nThis tool provides the following utilities.\n\n- **Dynamic whole/partial Data Refresh for BigQuery Table**:\\\n  Like a materialized view, `bqmake.v0.partition_table__update` automatically checks freshness and then updates data if needed.\\\n  This is useful for building pre-computed tables from frequent or expensive queries.\\\n  See the [Refreshing Partition Table Data](#refreshing-partition-table-data) section for more details.\n- **Data Snapshot Utilities**:\\\n  A table snapshot captures data changes and stores them in Slowly Changing Dimension Type II format.\n  You can recover the table state at any point in time at which you took a snapshot.\n  `bqmake.v0.snapshot__init` and `bqmake.v0.snapshot__update` are optimized for BigQuery, using its partitioning/clustering features\n  to reduce the amount of data processed and slot usage.\n- **Metadata Utilities**:\\\n  Prepares useful metadata for tables.\n    * Embeds intra-dataset data lineage into the dataset description in mermaid.js format.\n    * Labels tables with available partition 
information.\n\nCurrently this is in public beta and all routines are subject to change without notice.\nPlease send us your comments and suggestions via issues!\n\n## Get Started\n\nAll utilities are **BigQuery Routines (UDF or PROCEDURE)** and are published in the `bqmake.v0` dataset.\\\nYou can use them without any installation.\n\n### Refreshing Partition Table Data\n\n`bqmake.v0.partition_table__update` keeps a derived table fresh within a specified partition range.\nIt dynamically analyzes the partitions of the derived table and its referenced tables, and updates data if needed.\n\nBy using [Scheduled Queries](https://cloud.google.com/bigquery/docs/scheduling-queries?hl=ja), the procedure behaves almost like a materialized view.\nCompared with a materialized view, you get extra advantages:\n* No restrictions on query syntax.\n* You get a vanilla BigQuery table with useful features in the BigQuery console such as Preview, BI Engine support, and so on.\n\n```sql\ndeclare query string;\n\n-- Prepare dataset and table\ncreate schema if not exists `zsandbox`;\ncreate or replace table `zsandbox.ga4_count`(event_date date, event_name string, records int64)\npartition by event_date;\n\n-- Prepare data generation query parameterized by @begin and @end (DATE type)\nset query = \"\"\"\n  select date(timestamp_micros(event_timestamp)) as event_date, event_name, count(1)\n  from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`\n  where parse_date('%Y%m%d', _TABLE_SUFFIX) between @begin and @end\n  group by event_date, event_name\n\"\"\";\n\n-- The first call updates the data\ncall `bqmake.v0.partition_table__update`(\n  (null, 'zsandbox', 'ga4_count')\n  , [('bigquery-public-data', 'ga4_obfuscated_sample_ecommerce', 'events_*')]\n  , `bqmake.v0.alignment_day2day`('2021-01-01', '2021-01-01')\n  , query\n  , null\n);\n--\u003e 16 rows affected\n\n-- The second call won't update partition data because the 2021-01-01 partition is still fresh.\ncall `bqmake.v0.partition_table__update`(\n  (null, 
'zsandbox', 'ga4_count')\n  , [('bigquery-public-data', 'ga4_obfuscated_sample_ecommerce', 'events_*')]\n  , `bqmake.v0.alignment_day2day`('2021-01-01', '2021-01-01')\n  , query\n  , null\n);\n--\u003e No rows affected\n```\n\n### Snapshot Table\n\n```sql\ndeclare query string;\nset query = \"select * from `bigquery-public-data.austin_bikeshare.bikeshare_stations`\";\n\n-- Initialize Snapshot table\ncall `bqmake.v0.snapshot_table__init`(\n  (null, 'zsandbox', 'ga4_count')\n  , (\n    'station_id'\n    , query\n    , current_timestamp()\n  )\n  , null\n);\n\n-- Snapshot after some modification\ncall `bqmake.v0.snapshot_table__update`(\n  (null, 'zsandbox', 'ga4_count')\n  , null\n  , (\n    'station_id'\n    -- This example changes some records on purpose\n    , 'select * replace(if(station_id in (2499), \"closed\", status) as status) from `bigquery-public-data.austin_bikeshare.bikeshare_stations`'\n    , current_timestamp()\n  )\n  , to_json(struct(\n    -- Demo disables staleness check intentionally.\n    current_timestamp() as force_expired_at\n  ))\n);\n```\n\n### Metadata Updates\n\n#### Labeling partition tables on Dataset\n\n`v0.dataset__update_table_labels` sets useful labels on partitioned tables.\n\n- `partition-min`: Oldest partition_id\n- `partition-max`: Latest partition_id\n- `partition-skip`: Skipped partition count\n\n```sql\ncall `v0.dataset__update_table_labels`(('your_project', 'your_dataset'))\n```\n\n#### Generating Intra-Dataset Lineage on Dataset\n\n`v0.dataset__update_description` generates a dataset description with intra-dataset lineage in [mermaid.js](https://mermaid-js.github.io/mermaid/#/) representation.\n\n```sql\ncall `v0.dataset__update_description`(('your_project', 
'your_dataset'))\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftakegue%2Fbqmake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftakegue%2Fbqmake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftakegue%2Fbqmake/lists"}