An open API service indexing awesome lists of open source software.

BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a « serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, which lets BigQuery read data from external sources.

📖 A highly rated reference on the subject is the book « Google BigQuery: The Definitive Guide ». Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article marking the service's 10th anniversary.

https://github.com/bankyadam/not-so-bigquery

An emulator for Google BigQuery that can be run locally, backed by PostgreSQL.

bigquery development devtool emulator sql

Last synced: 03 Oct 2025

https://github.com/cvs-health/coldstart

A package for automatic data collection and feature engineering

bigquery feature-engineering python sql sqlalchemy

Last synced: 03 Oct 2025

https://github.com/lennart-finke/pypi-map

A Map of PyPI Packages

bigquery pypi visualization

Last synced: 31 Oct 2025

https://github.com/mercari/dataflowtemplates

Convenient Dataflow pipelines for transforming data between cloud data sources

apache-beam bigquery dataflow dataflow-templates spanner

Last synced: 25 Oct 2025

https://github.com/googlecloudplatform/deviceconnect

https://deviceconnect.readthedocs.io/

bigquery cloudrun fitbit gcp

Last synced: 20 Oct 2025

https://github.com/mlr-org/mlr3db

Data Backends to let mlr3 work transparently with (remote) databases

bigquery data-backend database duckdb machine-learning mariadb mlr3 mysql odbc postgresql r r-package spark sqlite

Last synced: 28 Feb 2026

https://github.com/kitta65/bq-extension-vscode

Visual Studio Code extension for GoogleSQL

bigquery visual-studio-code vscode

Last synced: 22 Feb 2026

https://github.com/googlecloudplatform/bq-utilization-alerts

A serverless bot which periodically checks configured BigQuery capacity commitments, reservations and assignments against actual slot consumption of running jobs and reports findings to Slack/Google Chat.

bigquery bot chat-ops cloud-run cloud-scheduler google-chat google-cloud serverless slack slots

Last synced: 20 Oct 2025

https://github.com/banditml/faucetml

High speed mini-batch data reading & preprocessing from BigQuery.

bigquery feature-engineering features machine-learning ml preprocessing pytorch

Last synced: 11 Feb 2026

https://github.com/digitalghost-dev/stock-data-pipeline

Code Repository for my 1st Data Project.

bigquery google-cloud-platform python

Last synced: 06 Apr 2025

https://github.com/googlecloudplatform/bigquery-dlp-remote-function

Use Remote Functions to tokenize data with DLP in BigQuery using SQL

bigquery cloud-run data-loss-prevention dlp google-cloud

Last synced: 20 Oct 2025

https://github.com/shinichi-takii/vscode-language-sql-bigquery

Syntax highlighting and code snippets for BigQuery SQL in Visual Studio Code

bigquery grammar snippets sql syntax-highlighting vscode vscode-extension

Last synced: 11 Apr 2025

https://github.com/nownabe/go-bqloader

bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

bigquery etl golang google-cloud google-cloud-functions google-cloud-storage

Last synced: 04 Aug 2025

https://github.com/miraisolutions/sparkbq

Sparklyr extension package to connect to Google BigQuery

bigquery r spark sparklyr

Last synced: 04 Sep 2025

https://github.com/kitta65/prettier-plugin-bq

Prettier plugin for GoogleSQL

bigquery prettier

Last synced: 22 Feb 2026

https://github.com/ronoaldo/aetools

Utilities to build and manage Google App Engine apps

bigquery datastore go

Last synced: 04 Oct 2025

https://github.com/googleclouddataproc/hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

apache bigquery gcp google hadoop hive

Last synced: 16 Mar 2026

https://github.com/miraisolutions/spark-bigquery

Google BigQuery data source for Apache Spark

bigquery google-dataproc spark spark-datasource

Last synced: 04 Sep 2025

https://github.com/modataconsulting/dbt_ga4_project

This project uses Google Analytics 4 BigQuery Exports as its source data, and offers useful base transformations to provide report-ready dimension & fact models that can be used for reporting purposes, blending with other data, and/or feature engineering for ML models.

bigquery bq data-build-tool dbt ga4 google-analytics-4 sql

Last synced: 10 Apr 2025

https://github.com/hackersandslackers/bigquery-sqlalchemy-tutorial

📊 ➡️ 💾 ETL script to migrate data from BigQuery to SQL.

bigquery bigquery-sqlalchemy-tutorial databases etl mysql postgres python sql sqlalchemy tutorial

Last synced: 24 Aug 2025

https://github.com/ginokent/bqschema-gen-go

BigQuery table schema Go struct generator

bigquery bigquery-schema gcp gcp-bigquery go golang

Last synced: 07 May 2025

https://github.com/ocadaruma/scalikejdbc-bigquery

ScalikeJDBC extension for Google BigQuery

bigquery scala scalikejdbc

Last synced: 10 Apr 2025

https://github.com/ottogroup/bquest

Effortlessly validate and test your Google BigQuery queries with the power of pandas DataFrames in Python.

bigquery google-big-query google-cloud integration testing

Last synced: 14 Aug 2025

https://github.com/googlecloudplatform/cloud-composer-mssql-dataflow-bigquery

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline.

airflow bigquery cloud-composer dataflow microsoft-sql-server

Last synced: 08 Jul 2025

https://github.com/nodefluent/bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

big-data bigquery connect etl google-cloud kafka kafka-connect nodejs

Last synced: 26 Apr 2025

https://github.com/hirosassa/bqvalid

SQL linter tool for BigQuery GoogleSQL (formerly known as Standard SQL).

bigquery google linter sql

Last synced: 18 Jan 2026

https://github.com/naseemkullah/gcp-accountant

A tool to identify high cost resources in GCP at a granular level

bigquery cost cost-engineering cost-resources gcp gcp-accountant

Last synced: 30 Apr 2025

https://github.com/googlecloudplatform/google-cloud-abap

ABAP SDK for Google Cloud and BigQuery Connector for SAP enable customers to easily consume Google Products and Services natively from their SAP Landscape.

abap abap-development abapsdk abapsdkforgcp bigquery google-cloud-platform google-generative-ai google-maps-api vertex-ai

Last synced: 17 Jun 2025

https://github.com/omeryasirkucuk/amx

AI-driven CLI for documenting database schemas. DB + docs + codebase agents, 10 backends, BYO LLM, human-in-the-loop review.

agentic-ai ai-agents bigquery cli data-catalog data-engineering database-documentation databricks human-in-the-loop llm metadata postgre python snowflake

Last synced: 07 Jun 2026

https://github.com/medjed/embulk-input-bigquery

BigQuery input plugin for Embulk loads records from BigQuery

bigquery embulk

Last synced: 30 Oct 2025

https://github.com/fivetran/zetasql-npm

npm package for ZetaSQL library

bigquery grpc sql zetasql

Last synced: 16 Apr 2025

https://github.com/mesmacosta/bq-fake-pii-table-creator

Library for creating BigQuery tables with fake PII data

bigquery fake-data faker governance-dapps metadata piidata piii

Last synced: 30 Apr 2025

https://github.com/livebook-dev/req_bigquery

Conveniences for querying Google BigQuery with Req

bigquery req

Last synced: 27 Apr 2025

https://github.com/ksalama/datalab-notebooks

This repository includes end-to-end labs on how to use GCP for applied data science

bigquery cloudml dataflow datalab gcp

Last synced: 29 Jul 2025

https://github.com/sungchun12/schedule-python-script-using-google-cloud

🕓 Schedules a Python script to append data into BigQuery using Google Cloud's App Engine with a cron job

appengine-python bigquery chicago-traffic cron google-cloud python-script

Last synced: 01 Sep 2025

https://github.com/blockchain-etl/tezos-etl

Python scripts for ETL (extract, transform and load) jobs for Tezos blocks, balance updates, and operations

bigquery blockchain cryptocurrency csv sql tezos

Last synced: 26 Oct 2025

https://github.com/data-tools/big-data-types

A library to transform Scala product types and Schemes from different systems into other Schemes. Any implemented type automatically gets methods to convert it into the rest of the types and vice versa. E.g: a Spark Schema can be transformed into a BigQuery table.

apache-spark bigquery bigquery-tables cassandra circe database-types scala schemas spark typeclass typeclass-derivation typesafe

Last synced: 30 Oct 2025

https://github.com/yoheimuta/dbq

CLI tool to easily decorate BigQuery table names

bigquery bq cli golang table-decorator

Last synced: 07 Mar 2026

https://github.com/terashim/dataform-google-analytics-4-example

Transformation pipeline for Google Analytics 4 export data, built with Dataform

bigquery dataform google-analytics

Last synced: 08 Oct 2025

https://github.com/googlecloudplatform/datacatalog-tag-history

Historical metadata of your data warehouse is a treasure trove to discover not just insights about changing data patterns, but also quality and user behaviour. This solution creates Data Catalog Tags history in BigQuery since Data Catalog keeps only the latest version of metadata for fast searchability.

analytics bigquery data-catalog data-governance metadata-management

Last synced: 03 Oct 2025

https://github.com/kesin11/ts-junit2json

Convert JUnit XML format to JSON with TypeScript

bigquery junit-xml

Last synced: 25 Apr 2025

https://github.com/oliveroneill/bigqueryswift

BigQuery client for Swift

bigquery google-cloud-platform swift

Last synced: 26 Oct 2025

https://github.com/openbridge/ob_datastash

Stream your CSV files to an HTTP API

aws bigquery csv csv-files logstash parquet redshift

Last synced: 10 Apr 2025

https://github.com/datatovalue/datatovalue-tools

datatovalue-tools

bigquery sql terraform

Last synced: 01 May 2026

https://github.com/future-architect/gbilling-plot

Create a graphed invoice for Google Cloud Platform. You can see the billing amount per GCP project.

bigquery billing cloud-scheduler gcp-billing go golang slack

Last synced: 17 Oct 2025

https://github.com/toddbirchard/ghost-webhook-api

📑 🎛️ API to automate optimizations for self-hosted blogging platforms.

api automation bigquery blogging ghost github-api google-cloud-storage python webhook-api

Last synced: 07 Mar 2026

https://github.com/badal-io/gcp-airflow-foundations

Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse

airflow apache-airflow bigquery dags data-engineering data-pipeline etl-pipeline

Last synced: 24 Mar 2025

https://github.com/orisano/bqspec

SQL testing tool for Google BigQuery.

bigquery cli python test yaml

Last synced: 08 Jul 2025

https://github.com/vickyjkwan/sqlanalyzer

A SQL parser and analyzer for SQL flavors including MySQL, PostgreSQL, BigQuery Standard SQL, Presto SQL and Hive SQL.

athena bigquery hiveql metastore presto sqlparser standardsql

Last synced: 09 Apr 2025

https://github.com/wintermi/imdb-dataform

An example Dataform project to load and transform the publicly available dataset from IMDB.

bigquery dataform google-cloud google-cloud-platform

Last synced: 05 May 2025

https://github.com/google-marketing-solutions/cwv_from_ga4_exports

Simple solution to make reporting on CWVs from BQ simpler to set up.

analytics bigquery google google-cloud-platform

Last synced: 01 Aug 2025

https://github.com/splitmedialabslimited/supermigration

A CLI tool to perform migrations on BigQuery tables

bigquery bigquery-schema gcp node nodejs

Last synced: 01 Aug 2025

https://github.com/naustica/openalex

Repository containing scripts for importing OpenAlex snapshots into BigQuery

bigquery openalex python scholarly-metadata

Last synced: 30 Apr 2025

https://github.com/janaom/gcp-de-project-streaming-pubsub-beam-dataflow

This project demonstrates an end-to-end solution for processing and analyzing real-time conversations data from a JSON file using GCP services and infrastructure automation, showcasing data storage, streaming, processing, and analysis at scale.

apache-beam bigquery dataflow de-project gcp pubsub streaming-data

Last synced: 18 Oct 2025

https://github.com/minodisk/zoq

Convert Zod to BigQuery Schema

bigquery bigquery-schema bigquery-schema-converter zod

Last synced: 22 Apr 2025

https://github.com/keito5656/firebase-authentication-to-bigquery-export

An automatic tool for copying and converting Firebase Authentication data to BigQuery.

bigquery firebase-auth typescript

Last synced: 16 Jan 2026

https://github.com/viant/bigquery

BigQuery database/sql golang driver

bigquery driver golang sql

Last synced: 16 Feb 2026

https://github.com/pcorbel/metaquery

An API to analyze BigQuery metadata

bigquery golang gorm vue-router vuejs vuetifyjs vuex

Last synced: 25 Apr 2025

https://github.com/jashparekh/bigquery-action

This GitHub Action can be used to deploy table and view schemas to BigQuery.

actions bigquery gbq github-actions google google-bigquery google-cloud-platform hacktoberfest

Last synced: 07 May 2025

https://github.com/hackersandslackers/bigquery-python-tutorial

📊 🐍 Create tables in Google BigQuery, auto-generate their schemas, and retrieve said schemas.

bigquery data-warehouse gcs google-bigquery google-cloud google-cloud-sdk google-cloud-storage python tutorial

Last synced: 28 Apr 2025

https://github.com/christippett/bigquery-geo-router

Calculate routes from long/lat coordinates in BigQuery using OpenStreetMap/OSRM

bigquery geospatial google openstreetmap osrm

Last synced: 30 Oct 2025

https://github.com/manuelguerra1987/data-engineering-zoomcamp-notes

Notes and material from 2025 Data Engineering Zoomcamp by Datatalks.Club

airflow bigquery data-engineering docker kubernetes

Last synced: 23 Aug 2025

https://github.com/stape-io/request-to-gcs-function

Google Cloud Function that saves everything received in a request to Google Cloud Storage

bigquery gtm gtm-server-side stape

Last synced: 14 Apr 2025

https://github.com/kununu/mysql-to-bigquery-schema-converter

Python library and CLI tool to convert MySQL schemas into BigQuery schemas

bigquery bigquery-schema converter converter-app converter-library des mysql mysql-schema

Last synced: 15 Apr 2025

https://github.com/janaom/gcp-de-project-uber-etl-pipeline

Technologies used: GCS, Compute Engine, Mage, BigQuery, Looker, Python

bigquery gcp looker mage

Last synced: 12 Apr 2025

https://github.com/snithish/tpc-di_benchmark

Benchmark for Airflow with BigQuery as the Data Warehouse using TPC-DI

airflow benchmark bigquery tpc-di

Last synced: 07 May 2025

https://github.com/tufin/espresso

A framework for writing testable BigQuery queries

bigquery sql testing

Last synced: 04 Oct 2025

https://github.com/nathadriele/data-engineering-zoomcamp

The Data Engineering Zoomcamp covers essential skills in containerization, workflow orchestration, data warehousing, analytics engineering, batch, and streaming processing. It includes tools like Docker, Terraform, BigQuery, dbt, Spark, Kafka, Kestra, Postgres, Google Data Studio, and Metabase.

bigquery containerization data-engineering dbt docker google-data-studio kafka kestra metabase orchestration postgresql spark streaming terraform warehousing workflow-automation

Last synced: 21 Mar 2025

https://github.com/ExpediaGroup/circus-train-bigquery

Circus Train plugin which replicates BigQuery tables to Hive

bigquery circus-train google-cloud hive replication

Last synced: 24 Mar 2025

https://github.com/edgarrmondragon/meltano-dogfood

Personal dogfood Meltano project

bigquery dbt dogfood elt evidence-dev meltano

Last synced: 14 Apr 2025

https://github.com/urish/nn-function-generator

Experimenting with automatic generation of TS function bodies using ANN models

bigquery tensorflow tsquery typescript

Last synced: 08 Sep 2025

https://github.com/expediagroup/circus-train-bigquery

Circus Train plugin which replicates BigQuery tables to Hive

bigquery circus-train google-cloud hive replication

Last synced: 23 Sep 2025

https://github.com/wintermi/bqe-dataform

A Dataform project which aggregates BigQuery system metadata for the purpose of analysing the slot usage and storage within an organization by project.

bigquery dataform google-cloud google-cloud-platform

Last synced: 05 May 2025

https://github.com/ksalama/data2cooc2emb2ann

Learning embeddings from item co-occurrence statistics, and building an approx. nearest neighbour index

apache-beam bigquery dataflow embeddings machine-learning python3 tensorflow

Last synced: 13 Jun 2025

https://github.com/gbotemib/gharchive_de_project

An end-to-end data engineering project on GitHub activity data

bigquery dbtcloud docker gcp gcs-bucket looker-studio prefect spark terraform

Last synced: 27 Feb 2025

https://github.com/tomayac/http-archive-progressive-web-apps

Different approaches to estimate the number of Progressive Web Apps in the HTTP Archive

bigquery httparchive

Last synced: 15 Apr 2025

https://github.com/sigpwned/litecene

A simple cross-data store full-text search language for Java 8+

bigquery full-text-search java query-language search

Last synced: 12 Apr 2025

https://github.com/snithish/tpc-ds_big-query

Scripts to execute TPC-DS on BigQuery

benchmark bigquery tpc-ds-benchmark tpc-ds-queries

Last synced: 07 May 2025
