
BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a “serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so data held in external sources can be queried without first loading it into BigQuery.
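As a sketch of the client-library route, the Python snippet below runs a parameterized query with the `google-cloud-bigquery` package. It assumes that package is installed and that Application Default Credentials point at a project with BigQuery enabled; the public `usa_names` sample table and the `top_names` helper are illustrative choices, not part of any project listed here.

```python
"""Run a parameterized query with the BigQuery Python client (illustrative sketch)."""

# Query against a public sample table; @state is a named query parameter.
TOP_NAMES_SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = @state
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""


def top_names(state: str) -> list[tuple[str, int]]:
    """Return the ten most common given names recorded for a US state."""
    # Imported inside the function so the module can be loaded (and the SQL
    # inspected) on machines without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # project and credentials come from the environment
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("state", "STRING", state)]
    )
    rows = client.query(TOP_NAMES_SQL, job_config=job_config).result()
    return [(row["name"], row["total"]) for row in rows]


if __name__ == "__main__":
    for name, total in top_names("TX"):
        print(f"{name}: {total}")
```

Binding `state` as a query parameter, rather than formatting it into the SQL string, keeps caller input out of the query text.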

📖 A highly rated canonical book on BigQuery is “Google BigQuery: The Definitive Guide”, a comprehensive reference. Another enriching read is the inside story told by BigQuery’s founding product manager in an article celebrating its 10th anniversary.

https://github.com/garbetjie/phpunit-bigquery-schema

A BigQuery schema validator constraint for PHPUnit

bigquery phpunit

Last synced: 13 Apr 2026
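The entry above is a PHPUnit (PHP) constraint; to keep one language on this page, here is a hypothetical Python sketch of the same idea, not that project's code: checking a decoded row against a BigQuery JSON schema, i.e. the array of `name`/`type`/`mode`/`fields` objects printed by `bq show --schema`. The `validate_row` name and its error strings are invented for illustration, and only a handful of scalar types are checked.

```python
from typing import Any

# Python types accepted for a few common BigQuery scalar types (deliberately partial).
SCALAR_TYPES: dict[str, tuple[type, ...]] = {
    "STRING": (str,),
    "INTEGER": (int,), "INT64": (int,),
    "FLOAT": (int, float), "FLOAT64": (int, float),
    "BOOLEAN": (bool,), "BOOL": (bool,),
}


def validate_row(row: dict[str, Any], schema: list[dict[str, Any]], prefix: str = "") -> list[str]:
    """Return a list of error messages; an empty list means the row conforms."""
    errors: list[str] = []
    declared = {field["name"] for field in schema}
    for extra in sorted(set(row) - declared):
        errors.append(f"{prefix}{extra}: not in schema")

    for field in schema:
        name = field["name"]
        ftype = field["type"].upper()
        mode = field.get("mode", "NULLABLE").upper()
        path = prefix + name
        value = row.get(name)

        if value is None:
            if mode == "REQUIRED":
                errors.append(f"{path}: required field missing")
            continue
        if mode == "REPEATED" and not isinstance(value, list):
            errors.append(f"{path}: expected a list")
            continue

        for item in value if mode == "REPEATED" else [value]:
            if ftype in ("RECORD", "STRUCT"):
                if isinstance(item, dict):
                    errors.extend(validate_row(item, field.get("fields", []), path + "."))
                else:
                    errors.append(f"{path}: expected an object")
            elif ftype in SCALAR_TYPES:
                # bool is a subclass of int, so reject it for non-boolean types.
                wrong_bool = isinstance(item, bool) and ftype not in ("BOOLEAN", "BOOL")
                if wrong_bool or not isinstance(item, SCALAR_TYPES[ftype]):
                    errors.append(f"{path}: expected {ftype}, got {type(item).__name__}")
            # Other types (TIMESTAMP, NUMERIC, GEOGRAPHY, ...) are not checked here.
    return errors
```

Returning a list of messages instead of raising on the first problem makes it easy to wrap the check in a test assertion that reports every mismatch at once.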

https://github.com/teraearlywine/sample-sql

The following repo contains samples of SQL code that can be referenced by future clients or employers.

bigquery database mysql sql

Last synced: 13 Apr 2026

https://github.com/mchmarny/sbomer

Generates daily SBOM and vulnerability reports for container images, saving the resulting files to a GCS bucket and the data to BigQuery tables.

bigquery gcp gcs grype report sbom syft vex vulnerability

Last synced: 23 Mar 2026

https://github.com/uw-labs/mobiq

Mongo -> BigQuery importer

bigquery looker mongo

Last synced: 08 May 2026

https://github.com/redis-developer/demo-redis-bigquery

This app uses Redis and BigQuery. Data is prefetched from BigQuery and queried using Redis Search and JSON.

bigquery demo express formula1 google-cloud javascript node-redis react redis

Last synced: 13 Apr 2026

https://github.com/salrashid123/gcp_cloud_status_dataset

BigQuery Dataset to query GCP Cloud Status Dashboard (https://status.cloud.google.com/)

bigquery gcp google-cloud google-cloud-platform

Last synced: 20 Apr 2026

https://github.com/alterra-greeve/de-capstone

Capstone Project SIB Batch 6 x Alterra Academy - Data Engineer

bigquery cloud-function data-engineering docker googlefirebase looker-studio python

Last synced: 26 Jan 2026

https://github.com/miguelapp10/etl_operadorlogistico

Extract data from the SimpliRoute, AndesExpress and Urbano APIs for a specific date range and process it for analysis and storage in Google BigQuery.

api-client bigquery pandas python

Last synced: 13 Apr 2026

https://github.com/rbmuller/scherlok

A detective for your data. Zero-config data quality monitoring — works with dbt, Postgres, BigQuery, Snowflake. No YAML.

anomaly-detection bigquery cli data-engineering data-observability data-quality dbt etl monitoring open-source postgres postgresql python snowflake

Last synced: 15 May 2026

https://github.com/thumbtack/becquerel

Gateway server that provides an OData interface to BigQuery and Elasticsearch

bigquery elasticsearch odata play play-scala scala sql

Last synced: 05 Feb 2026

https://github.com/erickkhosasi/thelook-data_analysis

Final project for my SQL mini bootcamp. This project explores an e-commerce dataset to uncover key business insights. Data insights were queried in Google BigQuery and visualized with Google Sheets.

bigquery data-analysis e-commerce sql

Last synced: 05 Oct 2025

https://github.com/brunopata/dataco-supply-chain

A complete end-to-end data project exploring operational efficiency, customer behavior and profitability across 180K+ orders. Built using BigQuery, Power BI and Jupyter — includes modeling, KPI analysis and actionable business insights.

analytics bigquery business-intelligence dashboard data-modeling power-bi sql supply-chain

Last synced: 28 Jan 2026

https://github.com/logica-web/logica-web.github.io

Documentation for Logica, a logic programming language that compiles to DuckDB, Google BigQuery, PostgreSQL and SQLite.

bigquery datalog logic-programming query-language

Last synced: 08 Oct 2025

https://github.com/targetta/ankaflow

YAML-based data pipeline framework that runs both locally and fully in-browser designed for data engineers, ML teams, and SaaS developers who need flexible, SQL-powered pipelines.

bigquery clickhouse data-analysis dataops deltalake duckdb elt-pipeline etl etl-automation motherduck parquet python sql

Last synced: 09 Oct 2025

https://github.com/pedrocarmona/big_query_adapter

An ActiveRecord Google BigQuery adapter

activerecord bigquery gem ruby-on-rails

Last synced: 11 Oct 2025

https://github.com/ksmin23/gcp-datastream-cdc-data-pipeline

A complete Terraform setup for creating a secure, private data replication pipeline from Cloud SQL (MySQL) to BigQuery using Datastream and Private Service Connect (PSC).

bigquery cloud-sql data-pipeline datastream google-cloud-platform mysql terraform

Last synced: 14 Apr 2026

https://github.com/kellyjadams/bigquery-python-weekly-report

A Python script that automates a weekly report by running BigQuery queries.

bigquery python

Last synced: 27 Jan 2026
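A weekly report usually needs a fixed date window. The sketch below, an assumption about how such a script might work rather than that repository's code, computes the previous Monday-to-Sunday week and pairs it with a parameterized query; the table `my_project.sales.orders` and column `order_date` are hypothetical placeholders.

```python
from datetime import date, timedelta

# Hypothetical report query; the table and column names are placeholders.
WEEKLY_SQL = """
SELECT order_date, COUNT(*) AS orders
FROM `my_project.sales.orders`
WHERE order_date BETWEEN @start_date AND @end_date
GROUP BY order_date
ORDER BY order_date
"""


def previous_week(today: date) -> tuple[date, date]:
    """Return (Monday, Sunday) of the full week before the week containing `today`."""
    this_monday = today - timedelta(days=today.weekday())  # weekday(): Monday == 0
    start = this_monday - timedelta(days=7)
    return start, start + timedelta(days=6)


if __name__ == "__main__":
    start, end = previous_week(date.today())
    print(f"Report window: {start} to {end}")
```

To run the query, the two dates can be bound with `bigquery.ScalarQueryParameter("start_date", "DATE", start)` (and likewise for `end_date`) in a `QueryJobConfig`, as with any parameterized query.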

https://github.com/pmhalvor/whale-speech

A pipeline to map whale sightings to hydrophone audio

beam bigquery gcs mle model-as-a-service python tensorflow2

Last synced: 24 Feb 2026

https://github.com/aleenprd/docbt

Documentation Build Tool - Generate YAML documentation for dbt models with optional AI assistance. Built with Streamlit for an intuitive and familiar web interface.

ai analytics-engineering bigquery data data-modeling data-science dbt docker llm lmstudio ollama openai snowflake sql streamlit

Last synced: 11 Nov 2025

https://github.com/fpopic/bigquery-schema-select

A script that generates a SQL query selecting all fields (recursively, for nested fields) from the provided BigQuery schema file.

bigquery bigquery-schema scala sql

Last synced: 15 Mar 2026
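The recursive walk described above can be sketched in Python (the listed project is written in Scala, so this is not its implementation). Two choices here are assumptions: nested columns get underscore-joined aliases, and REPEATED records are selected whole, because an array cannot be dotted into without `UNNEST`.

```python
import json
from collections.abc import Iterator
from typing import Any


def leaf_paths(fields: list[dict[str, Any]], prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield the column path of every selectable field, descending into non-repeated records."""
    for field in fields:
        path = prefix + (field["name"],)
        is_record = field["type"].upper() in ("RECORD", "STRUCT")
        is_repeated = field.get("mode", "NULLABLE").upper() == "REPEATED"
        if is_record and not is_repeated:
            yield from leaf_paths(field.get("fields", []), path)
        else:
            # Scalars, and arrays (which can't be dotted into without UNNEST), are selected whole.
            yield path


def select_all(schema_json: str, table: str) -> str:
    """Build a SELECT listing every leaf field of a BigQuery JSON schema."""
    columns = []
    for path in leaf_paths(json.loads(schema_json)):
        expr = ".".join(path)
        # Alias nested fields with underscores to get flat output names (collisions are not handled).
        columns.append(f"  {expr} AS {'_'.join(path)}" if len(path) > 1 else f"  {expr}")
    return "SELECT\n" + ",\n".join(columns) + f"\nFROM `{table}`"
```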

https://github.com/alimarzouk/paris-aq

ELTL pipeline to monitor air quality in the Paris Île-de-France area

airflow airquality big-data bigquery dataengineering gcs spark

Last synced: 06 Feb 2026

https://github.com/george-nyamao/gcp_etl_project

An ETL pipeline that moves an uploaded flat file from GCS, masks PII, stores the data in BigQuery, and creates a report in Looker.

airflow bigquery cloudcomposer data-fusion gcs-bucket looker python3 wrangler

Last synced: 07 Feb 2026

https://github.com/yaph/queries

Collection of Data Queries in SPARQL and SQL

bigquery data-mining dbpedia openstreetmap osm queries sparql sql stackoverflow wikidata

Last synced: 01 Feb 2026

https://github.com/niteshchawla/nc-sql-business-case

A SQL business case on a leading retail chain brand and prominent US retailer that makes itself a preferred shopping destination by offering outstanding value, inspiration, innovation and an exceptional guest experience that no other retailer can deliver.

bigquery retail sql supermarket

Last synced: 07 Feb 2026

https://github.com/borisgra/fullweb

Full-stack web applications with React, Kotlin/JS and NPM (ag-grid). View and modify ANY view from ANY database (PostgreSQL, Google BigQuery).

bigquery compose docker dockerhub fullstack kotlin kotlin-fullstack kotlin-js kotlin-js-react kotlin-jvm kotlin-multiplatform ktor npm-module postgresql

Last synced: 08 Feb 2026

https://github.com/prathmeshyelne/etl-pipeline-for-employee-data-using-data-fusion-airflow

This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Google Cloud Data Fusion for data extraction, Apache Airflow/Composer for orchestration, and Google BigQuery for data loading.

airflow bigquery dataengineering etl gcp googlecloudplatform

Last synced: 08 Feb 2026

https://github.com/rsachdeva/illuminatingdeposits-gcp-trigger

Using Terraform to set up resource-based triggers for Cloud Functions on Google Cloud Platform (GCP). Both resource creation and destruction are handled through Terraform.

bigquery bigquery-table cloud-events functions-framework gcp go golang golangci-lint google-cloud google-cloud-function-pubsub-trigger google-cloud-functions google-cloud-pubsub google-cloud-sdk google-cloud-storage google-cloud-terraform sendgrid terraform

Last synced: 11 Feb 2026

https://github.com/branb97/jobstreet-data-eng-project

Building a data pipeline to deliver job listing data from Jobstreet for analysis.

airflow bigquery data-engineering etl-pipeline google-cloud looker-studio python sql

Last synced: 12 Feb 2026

https://github.com/nguyendangxuanlinh/newyorkbike-rental-trip-time-prediction-model-googlebigquery

This ML project uses linear regression to predict the trip time of a bike rental for a new prediction system in a new mobile application. The ML datasets were collected and stored in a BigQuery public dataset.

bigquery linear-regression machine-learning

Last synced: 12 Feb 2026

https://github.com/alexye-mapleleafs/ceba_process

Canada Emergency Business Account (CEBA) Process Automation in GCP

airflow-dags bigquery docker-container docker-image google-cloud-platform google-cloud-storage python3

Last synced: 16 Apr 2026

https://github.com/mchirico/gmail

Inserts Gmail messages into BigQuery, then deletes them.

angular9 bigquery gcp gmail python3

Last synced: 10 May 2026

https://github.com/ajaxbarcelonacruyff/databricks_bigquery

Extract BigQuery tables in Databricks Notebook

bigquery databricks databricks-notebooks ga4 googleanalytics

Last synced: 17 Apr 2026

https://github.com/viveknaskar/cloud-bigquery-poc

A POC on the data warehousing solution provided by Google Cloud

bigquery data-warehouse gcp googlecloud googlecloudplatform

Last synced: 18 Apr 2026

https://github.com/victorcezeh/elt-data-pipeline-project

An end-to-end ELT project using the Brazilian E-Commerce dataset from Kaggle. This project demonstrates the use of Python, PostgreSQL, Docker, Docker Compose, Airflow, dbt, and BigQuery to ingest, transform, and analyze data, providing insights into sales, delivery times, and order distributions.

airflow bigquery dbt-core docker docker-compose postgresql python

Last synced: 04 Jun 2026

https://github.com/tupizz/fiap_pnad-covid-19

This project analyzes and transforms PNAD COVID-19 data from May to July 2020, using PySpark for large-scale data processing and BigQuery as the destination for storage and later analysis. The goal is to consolidate the monthly data into a single transformed dataset.

analysis bigquery pyspark python

Last synced: 27 Apr 2026

https://github.com/paulpierre/google-bq-export-downloader

Google BigQuery Export Downloader

big-data bigquery dump export gcs

Last synced: 28 Apr 2026

https://github.com/pkpkpk/gcp

Clojure bindings for selected GCP SDKs

bigquery cloudstorage gcp gemini google-cloud-platform pubsub vertexai

Last synced: 28 Apr 2026

https://github.com/edonosotti/gmail-accounting-automation

Automate accounting from invoices in Gmail, using Apps Script, Google Sheets and optionally BigQuery.

accounting apps-script automation bigquery expense-tracker expenses finance finance-automation finances gmail google google-api google-apis google-sheets

Last synced: 29 Apr 2026

https://github.com/brunopata/adventureworks-sql-analysis

SQL-driven analysis of sales, customer behavior, time trends and regional performance using the AdventureWorks dataset. Built using Google BigQuery and SQL to uncover key business insights. Data is structured through clean queries and views designed to support product, customer and geographic decisions.

bigquery business-intelligence data-analytics data-engineering etl google-cloud sales-analysis sql

Last synced: 30 Apr 2026

https://github.com/mchmarny/automodel

BigQuery automatic model rebuild based on r2 score deviation

bigquery gcp iot ml model

Last synced: 01 May 2026
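The rebuild rule above depends on the R² score, R² = 1 − SS_res / SS_tot. The following is a minimal Python sketch of such a rule, assuming "deviation" means the absolute difference from a baseline R² and using a hypothetical threshold of 0.05; it is not that project's code. In BigQuery ML, `ML.EVALUATE` reports an `r2_score` metric for regression models, which could serve as the baseline.

```python
def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def needs_rebuild(baseline_r2: float, y_true: list[float], y_pred: list[float],
                  max_deviation: float = 0.05) -> bool:
    """Flag a rebuild when R² on fresh data drifts more than max_deviation from the baseline."""
    return abs(r2_score(y_true, y_pred) - baseline_r2) > max_deviation
```

Using an absolute deviation also triggers a rebuild when the score jumps upward, which may indicate that the incoming data has shifted; a drop-only rule would be a one-line change.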

https://github.com/lokimcpuniverse/gcp-mcp-server

MCP server for Google Cloud Platform - Complete GCP services integration for GenAI

ai-agents bigquery cloud gcp genai google-cloud infrastructure mcp model-context-protocol vertex-ai

Last synced: 02 May 2026

https://github.com/nais/bqrator

Operator for creating BigQuery datasets

bigquery bigquery-operator kubernetes kubernetes-operator nais-features

Last synced: 03 May 2026

https://github.com/topefolorunso/musicaly-project

An end-to-end data pipeline that ingests simulated music stream data, structures, cleans and models the raw data, and visualizes clean data.

airflow bigquery data-pipeline dbt google-cloud-platform kafka python spark-streaming

Last synced: 04 May 2026

https://github.com/camilajaviera91/mock-data-factory

Generate large-scale synthetic datasets using SQL and BigQuery.

bigquery dbt dotenv exceute-batch faker load-dotenv os postgresql psycopg2 psycopg2-extras random sql yml

Last synced: 04 May 2026

https://github.com/tamanyan/digdag-embulk-server

Digdag Server for building Data Lake

bigquery digdag docker docker-compose embulk etl

Last synced: 04 May 2026

https://github.com/elithrar/finding-bugs-with-bigquery

A talk on using BigQuery, the GitHub Public Data & some elbow grease to find bugs in OSS projects.

big-data bigquery bugs github golang open-source

Last synced: 06 May 2026

https://github.com/undisputed-jay/etl-on-gcp-with-apache-airflow

In this project, files were ingested into Google Cloud Storage and later moved to BigQuery to run some queries, and the results were moved back to Google Cloud Storage.

apache-airflow bigquery data-engineering data-warehouse docker etl-pipeline google-cloud-platform

Last synced: 06 May 2026

https://github.com/rembertdesigns/smart-vinyl-catalog

AI-powered vinyl cataloging and music discovery platform leveraging BigQuery’s generative AI. Processes mixed-format data to deliver personalized recommendations, collection analytics, and intelligent search. Created for the Kaggle BigQuery AI Challenge to showcase real-world, scalable AI solutions for music lovers.

ai bigquery data-science data-visualization generative-ai hackathon kaggle kaggle-competition machine-learning music-analytics music-recommendation-algorithm python recommender-system vinyl

Last synced: 07 May 2026

https://github.com/tatamiya/new-books-notification

Fetches new books from [版元ドットコム](https://www.hanmoto.com/) and posts notifications about them to Slack

bigquery cloudrun-jobs gcs golang slack

Last synced: 07 May 2026

https://github.com/aicayzer/bigquery-mcp

A Model Context Protocol (MCP) server for secure BigQuery access across multiple Google Cloud projects with advanced analytics, security controls, and Docker support.

ai bigquery bigquery-api claude cursor gcloud gcloud-sdk gcp google-cloud-platform mcp mcp-server

Last synced: 08 May 2026

https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds

Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow

apache-arrow apache-beam bigquery dataflow neo4j python

Last synced: 09 May 2026

https://github.com/toskpl/googlecloud

30-day challenge - Google Cloud

bigquery google-cloud google-cloud-platform ml

Last synced: 10 May 2026

https://github.com/janascher/engenharia-de-dados

Solutions to the assignments from the Data Engineering classes at Alpha EdTech.

bigquery dash-plotly data-engineer data-warehouse google-cloud-platform google-cloud-storage pandas pyspark python

Last synced: 10 May 2026

https://github.com/zkan/running-bigquery-query-from-airflow-using-bigqueryexecuteoperator

Running BigQuery Query from Airflow using BigQueryExecuteOperator

airflow bigquery sql

Last synced: 10 May 2026

https://github.com/ackeecz/terraform-gcp-dataflow_pubsub_to_bq

A Dataflow job that subscribes to a Pub/Sub subscription. It takes messages from the subscription and pushes them into a BigQuery table.

bigquery dataflow pubsub terraform-module

Last synced: 13 May 2026

https://github.com/shirlyngit/elt-pipeline-with-gcp-airflow-looker-studio

Scalable ELT pipeline on GCP using Airflow and BigQuery to ingest, validate, and transform 1M+ anonymized medical records, visualized in Looker Studio.

airflow-dags bigquery elt-pipeline gcp looker-studio python

Last synced: 13 Aug 2025

https://github.com/armahdavi/bigdata_pyspark_sales_analytics

A summary of my big data code in Python (PySpark) for analyzing retail and Walmart superstore sales data to draw sales insights.

big-data bigquery clustering dataframe hadoop k-means machine-learning pyspark pyspark-ml python spark unsupervised-learning

Last synced: 12 Apr 2026

https://github.com/byn227/accidents_in_france

This project is a DBT (Data Build Tool) analytics pipeline for BigQuery.

bigquery dbt sql

Last synced: 13 Aug 2025

https://github.com/pittica/google-bigquery-helpers

Helpers for Google Cloud BigQuery.

bigquery gcp google-cloud-platform pittica

Last synced: 06 Jan 2026

https://github.com/andrii04/andreamonforte-bi-assignment

Automated Data Pipeline that ingests daily GA4-formatted CSV files from a private Google Cloud Storage bucket, validates and loads them into BigQuery, and prepares analysis-ready views. The solution is built for deployment as a Cloud Function triggered by Cloud Scheduler and uses Python with the Google Cloud Storage and BigQuery client libraries.

automation bigquery cloud cloudfunctions data data-analysis data-engineering etl etlpipeline gcp google googlecloudplatform pipeline python sql

Last synced: 09 Nov 2025

https://github.com/hrialan/dataform-prune

An open-source tool for automating the cleanup of outdated objects in Dataform configurations, optimizing data workflows with seamless CI/CD integration.

automation bigquery data-analytics dataform

Last synced: 09 Mar 2026
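The cleanup idea above boils down to a set difference between what the Dataform project still declares and what exists in the dataset. The hypothetical Python sketch below, not that tool's implementation, builds the matching `DROP` statements as a dry run. It assumes `existing` comes from the `table_name` and `table_type` columns of BigQuery's `INFORMATION_SCHEMA.TABLES` view, and that `declared` is a set of names taken from the compiled Dataform graph.

```python
# INFORMATION_SCHEMA.TABLES table_type values mapped to the matching DROP keyword.
DROP_KEYWORD = {
    "BASE TABLE": "TABLE",
    "VIEW": "VIEW",
    "MATERIALIZED VIEW": "MATERIALIZED VIEW",
}


def prune_statements(declared: set[str], existing: dict[str, str], dataset: str) -> list[str]:
    """Build DROP statements for objects present in `dataset` but no longer declared.

    `existing` maps table_name -> table_type, e.g. from
    SELECT table_name, table_type FROM `project.dataset`.INFORMATION_SCHEMA.TABLES.
    Statements are only returned (a dry run), never executed here.
    """
    statements = []
    for name in sorted(set(existing) - declared):
        keyword = DROP_KEYWORD.get(existing[name])
        if keyword is None:
            continue  # skip types this sketch does not handle (e.g. EXTERNAL, SNAPSHOT)
        statements.append(f"DROP {keyword} IF EXISTS `{dataset}.{name}`")
    return statements
```

Returning statements instead of executing them leaves room for a CI step to print the plan for review before anything is dropped.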

https://github.com/olahsymbo/etl-gcs-postgres-bigquery

ETL Pipeline (postgres, bigquery, csv, json, google storage)

bigquery data-pipeline etl flask google-cloud-scheduler postgresql python

Last synced: 20 Apr 2026

https://github.com/diabahmed/london-bicycle-analysis

Apache Beam pipeline for analyzing London bicycle sharing data using Google Cloud Dataflow and BigQuery.

apache-beam bigquery data-engineering etl-pipeline google-cloud-platform python

Last synced: 13 Aug 2025

https://github.com/squidmin/bigquery-labs

GCP BigQuery CLI

bigquery gcp java

Last synced: 12 Jun 2025

https://github.com/valeqm/data-engineer-zoomcamp-homework

Data Engineering Zoomcamp 2025 Homework Repository. Contains assignments on containerization, workflow orchestration, cloud, data warehouses, analytics engineering, batch processing, and streaming.

bigquery dbt docker etl-pipeline gcp google-cloud-platform kafka kestra python spark sql terraform

Last synced: 12 Apr 2026

https://github.com/amitkumarj441/mysql2bigquery

A script to load a MySQL table into BigQuery. It extracts the schema and data as JSON.

bigquery docker mysql scala

Last synced: 09 Apr 2026
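Extracting the schema, as described above, largely comes down to mapping MySQL column types to BigQuery types. The Python sketch below is an assumption, not that Scala project's code. It converts rows shaped like MySQL's `information_schema.COLUMNS` (`COLUMN_NAME`, `COLUMN_TYPE`, `IS_NULLABLE`) into BigQuery schema-JSON fields. The type table is deliberately partial, unknown types fall back to `STRING`, and treating `TINYINT(1)` as a boolean is a common convention, not a rule.

```python
import json
import re

# Partial mapping from MySQL base types to BigQuery types (assumption: unknown
# types fall back to STRING). BIGINT UNSIGNED can exceed INT64's range and wide
# DECIMALs may need BIGNUMERIC; neither case is handled here.
MYSQL_TO_BQ = {
    "tinyint": "INT64", "smallint": "INT64", "mediumint": "INT64",
    "int": "INT64", "integer": "INT64", "bigint": "INT64",
    "float": "FLOAT64", "double": "FLOAT64", "decimal": "NUMERIC",
    "char": "STRING", "varchar": "STRING", "text": "STRING",
    "date": "DATE", "datetime": "DATETIME", "timestamp": "TIMESTAMP", "time": "TIME",
    "binary": "BYTES", "varbinary": "BYTES", "blob": "BYTES", "json": "JSON",
}


def to_bigquery_schema(columns: list[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Convert (COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE) rows into BigQuery schema fields."""
    schema = []
    for name, column_type, is_nullable in columns:
        ctype = column_type.lower()
        match = re.match(r"[a-z]+", ctype)
        base = match.group(0) if match else ctype
        # Common convention: TINYINT(1) stores booleans.
        bq_type = "BOOL" if ctype.startswith("tinyint(1)") else MYSQL_TO_BQ.get(base, "STRING")
        mode = "NULLABLE" if is_nullable.upper() == "YES" else "REQUIRED"
        schema.append({"name": name, "type": bq_type, "mode": mode})
    return schema


if __name__ == "__main__":
    # Sample rows shaped like a query on information_schema.COLUMNS.
    sample = [("id", "int(11) unsigned", "NO"), ("active", "tinyint(1)", "YES")]
    print(json.dumps(to_bigquery_schema(sample), indent=2))
```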

https://github.com/sahilmb/employee-churn-da

A data analysis project on employee churn rate using Google BigQuery, Looker, PyCaret and Colab

bigquery looker-studio pycaret

Last synced: 09 Apr 2025

https://github.com/samanthalang/samanthalang_portfolio

A data analyst with a consumer's perspective and a marketer's strategy.

bigquery excel figma mysql notebook numpy pandas postgresql powerbi powerquery python sql sqlite wordpress

Last synced: 09 Apr 2026

https://github.com/RobinNoiret/internship-zendesk_reporting_migration

This project involves developing a Python script to import CSV exports from Zendesk into BigQuery. It is not intended for recurring use, but to enable an initial dump of historical data.

bigquery connector export-csvfile json zendesk

Last synced: 03 Oct 2025

https://github.com/sayed-ashfaq/target-sql

In this project, I analyzed Target company's data using SQL in BigQuery, focusing on data extraction, manipulation, and performing various analytical queries to derive insights.

aggregation bigquery cte joins sql

Last synced: 09 Apr 2025

https://github.com/borfergi/stock-market-data-pipeline

A fully serverless data pipeline that prepares stock market data from your selected companies using GCS, PySpark, BigQuery, Composer (Airflow), and Terraform.

airflow bigquery composer data-pipeline dataproc gcs polygon-api pyspark terraform

Last synced: 09 Apr 2026

https://github.com/space-lumps/oroboro-dw-dbt

This project builds a clean, BI‑ready table user_base in BigQuery

analytics-engineering bigquery business-intelligence ci-cd data-mart data-modeling dbt elt gcp geospatial jinja metabase sql yaml

Last synced: 04 Sep 2025

https://github.com/misaober/datove_inzenyrstvi_projekt

Data Engineering in Practice course (Czechitas, 36 hours): a custom project built on real data, containing scripts for creating the L1, L2 and L3 layers, a data model, and the project's architecture design.

bigquery python sql

Last synced: 14 May 2026

https://github.com/kevin-rsj/real-estate-investments

A scoring system that classifies French cities for second-home investment by risk profile (high, moderate and low). It evaluates key ratios in areas such as demand, availability, infrastructure, demographics and prices.

bigquery data-analytics looker-studio numpy pandas python sklearn-library sql visualization

Last synced: 09 Apr 2026

https://github.com/thanhloc81/customer-segmentation

✨ Analyze customer segments of Adventure World dataset

bigquery google-cloud powerbi sql

Last synced: 03 Feb 2026

https://github.com/allanreda/video-processing-and-categorization

Video processing and categorization using computer vision, machine learning and cloud computing

bigquery cloud-storage-bucket cnn computer-vision google-cloud kmeans-clustering machine-learning opencv2 tensorflow virtual-machine

Last synced: 13 Apr 2026

https://github.com/zoroken123/gcp-mcp-server

GCP MCP Server provides a streamlined solution for managing cloud resources efficiently. Join the community and contribute to its growth! 🌟👩‍💻

app-platform bigquery caprover cloud cobra-cli cursor-ai digitalocean docker droplet genai infrastructure mcp mcp-server model-context-protocol multi-language nodejs python vertex-ai

Last synced: 20 Jan 2026

https://github.com/sangnandar/load-csvs-from-gcs-to-bigquery

Google Apps Script to streamline loading CSV data from Google Cloud Storage (GCS) into BigQuery.

bigquery csv-import google-apps-script google-cloud-storage

Last synced: 12 Apr 2026

https://github.com/yasarsultan/olist_datawarehouse

An end-to-end data pipeline that extracts data, processes it, and then loads it into the BigQuery data warehouse.

airflow bigquery data-warehouse docker

Last synced: 03 Jan 2026

https://github.com/henrikwarf/bq-health-optimizer-analyzer

Agentic Application - BigQuery Compass - Health and Optimization Analyzer

adk agents bigquery google-cloud

Last synced: 13 Jul 2025

https://github.com/celiason/coffee-funnel

webpage for visualizing sales projections of a small coffee business

bigquery prophet sales-analysis streamlit-webapp

Last synced: 12 Apr 2026

https://github.com/MatheusOtenio/cloud-bootcamp-cloud-mart

A bootcamp focused on cloud computing, DevOps and AI, covering Infrastructure as Code (IaC), CI/CD, Kubernetes, serverless computing and data analysis, providing an optimized and scalable environment.

aws aws-ec2 aws-lambda azure bigquery cicd cloud docker dynamodb inteligencia-artificial kubernetes terraform

Last synced: 02 Apr 2026

https://github.com/quipper/send-ci-result-to-bigquery-action

Send test results to BigQuery in GitHub Actions

bigquery github-actions google-bigquery junit-xml

Last synced: 01 May 2026
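Sending test results to BigQuery, as the action above does, first requires flattening a JUnit XML report into rows. The sketch below uses only Python's standard library and is a hypothetical illustration, not that action's implementation. The column names `classname`, `name`, `duration_sec` and `status` are invented for this example.

```python
import xml.etree.ElementTree as ET


def junit_to_rows(xml_text: str) -> list[dict]:
    """Flatten every <testcase> in a JUnit XML report into a dict suitable for a table row."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so both <testsuites> and <testsuite> roots work.
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            status = "FAILED"
        elif case.find("error") is not None:
            status = "ERROR"
        elif case.find("skipped") is not None:
            status = "SKIPPED"
        else:
            status = "PASSED"
        rows.append({
            "classname": case.get("classname", ""),
            "name": case.get("name", ""),
            "duration_sec": float(case.get("time") or 0),
            "status": status,
        })
    return rows
```

The resulting list could then be streamed with `Client.insert_rows_json(table_id, rows)` from `google-cloud-bigquery`, which returns a list of per-row errors that is empty on success.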

https://github.com/vanducng/miu-db

A headless database CLI for humans and agents.

bigquery database mysql postgresql snowflake sql sqlite terminal textual tui

Last synced: 08 Jun 2026

https://github.com/jiahuaxue/data-analysis

Data analysis portfolio showcasing E-commerce, IMDB, and Airbnb projects with Python and SQL.

bigquery data-analysis-project jupyter-notebook portfolio python sql

Last synced: 07 May 2026

https://github.com/augo-amos/dbt-bigquery-analytics

A modern data analytics platform built on Google BigQuery that transforms raw e-commerce data into actionable business intelligence.

analytics bigquery dbt gcp

Last synced: 23 May 2026

https://github.com/karencofre/riesgorelativo-lookerstudio

A data analysis and predictive analysis project in Looker Studio and Google Colab

bigquery data-analysis data-science machine-learning matplotlib python sklearn sql

Last synced: 03 Jan 2026
