An open API service indexing awesome lists of open source software.

BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes its “serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data directly from external sources.

📖 A highly rated, comprehensive reference on the subject is the book “Google BigQuery: The Definitive Guide”. Another enriching read is the inside story told by BigQuery’s founding product manager in an article celebrating the product’s 10th anniversary.

https://github.com/vaxdata22/city-weather-and-s3file-rds-s3-bigquery-etl-by-airflow-on-ec2

This is my third AWS Cloud ETL project. The pipeline is orchestrated with Apache Airflow on AWS EC2. It demonstrates how to build an ETL pipeline that extracts data into a database in parallel with a separate load into the same database, joins the tables, copies the joined data to S3, and finally copies the S3 file to the BigQuery data warehouse.

apache-airflow aws-ec2 aws-rds-postgres aws-s3 bigquery business-intelligence dags data-warehousing etl-pipeline openweathermap-api orchestration python3 sql

Last synced: 21 May 2026

https://github.com/night-fury-me/real-time-vehicle-data-processing

A repository containing an implementation of a Real-Time Vehicle Data Processing Pipeline that efficiently manages and analyzes vehicle data through a cohesive system.

bigquery cpp data-engineering data-streaming flink grpc kafka python real-time-data-processing

Last synced: 02 Jan 2026

https://github.com/azapeti/bigquery-python-bash-automation

With the free version, the Google Analytics API only returns data from your website for the last 60 days. In this repository I demonstrate how to run BigQuery queries in Python and automate them with bash and crontab to collect historical data.

analytics automation bash bigquery cronjob crontab ga4 python python3

Last synced: 02 Jan 2026

https://github.com/jakwakwa/risk-management-system

Modern technologies such as machine learning and AI for our internal risk team. These tools are expected to streamline operations, quickly highlight anomalies, and support more informed decision-making.

bigquery bun ml nexjs16 rag react risk-analysis shadcn-ui temporal typescript vertex-ai

Last synced: 22 Jan 2026

https://github.com/nlgtuankiet/bq-noti

BigQuery notification

bigquery bq notification notifier

Last synced: 02 Jan 2026

https://github.com/sintef/bigquery-postgresql-wire-proxy

A PostgreSQL wire protocol proxy server for BigQuery.

bigquery postgresql proxy

Last synced: 05 May 2026

https://github.com/shvetsihorr/sql-projects

SQL and Google BigQuery portfolio projects

azuredatastudio bigquery mssql postgresql sql

Last synced: 15 Mar 2026

https://github.com/abdullahasghar/sql

The repo includes all projects and assessments I have completed with SQL. IDEs used: MS SQL Server, Google BigQuery.

bigquery mssqlserver sql

Last synced: 15 Mar 2026

https://github.com/kartikeya443/automated-data-pipeline-gcp

This project showcases the integration of various Google Cloud Platform services to build an efficient and automated data pipeline for sales data.

bigquery cloud data-engineering flask gcp google-cloud-platform looker-studio pipeline python sql

Last synced: 03 Feb 2026

https://github.com/swatisinghit/e-commerce-trend-analysis-for-target

An exploratory, in-depth study of e-commerce sales data for a Brazilian store using SQL.

bigquery data-analysis mysql sql

Last synced: 19 May 2026

https://github.com/chukwuemekaaham/ny_taxi_rides

Analytics engineering using dbt and Google Cloud BigQuery

analytics-engineering bigquery dbt github

Last synced: 19 May 2026

https://github.com/goosethedev/de-zoomcamp-2025

Homework for DataTalksClub's Data Engineering Zoomcamp 2025.

bigquery data-engineering kestra python terraform

Last synced: 29 Apr 2026

https://github.com/knands42/data-ingestion

Data ingestion project to evaluate my Kotlin skills with concurrency

bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow

Last synced: 23 Jan 2026

https://github.com/cyber-programmer/web-traffic-analytics-ml-model

This Jupyter Notebook focuses on classifying website visitors using logistic regression. The project uses Google Analytics sample data and BigQuery for data analysis and feature engineering. It provides a comprehensive workflow that includes data import, preprocessing, and exploratory data analysis.

bigquery logistic-regression machine-learning

Last synced: 06 Feb 2026

https://github.com/jtwebman/bigquery-local

Node.js + DuckDB local emulator for the Google BigQuery REST API. A drop-in for testing, CI, and local development, with working PATCH support.

bigquery duckdb emulator local-development nodejs sql testing typescript

Last synced: 19 May 2026

https://github.com/epomatti/gcp-bigquery

Data sync via CDC from GCP Cloud SQL to BigQuery using Datastream

bigquery cloud-sql datastream gcp

Last synced: 01 Jun 2026

https://github.com/trowdan/missing-finder

Homeward: missing persons finder. Accelerates the search for missing persons by analyzing video footage and intelligently linking potential sightings to registered cases.

bigquery cloud gemini google hackaton

Last synced: 19 May 2026

https://github.com/theng23/e-commerce-website-performance-analysis-sql

Using BigQuery on a Google Analytics dataset to analyze an e-commerce website

bigquery sql

Last synced: 24 Oct 2025

https://github.com/xennis/particulate-matter-sensor-storage

Store the particulate matter data from a luftdaten.info sensor in BigQuery

bigquery cloud-function luftdaten particulate-matter sensor-data

Last synced: 12 May 2025

https://github.com/manesioz/airflow-without-code

Dynamically generate DAGs to ingest SQL files into BigQuery with one line of "code"

airflow airflow-plugin bigquery python sql

Last synced: 18 Apr 2026

https://github.com/jasontanx/terraform-practice

Creating datasets and tables in Google BigQuery via Terraform

bigquery iac-terraform infrastructure-as-code terraform

Last synced: 18 May 2026

https://github.com/khangtran85/user-behavior-analysis-for-ecommerce

A SQL project analyzing eCommerce data to uncover insights on traffic, customer behavior, and purchasing patterns. Covers key metrics like visits, transactions, and conversion rates, providing data-driven support for optimizing revenue and user experience.

bigquery ga-session google-analytics-sample publicdata sql

Last synced: 21 Feb 2026

https://github.com/larisanti/transaction-ml

This project demonstrates a sequence of BigQuery ML queries to build and evaluate a logistic regression model that predicts customer transactions based on website traffic data from Google Analytics.

bigquery machine-learning

Last synced: 11 May 2025

https://github.com/yohanesnuwara/bigquery-sodirchat

A chat interface to the Sodir Norwegian oil database using Google BigQuery and Gemini

bigquery retrieval vector-search

Last synced: 18 May 2026

https://github.com/fabioba/sales-analytics

This is an exercise on sales data provided by ChatGPT.

airflow bigquery etl-pipeline googlecloudplatform googlecloudstorage

Last synced: 18 May 2026

https://github.com/andrewm4894/gcp-telemetry-example

A simple HTTP endpoint in GCP for telemetry-style events.

bigquery gcp-cloud-functions gcp-storage python terraform

Last synced: 05 May 2026

https://github.com/vbalalian/littlefield

Combined web-scraping, loading, and reporting tool for Littlefield simulation, built for use with Google Cloud Run functions and Google Cloud Scheduler

bigquery cloud-functions extraction google-cloud-platform littlefield-simulation-game loading python reporting sql webscraping

Last synced: 14 May 2026

https://github.com/mohamedkashifuddin/gcp-ecommerce-data-pipeline

An e-commerce data lakehouse implemented on Google Cloud Platform (GCP). This project features an end-to-end data pipeline, from raw data generation via Cloud Functions, layered processing with PySpark on Dataproc, to structured data warehousing in BigQuery. It's fully orchestrated by Apache Airflow, enabling analytics and BI with Metabase.

airflow bigquery cloud-functions data-pipeline dataproc ecommerce gcp metabase pyspark

Last synced: 18 May 2026

https://github.com/flowerinthenight/bqstream

A simple library to help facilitate streaming to BigQuery.

bigquery go golang streaming

Last synced: 18 May 2026

https://github.com/smohanta23/uber_data-engineering_etl-project

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.

big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api

Last synced: 01 Jan 2026

https://github.com/robinnoiret/internship-zendesk_reporting_migration

This project involves developing a Python script to import CSV exports from Zendesk into BigQuery. It is not intended for recurring use, only for an initial dump of historical data.

bigquery connector export-csvfile json zendesk

Last synced: 06 May 2026

https://github.com/vidyadnina/other-sql-projects-and-queries

Other SQL projects and queries.

bigquery mysql sql

Last synced: 06 Feb 2026

https://github.com/mysto-007/cyclistic-bike-share-analysis

Analyzed the Cyclistic bike-share dataset as the capstone project for the Google Data Analytics Specialization

bigquery data-analysis excel ms-sql-server sql tableau tableau-public

Last synced: 16 Mar 2026

https://github.com/tomgorb/some-data-monitoring

A fully functional DAG using Airflow 2 and minikube (locally) to help monitor GCP billing

airflow2 bigquery gcp minikube

Last synced: 07 Apr 2026

https://github.com/patriciavalentine/loan-data-queries

In this project, I analyzed a vehicle loan dataset using BigQuery to identify demographic, financial, and loan patterns. Through SQL queries, I extracted insights such as credit scores and loan distribution by region, and explored high-risk profiles. The findings are visualized in Looker Studio to help inform strategic decisions.

asset-finance bigquery loan-data looker-studio

Last synced: 30 Oct 2025

https://github.com/batou9150/google-cortex-quickstart

Quick start with Google Cloud Cortex Framework

bigquery cloud cortex google salesforce sap

Last synced: 11 Sep 2025

https://github.com/sejalmankar1012/product_data_analyst_assessement

Analyzing the Impact of Business Hour Mismatch on Order Volume in the Food Delivery Industry: A Case Study of UEats and Ghub

assessment-project bigquery loop product-analyst sql-query

Last synced: 21 Mar 2025

https://github.com/makism/bq-ethereum

bq-ethereum

bigquery ethereum sql

Last synced: 06 May 2026

https://github.com/vigneshSs-07/Cloud-BigQuery-and-SQL---The-Interview-Guide

Covers SQL commands, interview preparation, and query questions with solutions in BigQuery

azuresql bigquery gcp sql sql-query sql-server sqlalchemy

Last synced: 09 May 2025

https://github.com/vigneshss-07/mastering-sql-and-bigquery-on-google-cloud-platform

Take your Data Analytics skills to the next level with this comprehensive playlist. Learn SQL from the basics to advanced techniques while mastering BigQuery on Google Cloud.

analytics bigquery gcp sql

Last synced: 21 Jun 2025

https://github.com/blandoncj/terraform-bigquery-gcp

Infrastructure as Code (IaC) for Google Cloud BigQuery using Terraform. Automates dataset and table provisioning with best practices for cloud resource management.

automation bigquery gcp terraform

Last synced: 17 May 2026

https://github.com/syedsajjadaskari/end-to-end-chicago-taxi-tip-prediction-with-bigquery-and-vertex-ai

An end-to-end Chicago taxi tip prediction example on Google Cloud using TensorFlow, TFX, and Vertex AI

bigquery gcp tensorflow tfx vertex-ai

Last synced: 06 May 2026

https://github.com/chdl17/nyc_green_taxis_peak_hour_analysis

This project analyzes GCP BigQuery data and uses Looker Studio to build a Peak Hour Analysis.

bigquery gcp google-cloud-platform looker-studio sql

Last synced: 06 Feb 2026

https://github.com/venugopal9578/youtube-trending-sql-analysis

SQL project analyzing YouTube trending videos in India using Google BigQuery. Includes ranking, aggregation, and channel performance analysis with visuals.

analytics bigquery dataanalysis freelance portfolio sql youtube

Last synced: 17 May 2026

https://github.com/suv05/brazilian-ecommerce-data-analysis

End-to-End Big Data Analytics on Google Cloud Platform

bigquery dataproc kaggle-dataset spark

Last synced: 15 Apr 2026

https://github.com/plishka/blockchain_analysis

Cryptocurrency On-Chain Analysis (Bitcoin Blockchain)

bigquery blockchain data-cleaning scraping-websites sql tableau

Last synced: 25 Feb 2026

https://github.com/lisabensoussan/bigdataminig_finalassignment

This repository contains solutions for the final assignment of the Big Data Mining course (52002/52019), focusing on querying large datasets with BigQuery, network analysis with Python, and distributed data processing with Apache Spark.

bigquery community-detection data-cleaning dataframe exploratory-data-analysis pagerank rdd sql text-analysis visualization

Last synced: 07 Feb 2026

https://github.com/hitthecodelabs/bigquery_ml

Jupyter notebooks that utilize Google BigQuery's machine learning capabilities.

bigquery notebooks python sql

Last synced: 15 Apr 2026

https://github.com/spacepatcher/google-workspace-gmail-collector

👁 App for collecting Gmail logs from your Google Workspace account and sending them to Kafka

bigquery gmail google-workspace security soc

Last synced: 31 Jan 2026

https://github.com/refrainit/zangetsu-data

A library that supports accessing PostgreSQL, BigQuery, and Google Sheets and retrieving data from them

bigquery pip postgresql python spreadsheet zangetsu

Last synced: 06 May 2026

https://github.com/karencofre/marketing-segmentacion-en-powerbi

Hypothesis-testing project in Power BI and Python

bigquery google-colab powerbi python sql statsmodels

Last synced: 31 Jan 2026

https://github.com/kahfisa/business-performance-analytics-kimia-farma

Kimia Farma performance analytics dashboard

bigquery lookerstudio

Last synced: 10 Sep 2025

https://github.com/thanhloc81/sql-project-bicycles-practise

✨ Using SQL to extract data for a simulated task involving the Sales and Product modules

adventureworks bicycle bigquery google-cloud sql

Last synced: 01 Feb 2026

https://github.com/hardik-agrl/youtube_trending_pipeline

The project fetches trending YouTube videos using the YouTube Data API, stores the data in Google Cloud Storage, and loads it into a BigQuery table for analysis.

bigquery google-cloud-storage python python-dotenv youtube-api-v3

Last synced: 15 Apr 2026

https://github.com/cmmasaba/ms-ads-integration

Extracts ad performance data from the Microsoft Ads platform and stores it in BigQuery

bigquery microsoft-ads python

Last synced: 15 Apr 2026

https://github.com/vaishnavipaithane/bellabeat-data-analysis-case-study

This capstone project was completed as part of the Google Data Analytics Professional Certificate course.

bigquery sql tableau

Last synced: 01 Feb 2026

https://github.com/minhajuddin2510/bigquery_alerts

In today’s data-driven world, organisations rely heavily on timely alerts to monitor critical systems and make informed decisions. However, BigQuery, a popular cloud-based data warehouse, has no built-in functionality for generating alerts. In this article, we will explore how I recently built a cloud function to address this

alerting bigquery cloudfunctions monitoring-tool slack

Last synced: 06 May 2026

https://github.com/lambdamusic/dimschema

A CLI to retrieve SQL schema information about the Dimensions dataset on Google BigQuery.

bigquery dimensions python scholarly-metadata

Last synced: 12 May 2026

https://github.com/paty-oliveira/carris-data-pipeline

Repository for Extraction, Loading and Transformation of Carris data.

apache-airflow bigquery docker docker-compose elt-pipeline

Last synced: 25 Feb 2026

https://github.com/scraly/bigquery

Google BigQuery AaaS tools, tips and fun

bigquery java

Last synced: 17 May 2026

https://github.com/kajinmo/lightweight-etl-pipeline-to-gcp

An ETL pipeline that extracts data from multiple sources, masks sensitive information, and loads it into Google Storage and Google BigQuery. Designed for environments where Airflow is unavailable. It provides a no-frills, dependency-light way to define, schedule, and monitor ETL workflows using Python libraries.

bigquery etl gcp pipelines pydantic python storage

Last synced: 17 May 2026

https://github.com/jancervenka/bqcli

REPL for BigQuery

bigquery gcp python sql

Last synced: 08 Feb 2026

https://github.com/ekoepplin/dbt-bigquery-core

How to get data into BigQuery (or DuckDB) and set up dbt tests for Soda Cloud monitoring

bigquery data data-quality dbt dlt duckdb gcp soda

Last synced: 06 May 2026

https://github.com/bleakmego/wrenai

WrenAI is an open-source GenBI agent designed for seamless integration and powerful performance. Explore the code on GitHub! 🐙🌟

agent anthropic bedrock bigquery business-intelligence charts duckdb genbi llm openai postgresql rag sql sqlai text-to-chart text-to-sql text2sql vertex

Last synced: 07 Apr 2026

https://github.com/kina2711/datapipeline_omnichanneltobigquery

A data pipeline that fetches, normalizes, and sorts Caresoft omnichannel data, then loads it into BigQuery, with Kafka-based schema validation coming soon.

bigquery etl pipeline python

Last synced: 19 May 2026

https://github.com/shakeel-data/amazon-sales-forecasting-python-bigquery-ml

An end-to-end analytics project using Python, SQL, & ML to forecast Amazon sales and segment customers. We build predictive models (LightGBM, Prophet) and clustering (KMeans) to deliver actionable insights for revenue growth and targeted marketing.

bigquery kmeans-clustring lightgbm linear-regression prophet-facebook scikit-learn

Last synced: 09 May 2026

https://github.com/jmfeck/bigquery-local-framework

This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified local interface for BigQuery, making these interactions more efficient.

bigquery data-engineering ingestion pandas python

Last synced: 06 May 2026

https://github.com/themihirmathur/uber-data-analytics

The goal of this project is to perform comprehensive data analytics on Uber trip data using a modern data engineering stack on Google Cloud Platform (GCP).

bigquery data-analysis data-engineering etl-pipeline google-cloud-platform looker python

Last synced: 09 Feb 2026

https://github.com/aisurjyasamantaray/-optimizing-target-s-brazilian-operations-insights-from-order-processing-pricing-and-payment-trends-

This project offers an in-depth analysis of consumer behavior, logistical performance, and payment preferences within the e-commerce sector. By examining order costs, delivery times, and payment methods, businesses can uncover valuable insights into operational efficiency and customer preferences.

bigquery consumer-insights data-analysis database sql target

Last synced: 26 Feb 2026

https://github.com/fahmiaziz98/sql_agent

Builds a SQL agent using different patterns: RAG, self-correction, and optimization

agent bigquery langchain sql sql-agent sqlite toolkit

Last synced: 09 Feb 2026

https://github.com/peippo1/gcp-datawarehouse-terraform

Infrastructure-as-Code (IaC) project that provisions a foundational data warehouse environment on Google Cloud Platform using Terraform. Includes a BigQuery dataset and a Cloud Storage bucket, ready for integration with analytics tools like Dataiku or custom ETL pipelines.

bigquery data-engineering devops gcp infrastructure-as-code terraform

Last synced: 16 Apr 2026

https://github.com/nealwp/blobview

Generate BigQuery SQL views from JSON

bigquery cli json sql

Last synced: 27 Feb 2026

https://github.com/miliar/database-adapters

Prototype for easy data transfer between MySQL, BigQuery, and CSV

adapter bigquery csv datapipeline mysql unittest

Last synced: 06 May 2026

https://github.com/jhermienpaul/google-data-analytics-program

Hands-on learning materials from the 8-course Google Data Analytics Professional Certificate program, covering foundational data skills, tools, and real-world business problem-solving

bigquery dashboard data-analysis data-analytics data-modeling data-storytelling data-visualization data-wrangling descriptive-analytics diagnostic-analytics etl-pipeline r-programming rstudio sql tableau

Last synced: 13 Jul 2025

https://github.com/oliveroneill/wilt-cloud-functions

Wilt Google Cloud Functions

bigquery google-cloud-functions

Last synced: 12 May 2026

https://github.com/myktorijus/retention-cohort

Extracted cohort data using SQL in BigQuery focusing on weekly retention from week 0 to week 6

bigquery data-analysis data-visualization powerbi sql

Last synced: 13 Jul 2025

https://github.com/ivanildobarauna/pypi-package-stats

A project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 27 Feb 2026

https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance

Project assets for my first exploratory data analysis: Baseball Performance vs. Attendance.

bigquery data-analysis data-cleaning data-visualization excel rstudio sql tableau tidyverse

Last synced: 12 Feb 2026

https://github.com/jasontanx/ridership-headline-project

This end-to-end data engineering and data analytics project focuses on Malaysian public transport ridership data.

bigquery data-engineering minio-server public-transport-ridership terraform

Last synced: 09 May 2026

https://github.com/vbalalian/three-gits

Group analytics project for a predictive analytics course. Using the Yelp open dataset to predict restaurant success.

bigquery dbt predictive-analytics python regression sentiment-analysis sklearn sql vader-sentiment-analysis yelp-dataset

Last synced: 13 May 2026

https://github.com/olahsymbo/analytics-dashboard-service

A data analytics dashboard based on the Python Plotly Dash library (https://plotly.com/dash/)

bigquery css dash flask flask-sqlalchemy gunicorn html plotly postgres python

Last synced: 05 Apr 2026

https://github.com/valenthr/purchase_funnel

Google merch store sales analysis

bigquery product-analysis sql

Last synced: 21 Jun 2025

https://github.com/lorinczakos/sql-projects

A collection of SQL scripts I wrote that were approved during the GoIT Romania Data Analyst course

bigquery cte data data-analysis dbeaver marketing-analytics postgresql project-repository sql vscode

Last synced: 16 May 2026

https://github.com/ket0825/v1-gcp-preview

Manages the GCP source for the Preview service (a GCP repo for the Preview service)

bigquery cloud-functions cloud-run cloudbuild gcp gcp-batch gcs logging pubsub

Last synced: 13 Feb 2026

https://github.com/rohit196/sql-learning-hub

A comprehensive collection of SQL resources, projects, tutorials, and interview preparation materials

bigquery datawarehouse learning-resources nosql sql sql-projects

Last synced: 17 Jun 2025

https://github.com/kaanevranportfolio/kafka_spark_bigquery_newsstream

Creates VMs using Terraform and installs Kafka and ZooKeeper on them using Ansible (GCP)

ansible ansible-playbook ansible-role bigquery bigquery-table gcp kafka python terraform

Last synced: 13 Apr 2025

https://github.com/jey-37/nginx-pipeline

An Apache Beam program that reads nginx access logs from Google Cloud Pub/Sub, parses them, and saves them into BigQuery.

apache-beam bigquery dataflow gcp-pubsub

Last synced: 16 May 2026

https://github.com/janmin123/cyclistic

Capstone project for Google/Coursera Data Analytics Course

analysis bigquery sql tableau visualization

Last synced: 10 Jul 2025

https://github.com/ansh-info/databridge

End-to-end financial data pipeline unifying real-time and batch ingestion with PySpark ETL, BigQuery storage, DBT modeling, Kafka streaming, and Airflow/Docker orchestration.

airflow apache-spark bash big-data bigquery dbt docker docker-compose etl etl-pipeline gcp google kafka kafka-consumer kubernetes orchestration pyspark python3 real-time stock

Last synced: 28 Feb 2026
