An open API service indexing awesome lists of open source software.

BigQuery

Google BigQuery lets companies analyze large amounts of data without managing infrastructure. Google’s documentation describes it as a « serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.
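As a rough illustration of the SQL-plus-client-library workflow described above, here is a minimal Python sketch. It assumes the `google-cloud-bigquery` package is installed and that Application Default Credentials are configured; the public table `bigquery-public-data.usa_names.usa_1910_2013` and the helper names `build_top_names_query` and `run_top_names` are chosen only for illustration and do not come from any repository listed here.

```python
from typing import Any, Dict, List

# Assumption: a public sample table, used here purely for illustration.
TABLE = "bigquery-public-data.usa_names.usa_1910_2013"


def build_top_names_query(limit: int = 10, table: str = TABLE) -> str:
    """Build a query whose state filter is bound at run time via @state."""
    limit = int(limit)  # reject anything that is not an integer
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total\n"
        f"FROM `{table}`\n"
        "WHERE state = @state\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )


def run_top_names(state: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Run the query and return rows as plain dictionaries."""
    # Imported lazily so this module loads even without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up project and credentials from the environment
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("state", "STRING", state),
        ]
    )
    rows = client.query(build_top_names_query(limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling `run_top_names("TX", 5)` would submit the query to BigQuery and incur the usual query costs. Binding `state` as a query parameter rather than formatting it into the SQL string avoids injection problems when the value comes from user input.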

📖 A highly rated, comprehensive reference on the subject is « Google BigQuery: The Definitive Guide ». For the inside story, see the article by BigQuery's founding product manager marking its 10th anniversary.

https://github.com/govau/warcraider

Convert WARC files into Avro for big data processing

avro bigquery crawler rust warc

Last synced: 16 May 2026

https://github.com/alessine/zurich_air_quality

End-to-end Data Pipeline built as the Final Project of the Data Engineering Zoomcamp

bigquery data-engineering dataform docker google-cloud-platform kestra looker-studio

Last synced: 01 Mar 2026

https://github.com/istinnew/cook-me-up

[In Progress] Welcome to Cook-Me-Up! This project aims to analyze and organize cooking recipes using data analysis (Python, BigQuery SQL, Looker Studio etc.) and machine learning techniques. The goal is to simplify meal preparation and offer users a comprehensive database of culinary delights.

bigquery clustering cookme culinary data data-science dataanalysis datavisualization looker-studio machine-learning python recipe-search recipes unsupervised-learning

Last synced: 16 May 2026

https://github.com/alessine/data-engineering-zoomcamp

Materials from the Data Engineering Zoomcamp 2025

bigquery data-engineering dbt docker kestra spark

Last synced: 16 Apr 2026

https://github.com/santiago-giordano/aws-gcp-pipeline

Simple pipeline that downloads a CSV from an AWS bucket, applies some transformations, creates tables in GCP BigQuery, loads the data, and runs queries

aws bigquery etl gcp jupyter pipeline python

Last synced: 04 Mar 2026

https://github.com/izmian/google-business-intelligence_professionalcertificate

My coursework and projects from the Google Business Intelligence Professional Certificate on Coursera

bigquery business-intelligence datavisualization sql tableau

Last synced: 06 May 2025

https://github.com/novucs/local-bigquery

Run BigQuery locally. A BigQuery emulator for local testing and development.

bigquery duckdb emulator sqlglot testing

Last synced: 07 May 2026

https://github.com/paulveillard/cybersecurity-analytics

An ongoing collection of awesome software, libraries, learning tutorials, documents and books, technical resources and cool stuff about Analytics Engineering in Cybersecurity.

analytics bigdata bigquery cybernetics cybersecurity data data-engineering data-science encryption encryption-decryption seo seo-friendly seo-optimization

Last synced: 28 Mar 2025

https://github.com/anahimamani/gcp-sqlserver-to-bigquery-medallion

End-to-end data pipeline on GCP with Python, BigQuery, and Dataform, following the Medallion architecture (Bronze, Silver, and Gold) with incremental loads.

bigquery dataform etl gcp python

Last synced: 05 Mar 2026

https://github.com/IvanildoBarauna/pypi-package-stats

Project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 12 Jul 2025

https://github.com/lumapps/tap-bigquery

Fork of z3z1ma/target-bigquery — Singer target for BigQuery supporting storage write, GCS, streaming, and batch load methods, built with the Meltano SDK.

bigquery meltano python singer

Last synced: 02 Apr 2026

https://github.com/marcosach/tp-infra

This repository contains all the files that make up the final project for the course Infrastructure for Data Science (Infraestructura para la Ciencia de Datos) in the Data Science degree program (UNSAM).

bigquery buckets datamart datawarehouse etl gcp gcs pipelines python sql

Last synced: 17 Apr 2026

https://github.com/ymyzk/bq-globalip

Record the current global IPv4 address to a BigQuery table.

bigquery golang

Last synced: 17 Apr 2026

https://github.com/nghiant3110/google_fiber_bi_5

This is a BI Capstone project based on the Google Fiber dataset from Google BI Course

bigquery google-sheets looker-studio sql

Last synced: 11 Apr 2025

https://github.com/lucashomuniz/project-22

[Dashboard] Data and Sustainability: Optimizing Green Flow's Fertilizer Portfolio

agrotech bigquery data-analytics data-structures data-visualization google-cloud-platform powerbi powerbi-visuals powerquery sql sustainability

Last synced: 20 Mar 2025

https://github.com/metlinskyi/englishdom

Filtered and evaluated EnglishDom.com teachers according to custom criteria.

bigquery google-sheets python webscraping

Last synced: 16 May 2026

https://github.com/fabiopapais/chat-your-data

Chat with your SQL database and make complex queries with natural language using LLMs

bigquery chainlit langchain llm

Last synced: 17 Apr 2026

https://github.com/lawal-hash/olistelt

An end-to-end ELT data pipeline of the Brazilian olist e-commerce dataset using the modern data stack

airflow bigquery dbt dbt-core docker postgresql sql

Last synced: 17 Feb 2026

https://github.com/g-schumacher44/analyst_resource_hub

A collection of guidebooks, quickref, and resources for data analysis

analytics bigquery data lookerstudio machine-learning model python sql yaml-configuration

Last synced: 20 Jun 2026

https://github.com/gabrieladados/people-analytics

People Analytics: Insights for Talent Retention

bigquery figma people-analytics sql tableau

Last synced: 17 Apr 2026

https://github.com/khanovico/energy-data-analysis

A cloud model that analyzes a real-world dataset with BigQuery and other big-data analysis tools. Includes a Docker image for running the app across platforms.

big-data-processing bigquery docker google-app-engine jupyter-notebook mlflow python scikit-learn seaborn xgboost

Last synced: 17 Feb 2026

https://github.com/chaaalistaa/thelookecommerce---project

Analysis of "TheLook" eCommerce, with goals such as identifying sales trends, understanding customer behavior, enhancing customer retention, and driving repeat purchases.

big-data-analytics bigquery data-analytics data-visualization looker-studio sql

Last synced: 17 Apr 2026

https://github.com/ajaxbarcelonacruyff/ec_demo

Generate demo data for e-commerce (EC) sites

bigquery ecommerce google-analytics-4

Last synced: 04 Apr 2026

https://github.com/wan-huiyan/gcp-dataform-rest-api-deploy

Claude Code skill: Deploy .sqlx files to Google Cloud Dataform via REST API — full lifecycle with gotcha documentation

automation bigquery ci-cd claude-code claude-code-skill cloud-workflows dataform gcp google-cloud sql

Last synced: 04 Apr 2026

https://github.com/digitaloptimizationgroup/digitaloptgroup-r-notebooks

A collection of R notebooks to analyze data from the Digital Optimization Group Platform

ab-testing bigquery jupyter-notebook performance-analysis r web-analytics

Last synced: 07 May 2026

https://github.com/tuanai-vireox/dataform-utils

BigQuery Dataform JavaScript Utils Package - Support Ads, Query Common, ...

bigquery dataform datawarehouse

Last synced: 19 Apr 2026

https://github.com/alex-nettekoven/gcp-kafka-spark-bigquery-etl-newstream

Terraform-Deployed ETL Pipeline (Python → Kafka → Spark → BigQuery) for NewsAPI Data

bigquery etl-pipeline kafka newsapi python spark

Last synced: 08 May 2026

https://github.com/coatless/bigquery-reddit-ask-your-advisor

Analysis code that counts instances of a phrase on Reddit (e.g. "ask your advisor")

ask-your-advisor bigquery r reddit

Last synced: 20 Apr 2026

https://github.com/vaxdata22/city-weather-and-s3file-rds-s3-bigquery-by-airflow-on-ec2

This is my third industry-level ETL project. This data pipeline orchestration uses Apache Airflow on AWS EC2. It demonstrates how to build an ETL data pipeline that would perform data extraction to a database in parallel to a loading process into the same database, join the tables, copy joined data to S3 and finally copy the S3 file to BigQuery DW.

apache-airflow aws-ec2 aws-rds-postgres aws-s3 bigquery business-intelligence dags data-warehousing etl-pipeline openweathermap-api orchestration python3 sql

Last synced: 18 Mar 2025

https://github.com/garbetjie/monolog-bigquery-handler

A simple Monolog handler for writing to BigQuery.

bigquery logging monolog monolog-handler

Last synced: 20 Apr 2026

https://github.com/zaynabbug/end-to-end-fitness-data-pipeline-on-gcp

A cloud-native data pipeline that ingests, processes, and visualizes real-time and batch fitness data. Built with Pub/Sub, Airflow, BigQuery, dbt, Looker Studio, Terraform, and Docker to automate data workflows and provide actionable insights.

airflow bigquery dbt docker gcp looker-studio pubsub terraform

Last synced: 05 May 2026

https://github.com/rohithay/sql-bench

An elegant CLI toolkit for validating BigQuery queries, comparing schemas, and estimating costs before pushing code to production.

bigquery cli cloud-tools data-engineering developer-tools google-cloud query-validator schema-diff sql sql-lint

Last synced: 22 Apr 2026

https://github.com/tranngoca5039/bigquery-a5y

📊 Streamline your data analysis with bigquery-a5y, a powerful tool for optimizing BigQuery performance and improving query efficiency.

analytics api big-data bigquery cloud-computing data-analysis data-integration data-management data-pipeline data-visualization data-warehouse google-cloud machine-learning serverless sql

Last synced: 05 Jun 2026

https://github.com/edisedis777/bigquery-cost-optimization

GitHub repository showcasing strategies to optimize Google BigQuery (GBQ) costs when dealing with raw data dumps.

bigquery cost gbq google googlebigquery

Last synced: 24 Apr 2026

https://github.com/tosh2230/bigquery-table-history

Diff daily changes by BigQuery INFORMATION_SCHEMA.PARTITIONS records.

bigquery

Last synced: 24 Apr 2026

https://github.com/raqssoriano/hha504_assignment_nosql_dbs

This task is part of my assignment focused on creating and configuring databases in different platforms, such as GCP's BigQuery, MongoDB Atlas, and Redis Cloud.

bigquery mongodb-atlas mongodbcompass redis redisinsight

Last synced: 24 Apr 2026

https://github.com/tosh2230/cdc-rds-bq

Change data capture from Amazon RDS to Google BigQuery

bigquery changedatacapture rds

Last synced: 24 Apr 2026

https://github.com/ackeecz/terraform-gcp-cloud-run_pubsub_to_bq

Cloud Run subscribes itself to given topic and inserts each message to BigQuery table.

bigquery gcp pubsub terraform

Last synced: 24 Apr 2026

https://github.com/push-protocol/push-google-bigquery

The Power of Web3 Big Data: A Guide to Using Google BigQuery and Push Protocol for Data Communication and Analysis

bigquery data push push-notifications web3

Last synced: 26 Mar 2025

https://github.com/naustica/semantic_scholar_bq

Repository containing scripts for importing Semantic Scholar snapshots into BigQuery

bigquery python scholarly-metadata semantic-scholar

Last synced: 06 Jun 2026

https://github.com/stoqey/rasputia

Rasputia Latimore - The Big Data Bitch 💋

bigquery

Last synced: 15 May 2026

https://github.com/antbit96/dataform_poc

Template for basic data preparation

bigquery bigquery-dataform data-preparation

Last synced: 26 Apr 2026

https://github.com/nszoni/dbtgen

dbt: write nothing, generate (almost) everything.

analytics bigquery dbt documentation generative-ai github tooling

Last synced: 08 May 2026

https://github.com/richardbnk/data_tools

Python Library to Accelerate Creation of Data ETL Processes on multiple database systems.

bigquery etl gcp sql

Last synced: 27 Apr 2026

https://github.com/aviadklein/fluq

fluq provides a set of utilities and an intuitive API for constructing SQL queries programmatically in a python way, making it easier to build, read and maintain complex SQL statements

bigquery package python python3 sql

Last synced: 15 May 2026

https://github.com/seahrh/nyc-taxi-trips

REST API for the New York City Taxi Trips public dataset, implemented in Scala and Play Framework 2.7

bigquery nyc-taxi-dataset play-framework rest-api scala

Last synced: 14 May 2026

https://github.com/kellyjadams/ap-exam-scores

Analyzing AP exam scores for a school.

bigquery sql

Last synced: 07 Jun 2026

https://github.com/logicoffee/dbt-bigquery-extras

An experimental dbt package for BigQuery offering several materializations.

bigquery dbt dbt-packages

Last synced: 13 Jun 2026

https://github.com/pompierninja/scio-demo

Playing w/ Scio

apache-beam bigquery scio

Last synced: 15 May 2026

https://github.com/marielachirinosr/nyc-taxi-trip-exploration-2019-2020

Explores passenger behavior & impact of COVID-19 on NYC taxi industry (Q1 2019-2020).

bigquery data data-analysis data-visualization python sql tableau

Last synced: 15 Jun 2026

https://github.com/rrmcguinness/protoc-gen-bq-schema

A protocol buffer compiler (protoc) plugin for generating Google BigQuery JSON table definitions.

bigquery bigquery-schema protobuf

Last synced: 01 May 2026

https://github.com/shaundann/autosight

AutoSight is an AI-powered multi-agent data analysis pipeline built on Google Cloud. From ingesting raw CSVs to generating visualizations and natural language summaries — all results are displayed live in a Streamlit dashboard.

ai-agents automated-data-analysis bigquery data-pipeline gcp google-cloud llm multi-agent-systems python streamlit vertex-ai

Last synced: 09 May 2026

https://github.com/marceloneppel/map-to-bigquery-structs

Tool to convert a Golang map to a struct containing fields with types like bigquery.Null*.

bigquery golang map struct

Last synced: 28 Apr 2026

https://github.com/martinkalema/bigquery-pubsub

Loading data into BigQuery Table

bigquery data-engineering flat-file kafka

Last synced: 15 May 2026

https://github.com/wayanradit29/tutas-recommender

End-to-end student–tutor recommender system with synthetic data generation, preprocessing, feature engineering in BigQuery, and model deployment on Google Cloud.

ai bigquery cloud-computing data-engineering education google-cloud machine-learning python recommender-system vertex-ai

Last synced: 15 May 2026

https://github.com/adindasarianti/PBI_Rakamin_X_Kimia_Farma

This repository contains my project as a Big Data Analytics intern at Kimia Farma, where I analyzed the performance of Kimia Farma from 2020 to 2023

bigquery dataanalytics lookerstudio

Last synced: 07 Sep 2025

https://github.com/malbiruk/million-songs-pipeline

End-to-end batch pipeline joining audio features, lyrics, and genres from the Million Song Dataset

batch-processing bigquery data-engineering data-pipeline data-warehouse dataproc dbt dezoomcamp gcp million-song-dataset prefect pyspark streamlit terraform

Last synced: 08 Jun 2026

https://github.com/alessio-siciliano/bigquery-advanced-utils

BigQuery-advanced-utils is a lightweight utility library that extends the official Google BigQuery Python client. It simplifies tasks like query management, data processing, and automation. Aimed at developers and data scientists, the project is open to contributions to improve and enhance its functionality.

bigquery datatransfer google-cloud python

Last synced: 28 Apr 2026

https://github.com/mahdiik1/bigquery-retail-analysis

Uses Google BigQuery to query a sample retail dataset. Features example SQL queries and a Python script for large-scale data analysis. Great for learning GCP integration, basic analytics, and high-volume querying.

analytics bigquery gcp python

Last synced: 28 Apr 2026

https://github.com/rafal-kowalski-dev/selling-cars-analize

Hobby project for learning PySpark, Airflow, and BigQuery

airflow bigquery gcp pyspark python sqlalchemy

Last synced: 28 Apr 2026

https://github.com/allanreda/share-of-search-retrieval-and-visualization

Share of search analysis including data retrieval from Google Ads API, storing data in BigQuery and visualizing it in Looker Studio

bigquery google-ads-api looker-studio python share-of-search

Last synced: 28 Apr 2026

https://github.com/jmorl96/bitcoin-ecosystem-insights-data-engineering-project

Data engineering project that collects, processes, and visualizes Bitcoin blockchain and Kraken API data using Google Cloud Platform. Technologies: Terraform, Airflow, DBT, Looker Studio, BigQuery, Python, Cloud Run. The project ensures reproducibility and scalability with infrastructure as code and automated data pipelines.

airflow bigquery cloud-run dbt docker google-cloud-platform looker-studio python terraform

Last synced: 29 Apr 2026

https://github.com/fakhri098/project-sql-bigquery

This project aims to analyze taxi trip data with a focus on trip duration patterns, popular routes, and trip costs. The study was conducted to gain in-depth insights into taxi travel behavior based on historical data.

bigquery sql

Last synced: 08 Jun 2026

https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer

Mackerel plugin to post BigQuery query results

bigquery mackerel-plugin

Last synced: 10 Apr 2025

https://github.com/mikeghen/metadata

Pulls data from Socrata open data portals

bigquery python socrata

Last synced: 29 Apr 2026

https://github.com/prodriguezdefino/dataflow-cassandra-to-bigquery

Captures data from a Cassandra instance and sends it to BigQuery

bigquery cassandra dataflow

Last synced: 29 Apr 2026

https://github.com/alwayssany/bigquery-hackathon

A BigQuery-powered Smart Substitute Recommender that suggests ideal product substitutes based on a deep understanding of product attributes, not just shared tags or categories.

bigquery bigquery-ai bigquery-ml google-cloud google-cloud-platform notebook-jupyter public-dataset python sql vector vector-search

Last synced: 29 Apr 2026

https://github.com/humairarizwan/uber-ride-dataengineering-analysis

This project creates a pipeline to process data and performs data analytics on Uber data.

bigquery dataanalysis dataengineering gcp-project googlestorage looker-studio

Last synced: 29 Apr 2026

https://github.com/sanjay-k08/python-for-gcp-interact-with-google-cloud-using-python

Python For GCP is a project aimed at simplifying interaction with Google Cloud Platform (GCP) services using Python. This repository provides code examples and scripts that help you manage and automate various GCP resources such as BigQuery, Cloud Storage, Bigtable, Compute Engine, and more, entirely through Python.

bigdata bigquery cloudstorage computeengine data-pipelines devops gcp gcp-automation python-script terraform-alternative

Last synced: 29 Apr 2026

https://github.com/kaushik-puttaswamy/walmart-sales-data-ingestion-and-transformation-in-bigquery-using-airflow

An ETL pipeline that ingests Walmart sales data from Google Cloud Storage into BigQuery, automates table creation, and performs data transformation using SQL MERGE with Apache Airflow.

airflow-dags bigquery etl-pipeline gcs-bucket google-cloud-platform merge python sql transformation

Last synced: 29 Apr 2026

https://github.com/jorbriib/nodejs-bigquery-connect-api-rest

NodeJS service to connect to BigQuery through API REST

api api-rest bigquery javascript node node-js nodejs npm sql

Last synced: 29 Apr 2026

https://github.com/ruru-lyy/nyc-taxi-service-pipeline

In this project, I built a data pipeline using Mage.ai for ETL, GCP for storage, BigQuery for querying, and Looker Studio for analytics. This project helped me learn how to process, store, and visualize data effectively using modern tools.

bigquery data-engineering data-modeling etl-pipeline looker mage-ai python

Last synced: 29 Apr 2026

https://github.com/amirrezaskh/nyc-taxi-dashboard

A comprehensive data analytics platform that processes NYC taxi trip data from Google BigQuery and visualizes insights through an interactive React dashboard. Features real-time heatmaps, temporal analysis, and geographic intelligence across 263 NYC taxi zones.

bigquery dashboard data-analytics data-science data-visualization geospatial leaflet material-ui nyc-taxi plotly react typescript

Last synced: 29 Apr 2026

https://github.com/machinelearningzuu/data-engineering-projects

This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.

airflow bigquery data-engineering data-science data-visualization data-warehouse

Last synced: 30 Apr 2026

https://github.com/vaibhavs10/ml-on-gcp

The repository walks through a data-scientist-focused way of building and deploying machine learning models on Google Cloud

aiplatform bigquery googlecloudplatform ml

Last synced: 30 Apr 2026

https://github.com/gayatri1505/real-time-stock-market-data-pipeline-with-google-cloud-platform

This project builds a complete real-time stock market data pipeline on Google Cloud Platform. It ingests intraday stock prices from the Alpha Vantage API, stores and transforms the data using Cloud Functions, Pub/Sub, Cloud Storage, and BigQuery, and performs rich SQL-based analytics to uncover trading patterns, price movements, & volume anomalies.

bigquery data-engineering google-cloud-platform python3 real-time sql time-series visio

Last synced: 30 Apr 2026

https://github.com/crudek-data/bigquery-kaggle-apis

Uses the Kaggle API to download free datasets and the Google BigQuery API to read from and write to the cloud data warehouse

bigquery data-engineering kaggle

Last synced: 10 May 2026

https://github.com/brpy/nyc-trips

Data Engineering Zoomcamp journey on NYC trip data with the GCP stack

bigquery dbt gcp pyspark

Last synced: 30 Apr 2026

https://github.com/fsistemas/bigquery-td

ETL that extracts data from MySQL, then loads and merges it into BigQuery

bigquery etl mysql python sql

Last synced: 30 Apr 2026

https://github.com/nkurata/dbt-tutorial

A repository for learning and experimenting with dbt (Data Build Tool) through guided tutorials and practical exercises.

bigquery database dbt dbt-core sql

Last synced: 30 Apr 2026

https://github.com/yiu31802/gcp-project

GCP AppEngine project of Twitter data and some sample code

appengine bigquery gcp google-bigquery google-cloud google-datastore resas twitter twitter-data twitter4j

Last synced: 30 Apr 2026

https://github.com/kaushik-puttaswamy/train-ticket-booking-customer-data-ingestion-via-pub-sub-stream-dataflow-and-bigquery-with-looker

This project demonstrates real-time train ticket booking customer data ingestion and transformation using Pub/Sub, Dataflow, BigQuery, and visualization with Looker. It enables efficient data processing, storage, and analysis for customer insights.

bigquery dataflow etl gcp looker pubsub real-time-analytics

Last synced: 30 Apr 2026

https://github.com/nph1508/exploring_sales_and_product_performance_in_bicycle_manufacturing_sql

Analyzed sales and product performance data from a bicycle manufacturing company using SQL in BigQuery. Focused on identifying trends in product categories, revenue distribution, and monthly performance to guide production and inventory planning.

analysis bigquery sql

Last synced: 08 Jun 2026

https://github.com/justinjsd/analytics-engineering

📊 A repository focusing on analytics engineering, particularly using dbt on the Northwind Sample dataset

bigquery dbt sql

Last synced: 13 Jun 2026

https://github.com/anyesh/gbq-helpers

GBQ related helper functions and snippets.

bigquery google

Last synced: 01 May 2026

https://github.com/ddzikri/analisis-data-kimia-farma

Project Based Internship Kimia Farma Rakamin Academy

bigquery dataset sql

Last synced: 18 Mar 2025

https://github.com/nikhilsree5/targetcasestudy

An exploratory and in-depth study of the e-commerce market in Brazil.

bigquery eda sql visualization

Last synced: 15 Mar 2025

https://github.com/yu-iskw/bigquery-lineage

Visualize BigQuery data lineage graph

bigquery data-governance data-management visualization

Last synced: 01 May 2026

https://github.com/nph1508/sql_for_ecommerce_analyzing_sales_customer_behavior_in_bigquery

Designed and executed complex SQL queries on an ecommerce dataset using Google BigQuery to uncover customer behavior patterns, sales performance, and category-level insights. Focused on extracting business value through data exploration and aggregation techniques.

analysis bigquery sql

Last synced: 01 Jul 2025

https://github.com/robertofernandezmartinez/logistics-fleet-dbt

🏗️ Modern Analytics Engineering project using dbt and BigQuery to model fleet operations. Implementing a Medallion Architecture, it transforms raw GPS data into a reliable Star Schema. Focuses on resolving data quality issues like sensor noise and duplicates through automated testing and CI/CD to ensure production-grade reporting.

analytics-engineering bigquery data-engineering data-modeling data-pipeline data-quality dbt etl google-cloud-platform logistics-analytics medallion-architecture sql

Last synced: 19 Jun 2026
