Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


BigQuery

Google BigQuery lets companies handle large amounts of data without managing infrastructure. Google’s documentation describes it as a “serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.

📖 A highly rated, comprehensive reference is the book “Google BigQuery: The Definitive Guide”. Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article marking the product’s 10th anniversary.

https://github.com/icarusso/bigqueryexporter

Export query data from Google BigQuery to a local machine

bigquery csv export python

Last synced: 21 Nov 2024

https://github.com/dav009/bqt

Local unit tests for your BigQuery queries

bigquery bq data test unittest

Last synced: 21 Jan 2025

https://github.com/george-nyamao/gcp_etl_project

An ETL pipeline to move an uploaded flat file from GCS, mask PII, store it in BigQuery, and create a report in Looker.

airflow bigquery cloudcomposer data-fusion gcs-bucket looker python3 wrangler

Last synced: 21 Jan 2025

https://github.com/miguelapp10/etl_operadorlogistico

Extract data from the SimpliRoute, AndesExpress, and Urbano APIs for a specific date range and process it for analysis and storage in Google BigQuery

api-client bigquery pandas python

Last synced: 13 Feb 2025

https://github.com/justinbeckwith/bisquick

🥞Synchronize your GitHub issues with BigQuery. Do neat stuff.

bigquery dotnet github

Last synced: 12 Feb 2025

https://github.com/chandanpasunoori/event-sync

Event Sync is for syncing events from multiple sources to multiple destinations, targeted at ad hoc events, where sources support acknowledgement functionality.

bigquery golang-tools google-cloud-platform pubsub

Last synced: 12 Feb 2025

https://github.com/paulpierre/google-bq-export-downloader

Google BigQuery Export Downloader

big-data bigquery dump export gcs

Last synced: 21 Jan 2025

https://github.com/fpopic/bigquery-schema-select

(Script) Generates SQL query that selects all fields (recursively for nested fields) from the provided BigQuery schema file.

bigquery bigquery-schema scala sql

Last synced: 21 Jan 2025
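
The bigquery-schema-select entry above describes generating a SELECT statement that covers every field of a schema, recursing into nested fields. The following is a minimal Python sketch of that idea, not the repository's own implementation (which is tagged as Scala). The function names, the underscore-joined aliases, the example schema, and the table name are all hypothetical. It also assumes that REPEATED records are selected whole, since reaching inside them would need UNNEST.

```python
import json


def flatten_fields(fields, prefix=()):
    """Yield a dotted path for every leaf field in a BigQuery JSON schema.

    Assumption: REPEATED records are kept whole rather than expanded,
    because selecting inside them would require UNNEST.
    """
    for field in fields:
        path = prefix + (field["name"],)
        is_record = field.get("type", "").upper() in ("RECORD", "STRUCT")
        is_repeated = field.get("mode", "NULLABLE").upper() == "REPEATED"
        if is_record and not is_repeated:
            # Recurse into the nested record, extending the path prefix.
            yield from flatten_fields(field.get("fields", []), path)
        else:
            yield ".".join(path)


def build_select(schema, table):
    """Return a SELECT statement listing every leaf field of `schema`."""
    columns = [
        # Alias each column with underscores so nested names stay unique.
        f"  {path} AS {path.replace('.', '_')}"
        for path in flatten_fields(schema)
    ]
    return "SELECT\n" + ",\n".join(columns) + f"\nFROM `{table}`"


# Hypothetical schema in BigQuery's JSON schema format.
EXAMPLE_SCHEMA = json.loads("""
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "user", "type": "RECORD", "mode": "NULLABLE", "fields": [
    {"name": "name", "type": "STRING", "mode": "NULLABLE"},
    {"name": "address", "type": "RECORD", "mode": "NULLABLE", "fields": [
      {"name": "city", "type": "STRING", "mode": "NULLABLE"}
    ]}
  ]},
  {"name": "tags", "type": "STRING", "mode": "REPEATED"}
]
""")

if __name__ == "__main__":
    # The table name is a placeholder, not a real table.
    print(build_select(EXAMPLE_SCHEMA, "my_project.my_dataset.my_table"))
```

Running the script prints one aliased column per leaf field, for example `user.address.city AS user_address_city`.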

https://github.com/johannaojeling/go-data-ingestion

Cloud Function for ingesting data from Cloud Storage to BigQuery

bigquery cloud-functions cloud-storage go google-cloud

Last synced: 31 Jan 2025

https://github.com/metrics-pli/bigquery-export

Exports collected metrics to Google BigQuery

bigquery datastudio lighthouse metrics metrics-pli performance pupeteer

Last synced: 25 Jan 2025

https://github.com/pmhalvor/whale-speech

A pipeline to map whale sightings to hydrophone audio

beam bigquery gcs mle model-as-a-service python tensorflow2

Last synced: 20 Dec 2024

https://github.com/kyoshidajp/bqcop

Save your BigQuery cost.

bigquery golang

Last synced: 21 Jan 2025

https://github.com/cartodb/carto-auth

Python library to authenticate with CARTO

auth bigquery carto carto-dw oauth

Last synced: 16 Feb 2025

https://github.com/m-mizutani/bqs

BigQuery Schema utility in Go

bigquery bigquery-schema go

Last synced: 08 Jan 2025

https://github.com/windi-wulandari/pbi_kimia-farma-x-rakamin

A data-driven analytics project for Kimia Farma to evaluate business performance from 2020-2023 using BigQuery. Focused on transaction data, inventory, branch operations, and product insights. Results were visualized through an interactive dashboard to support strategic decisions and optimizations.

big-data-analytics bigquery datawarehouse googlelooker sql

Last synced: 23 Jan 2025

https://github.com/danlessa/meta_qa

A practical one-liner metalanguage for describing common sense in a machine-friendly way.

bigquery metalanguage

Last synced: 08 Feb 2025

https://github.com/mchirico/gmail

Inserts Gmail messages into BigQuery, then deletes them.

angular9 bigquery gcp gmail python3

Last synced: 23 Jan 2025

https://github.com/ostrokach/uniparc_xml_parser

UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).

bigquery bioinformatics csv-files parquet-files protein-domains protein-sequences

Last synced: 21 Jan 2025

https://github.com/yu-iskw/terraform-google-copy-bq-datasets

A terraform module to copy BigQuery datasets across regions

bigquery data-engineering google-cloud terraform

Last synced: 13 Feb 2025

https://github.com/thunchanokbow/audiblebook-revenue

Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand

apache-airflow bigquery cloudcomposer data-visualization datalake datawarehouse googlecloudstorage lookerstudio pandas python3

Last synced: 09 Jan 2025

https://github.com/rohitsanj/superset-dbt-demo

This repository contains an example project (Jaffle Shop) demonstrating integration between Superset and dbt, with BigQuery as the data warehouse.

apache-superset bigquery dbt superset

Last synced: 23 Jan 2025

https://github.com/tupizz/fiap_pnad-covid-19

This project analyzes and transforms PNAD COVID-19 data from May to July 2020, using PySpark for large-scale data processing and BigQuery as the destination for storage and later analysis. The goal is to consolidate the monthly data into a single transformed dataset.

analysis bigquery pyspark python

Last synced: 09 Feb 2025

https://github.com/anilkhichar/bq-table-copy-automation

Copy a table from one dataset to another in Google BigQuery using a Bash script

automation bash bash-script big-query bigquery bigquery-cp gcp google

Last synced: 19 Feb 2025

https://github.com/salrashid123/gcp_cloud_status_dataset

BigQuery Dataset to query GCP Cloud Status Dashboard (https://status.cloud.google.com/)

bigquery gcp google-cloud google-cloud-platform

Last synced: 22 Jan 2025

https://github.com/tomgorb/project-template-for-production

Project template to (help) put a machine/deep learning algorithm into production

airflow bigquery gcp

Last synced: 09 Jan 2025

https://github.com/nghiant3110/google_analytic_4

This is a data analysis (DA) project based on the GA4 sample dataset in BigQuery

bigquery google-analytics looker-studio sql

Last synced: 15 Feb 2025

https://github.com/thunchanokbow/inventory-amazon

Inventory value is also important for determining a company's liquidity, or its ability to meet its short-term financial obligations. A high inventory value can indicate that a company has too much money tied up in inventory, which could make it difficult for the company to pay its bills.

azure bigquery cloudcomposer clouddatabase cloudstorage compute-engine dataproc postgresql powerbi pyspark-sql python3

Last synced: 09 Jan 2025

https://github.com/pedrocarmona/big_query_adapter

An ActiveRecord Google BigQuery adapter

activerecord bigquery gem ruby-on-rails

Last synced: 21 Nov 2024

https://github.com/kellyjadams/bigquery-python-weekly-report

A script to automate a weekly report that runs BigQuery in Python.

bigquery python

Last synced: 22 Jan 2025

https://github.com/zkan/running-bigquery-query-from-airflow-using-bigqueryexecuteoperator

Running BigQuery Query from Airflow using BigQueryExecuteOperator

airflow bigquery sql

Last synced: 12 Feb 2025

https://github.com/googlecloudplatform/dcm2bq

A service for creating a JSON metadata representation for DICOM from multiple input sources and storing it in Google Cloud BigQuery (BQ).

bigquery dicom gcs googlecloud googlecloudplatform googlecloudstorage json

Last synced: 28 Jan 2025

https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds

Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow

apache-arrow apache-beam bigquery dataflow neo4j python

Last synced: 15 Feb 2025

https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart

Welcome to the dbt-BigQuery Quickstart Project! 🎉 This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.

bigquery dbt e-commerce quickstarts

Last synced: 17 Jan 2025

https://github.com/ajaxbarcelonacruyff/ga4_bigquery_session_source

Creating GA4 session references in BigQuery.

bigquery ga4 googleanalytics

Last synced: 17 Feb 2025

https://github.com/ajaxbarcelonacruyff/gcp_cost

Monitoring Google Cloud costs with Looker Studio.

bigquery googlecloud googlecloudplatform lookerstudio

Last synced: 17 Feb 2025

https://github.com/essien1990/etl_pipeline_airflow

Creating pipelines using Python 3 and Apache Airflow to load tables into a Google BigQuery data warehouse

airflow airflow-dags airflow-operators bash bigquery bq datawarehouse etl-pipeline python3

Last synced: 21 Jan 2025

https://github.com/undisputed-jay/etl-on-gcp-with-apache-airflow

In this project, files were ingested into Google Cloud Storage and then moved to BigQuery to run queries, with the results moved back to Google Cloud Storage.

apache-airflow bigquery data-engineering data-warehouse docker etl-pipeline google-cloud-platform

Last synced: 13 Feb 2025

https://github.com/tosh2230/pubsub-dataflow-bigquery

Google Cloud Dataflow for 'Exactly-Once' streaming insertion, from Google Cloud Pub/Sub to Google BigQuery.

bigquery dataflow gcp google-cloud google-cloud-platform pubsub

Last synced: 21 Jan 2025

https://github.com/esanchezros/bigquery-maven-plugin

Maven plugin for managing BigQuery datasets, tables and views

bigquery java maven maven-plugin

Last synced: 22 Jan 2025

https://github.com/rolandbende/python-bigquery-migrations

The Python bigquery-migrations package is for creating and manipulating BigQuery databases easily.

bigquery google migration-automation migration-scripts migration-tool migrations python

Last synced: 24 Jan 2025

https://github.com/ddzikri/analisis-data-kimia-farma

Project Based Internship Kimia Farma Rakamin Academy

bigquery dataset sql

Last synced: 24 Jan 2025

https://github.com/stoqey/rasputia

Rasputia Latimore - The Big Data Bitch 💋

bigquery

Last synced: 19 Jan 2025

https://github.com/janaom/gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.

airflow bigquery bigqueryml cloud-functions cloud-run-functions composer data-engineering-project google-cloud-platform

Last synced: 26 Jan 2025

https://github.com/knands42/data-ingestion

Data ingestion project to evaluate my Kotlin skills using concurrency

bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow

Last synced: 25 Jan 2025

https://github.com/nghiant3110/google_fiber_bi_5

This is a BI Capstone project based on the Google Fiber dataset from Google BI Course

bigquery google-sheets looker-studio sql

Last synced: 15 Feb 2025

https://github.com/loinguyen3108/sportify-music-analysis

Engineered the streaming crawler pipeline using Kafka to extract, transform, and load Spotify data into PostgreSQL and ClickHouse for real-time analytics. Additionally, developed an automated batching pipeline using Airflow and Spark to efficiently ETL crawled data into BigQuery.

airflow bigquery clickhouse kafka pyspark spotify

Last synced: 14 Feb 2025

https://github.com/aafaf-arharas/de-zoomcamp2025

This repository contains my work completed during the Data Engineering Zoomcamp 2025

bigquery docker gcp kestra python sql terraform

Last synced: 14 Feb 2025

https://github.com/squidmin/bigquery-labs

GCP BigQuery CLI

bigquery gcp java

Last synced: 07 Feb 2025

https://github.com/lucashomuniz/project-22

[Dashboard] Data and Sustainability: Optimizing Green Flow's Fertilizer Portfolio

agrotech bigquery data-analytics data-structures data-visualization google-cloud-platform powerbi powerbi-visuals powerquery sql sustainability

Last synced: 25 Jan 2025

https://github.com/akihokurino/dbt-sample

dbt sample

bigquery dbt python3

Last synced: 07 Feb 2025

https://github.com/tharun2806/end-to-end-internship-data-analysis

Internship Dataset Analysis is an end-to-end project analyzing an internship dataset obtained from Kaggle. The project involves cleaning and preprocessing the data using Excel and SQL, followed by exploratory data analysis (EDA). The analysis includes statistical, sectoral and geospatial insights, visualized through an interactive Tableau dashboard

bigquery data-analysis data-cleaning data-preprocessing data-visualization exploratory-data-analysis geospatial-analysis microsoft-excel reporting sectoral-analysis statistical-analysis tableau-public

Last synced: 07 Feb 2025

https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer

Mackerel plugin to post BigQuery query results

bigquery mackerel-plugin

Last synced: 16 Feb 2025

https://github.com/minhajuddin2510/bigquery_alerts

In today’s data-driven world, organisations heavily rely on timely alerts to monitor critical systems and make informed decisions. However, when working with BigQuery, a popular cloud-based data warehouse, there is no built-in functionality to generate alerts. In this article, we will explore how I recently built a cloud function to address this

alerting bigquery cloudfunctions monitoring-tool slack

Last synced: 31 Jan 2025

https://github.com/nghiant3110/b2b_crm_3

This is a DA project based on the B2B Sales CRM dataset from Maven Analytics

bigquery google-sheets looker-studio sql

Last synced: 15 Feb 2025

https://github.com/nghiant3110/e_com_1

This is a DA project based on the e-commerce dataset (Thelook_ecom) in Google BigQuery

bigquery looker-studio sql

Last synced: 15 Feb 2025

https://github.com/jasontanx/terraform-practice

Creating datasets and tables in Google BigQuery via Terraform

bigquery iac-terraform infrastructure-as-code terraform

Last synced: 01 Feb 2025

https://github.com/alessio-siciliano/bigquery-advanced-utils

BigQuery-advanced-utils is a lightweight utility library that extends the official Google BigQuery Python client. It simplifies tasks like query management, data processing, and automation. Aimed at developers and data scientists, the project is open to contributions to improve and enhance its functionality.

bigquery datatransfer google-cloud python

Last synced: 01 Feb 2025

https://github.com/davelester/gharchive-bigquery-examples

Examples Using BigQuery to Analyze GH Archive Data

bigquery gharchive

Last synced: 01 Feb 2025

https://github.com/jasontanx/ridership-headline-project

This end-to-end data engineering and data analytics project covers Malaysian public transport ridership data.

bigquery data-engineering minio-server public-transport-ridership terraform

Last synced: 01 Feb 2025

https://github.com/sayed-ashfaq/target-sql

In this project, I analyzed Target company's data using SQL in BigQuery, focusing on data extraction, manipulation, and performing various analytical queries to derive insights.

aggregation bigquery cte joins sql

Last synced: 15 Feb 2025

https://github.com/richardbnk/data_tools

Python Library to Accelerate Creation of Data ETL Processes on multiple database systems.

bigquery etl gcp sql

Last synced: 02 Feb 2025

https://github.com/oguzgn/firebase-ab-test-analysis-for-a-mobile-race-game

This repository showcases an infrastructure designed for analyzing A/B tests in mobile games. It leverages BigQuery to process Firebase and GA4-based event data and uses Looker Studio for dynamic visualization. The project simplifies A/B test comparisons, enabling stakeholders to view results directly through interactive dashboards.

ab-testing ab-testing-analysis bigquery event-based-tracking firebase looker-studio mobile-game-analytics race-game sql

Last synced: 26 Jan 2025

https://github.com/alessio-siciliano/google-cloud-python-class-wrapper

An example of several classes written in Python to interact with GCP

bigquery datatransfer gcp google-cloud

Last synced: 26 Jan 2025

https://github.com/vigneshss-07/cloud-bigquery-and-sql---the-interview-guide

This deals with SQL commands, interview preparation and query questions and solutions in BigQuery

azuresql bigquery gcp sql sql-query sql-server sqlalchemy

Last synced: 15 Feb 2025

https://github.com/sejalmankar1012/product_data_analyst_assessement

Analyzing the Impact of Business Hour Mismatch on Order Volume in the Food Delivery Industry: A Case Study of UEats and Ghub

assessment-project bigquery loop product-analyst sql-query

Last synced: 26 Jan 2025

https://github.com/digitaloptimizationgroup/digitaloptgroup-r-notebooks

A collection of R notebooks to analyze data from the Digital Optimization Group Platform

ab-testing bigquery jupyter-notebook performance-analysis r web-analytics

Last synced: 21 Jan 2025

https://github.com/oliveroneill/wilt-cloud-functions

Wilt Google Cloud Functions

bigquery google-cloud-functions

Last synced: 07 Jan 2025

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 27 Jan 2025

https://github.com/isaacmg/mimic_iv_bq_queries

Queries needed to recreate time series features for model training

bigquery mimic-iv sql

Last synced: 21 Jan 2025

https://github.com/mikeghen/metadata

Pulls data from Socrata open data portals

bigquery python socrata

Last synced: 18 Feb 2025

https://github.com/ackeecz/terraform-gcp-cloud-run_pubsub_to_bq

Cloud Run subscribes itself to a given topic and inserts each message into a BigQuery table.

bigquery gcp pubsub terraform

Last synced: 07 Jan 2025

https://github.com/ackeecz/terraform-gcp-cloud-function_pubsub_to_bq

Cloud Function subscribes itself to a given topic and inserts each message into a BigQuery table.

bigquery cloud-functions pubsub terraform-module

Last synced: 07 Jan 2025

https://github.com/riju18/airflow-data-engineering-with-bigquery-and-dbt

Fetch data from a simple CSV file, load it into a GCP BigQuery table, run dbt to automate the data warehouse (DWH), and run Soda to check data quality.

apache-airflow bigquery csv dbt python3 soda

Last synced: 28 Jan 2025

https://github.com/eddieatgoogle/sql-based-genai-data-pipeline

GenAI data pipeline that performs data preparation, management and performance evaluation tasks for RAG systems using SQL as the primary development language. Please feel free to use this as a starting point for your own projects.

bigquery bqml dataform embeddings gemini google-cloud-platform sql vector-search vertex-ai

Last synced: 08 Jan 2025

https://github.com/adadalshabab/data-engineering-gcp-project

An end-to-end modern data engineering project, including deployment of ETL pipeline on Google Cloud Platform, using BigQuery for data analysis and leveraging Looker to generate an insight dashboard.

bigquery data data-science data-visualization databases dataengineering-a engineering etl-pipeline looker-studio powerbi

Last synced: 11 Feb 2025

https://github.com/humairarizwan/uber-ride-dataengineering-analysis

This project creates a pipeline to process data and performs data analytics on Uber data.

bigquery dataanalysis dataengineering gcp-project googlestorage looker-studio

Last synced: 21 Jan 2025

https://github.com/kellyjadams/ap-exam-scores

Analyzing AP exam scores for a school.

bigquery sql

Last synced: 08 Jan 2025

https://github.com/flowerinthenight/bqstream

A simple library to help facilitate streaming to BigQuery.

bigquery go golang streaming

Last synced: 08 Jan 2025
