Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a « serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Its client libraries allow the use of widely known languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, making it flexible to read data from external sources.

📖 A highly rated canonical book on it is « Google BigQuery: The Definitive Guide », a comprehensive reference. Another enriching read on the subject is the inside story told in the article by the founding product manager of BigQuery celebrating its 10th anniversary.

https://github.com/essien1990/etl_pipeline_airflow

Creating pipelines using Python3 and Apache Airflow to load tables into Google Big Query Dataware House

airflow airflow-dags airflow-operators bash bigquery bq datawarehouse etl-pipeline python3

Last synced: 21 Jan 2025

https://github.com/justinbeckwith/bisquick

🥞Synchronize your GitHub issues with BigQuery. Do neat stuff.

bigquery dotnet github

Last synced: 19 Dec 2024

https://github.com/salrashid123/gcp_cloud_status_dataset

BigQuery Dataset to query GCP Cloud Status Dashboard (https://status.cloud.google.com/)

bigquery gcp google-cloud google-cloud-platform

Last synced: 22 Jan 2025

https://github.com/rohitsanj/superset-dbt-demo

This repository contains an example project (Jaffle Shop) demonstrating integration between Superset and dbt, with BigQuery as the data warehouse.

apache-superset bigquery dbt superset

Last synced: 23 Jan 2025

https://github.com/googlecloudplatform/dcm2bq

About A service for creating a JSON metadata representation for DICOM from multiple input sources and storing into Google Cloud Big Query (BQ).

bigquery dicom gcs googlecloud googlecloudplatform googlecloudstorage json

Last synced: 28 Jan 2025

https://github.com/miguelapp10/etl_operadorlogistico

extraer datos de la API de SimpliRoute, AndesExpress y Urbano en un rango de fechas específico y procesarlos para su análisis y almacenamiento en Google BigQuery

api-client bigquery pandas python

Last synced: 20 Dec 2024

https://github.com/justinjsd/analytics-engineer-bootcamp

This repository serves as a collection of my work and learnings throughout the bootcamp, focusing on developing skills in analytics engineering, particularly using dbt.

analytics bigquery dbt engineering sql

Last synced: 05 Nov 2024

https://github.com/nghiant3110/google_analytic_4

This is a DA project based on the GA4 Sample dataset on Big Query

bigquery google-analytics looker-studio sql

Last synced: 24 Dec 2024

https://github.com/tosh2230/pubsub-dataflow-bigquery

Google Cloud Dataflow for 'Exactly-Once' streaming insertion, from Google Cloud Pub/Sub to Google BigQuery.

bigquery dataflow gcp google-cloud google-cloud-platform pubsub

Last synced: 21 Jan 2025

https://github.com/esanchezros/bigquery-maven-plugin

Maven plugin for managing BigQuery datasets, tables and views

bigquery java maven maven-plugin

Last synced: 22 Jan 2025

https://github.com/mattwelke/packt-book-bot

Bot that tweets and logs the Packt free eBook of the day in BigQuery daily.

bigquery bot ebooks ibm-cloud-functions java openwhisk

Last synced: 18 Dec 2024

https://github.com/antoinegiraud/dataform_hypermarche

SQL repo orchestrated by Dataform for BigQuery

bigquery dataform

Last synced: 08 Jan 2025

https://github.com/kellyjadams/bigquery-python-weekly-report

A script to automate a weekly report that runs BigQuery in Python.

bigquery python

Last synced: 22 Jan 2025

https://github.com/zkan/running-bigquery-query-from-airflow-using-bigqueryexecuteoperator

Running BigQuery Query from Airflow using BigQueryExecuteOperator

airflow bigquery sql

Last synced: 19 Dec 2024

https://github.com/thunchanokbow/audiblebook-revenue

Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand

apache-airflow bigquery cloudcomposer data-visualization datalake datawarehouse googlecloudstorage lookerstudio pandas python3

Last synced: 09 Jan 2025

https://github.com/thunchanokbow/inventory-amazon

Inventory value is also important for determining a company's liquidity, or its ability to meet its short-term financial obligations. A high inventory value can indicate that a company has too much money tied up in inventory, which could make it difficult for the company to pay its bills.

azure bigquery cloudcomposer clouddatabase cloudstorage compute-engine dataproc postgresql powerbi pyspark-sql python3

Last synced: 09 Jan 2025

https://github.com/ajaxbarcelonacruyff/gcp_cost

Monitoring Google Cloud costs with Looker Studio.

bigquery googlecloud googlecloudplatform lookerstudio

Last synced: 25 Dec 2024

https://github.com/ajaxbarcelonacruyff/ga4_bigquery_session_source

Creating GA4 session references in BigQuery.

bigquery ga4 googleanalytics

Last synced: 25 Dec 2024

https://github.com/matt-strautmann/dbt-bigquery-ecommerce-quickstart

Welcome to the dbt-BigQuery Quickstart Project! 🎉 This repository is designed as a hands-on guide to help you build a modern data stack leveraging powerful tools like Airbyte for ingestion, dbt for transformation, and BigQuery for storage and analytics.

bigquery dbt e-commerce quickstarts

Last synced: 17 Jan 2025

https://github.com/miguelapp10/workinghoursbetweentwodate_bigquery

Este proyecto es una calculadora de horas laborales que determina la cantidad de horas trabajadas entre dos fechas, teniendo en cuenta días hábiles y horas de trabajo especificadas con Bigquery

bigquery bigquery-dataset bigquery-table querying sql sql-query

Last synced: 15 Jan 2025

https://github.com/gdbecker/dbtlabslearning

Learn the foundational steps of transforming data in dbt Cloud. Start by connecting dbt Cloud to a data warehouse and Git repository, then explore key concepts like modeling, sources, testing, documentation, and deployment. Get hands-on by building a model and running tests in dbt Cloud.

analytics-engineering bigquery dbt dbt-cloud jinja macros models packages sql testing

Last synced: 22 Jan 2025

https://github.com/romange/puma

Bigquery-like engine for processing structured json-like records

bigquery cpp11 engine

Last synced: 23 Jan 2025

https://github.com/teraearlywine/sample_sql

The following repo contains samples of SQL code that can be referenced by future clients or employers.

bigquery database mysql sql

Last synced: 21 Jan 2025

https://github.com/pedrocarmona/big_query_adapter

An ActiveRecord Google BigQuery adapter

activerecord bigquery gem ruby-on-rails

Last synced: 21 Nov 2024

https://github.com/nais/bqrator

Operator for creating BigQuery datasets

bigquery bigquery-operator kubernetes kubernetes-operator nais-features

Last synced: 04 Feb 2025

https://github.com/miguelapp10/api_simpliroute_urbano

extraer datos de la API de SimpliRoute y Urbano en un rango de fechas específico y procesarlos para su análisis y almacenamiento en Google BigQuery

api-client bigquery pandas python

Last synced: 21 Nov 2024

https://github.com/prathmeshyelne/etl-pipeline-for-employee-data-using-data-fusion-airflow

This repository contains code and configuration files for an Extract, Transform, Load (ETL) project using Google Cloud Data Fusion for data extraction, Apache Airflow/Composer for orchestration, and Google BigQuery for data loading.

airflow bigquery dataengineering etl gcp googlecloudplatform

Last synced: 21 Jan 2025

https://github.com/analyticace/data-engineering-projects

Collection of Open Source Data Engineering Projects

aws big-data bigquery data docker engineering etl oracle-database pipeline sql

Last synced: 22 Dec 2024

https://github.com/cartodb/carto-auth

Python library to authenticate with CARTO

auth bigquery carto carto-dw oauth

Last synced: 12 Oct 2024

https://github.com/morphl-ai/morphl-model-publishers-churning-users-bigquery

BigQuery connector, pre-processor and model for predicting churning users for digital publishers using Google Analytics 360

bigquery google-analytics machine-learning morphl-platform pipeline preprocessor pyspark

Last synced: 11 Jan 2025

https://github.com/paulpierre/google-bq-export-downloader

Google BigQuery Export Downloader

big-data bigquery dump export gcs

Last synced: 21 Jan 2025

https://github.com/kyoshidajp/bqcop

Save your BigQuery cost.

bigquery golang

Last synced: 21 Jan 2025

https://github.com/tatamiya/new-books-notification

Fetch new books from [版元ドットコム](https://www.hanmoto.com/) and notify them to Slack

bigquery cloudrun-jobs gcs golang slack

Last synced: 12 Jan 2025

https://github.com/samedhi/gaend

Convert GAE Models into endpoints

bigquery elasticsearch google-app-engine restful taskqueue

Last synced: 12 Jan 2025

https://github.com/m-mizutani/bqs

BigQuery Schema utility in Go

bigquery bigquery-schema go

Last synced: 08 Jan 2025

https://github.com/rsachdeva/illuminatingdeposits-gcp-trigger

Terraform usage in the context of Google Cloud Platform GCP based Trigger of Resources applied to Cloud Functions. Both resource creation and destruction is through Terraform.

bigquery bigquery-table cloud-events functions-framework gcp go golang golangci-lint google-cloud google-cloud-function-pubsub-trigger google-cloud-functions google-cloud-pubsub google-cloud-sdk google-cloud-storage google-cloud-terraform sendgrid terraform

Last synced: 18 Jan 2025

https://github.com/nghiant3110/b2b_crm_3

This is a DA project based on the B2B Sales CRM dataset from Maven Analytics

bigquery google-sheets looker-studio sql

Last synced: 24 Dec 2024

https://github.com/andrewm4894/gcp-telemetry-example

Simple HTTP endpoint for telemetry data type events in GCP.

bigquery gcp-cloud-functions gcp-storage python terraform

Last synced: 01 Feb 2025

https://github.com/raqssoriano/hha504_assignment_nosql_dbs

This task is part of my assignment focused on creating and configuring databases in different platforms, such as GCP's BigQuery, MongoDB Atlas, and Redis Cloud.

bigquery mongodb-atlas mongodbcompass redis redisinsight

Last synced: 18 Dec 2024

https://github.com/adadalshabab/data-engineering-gcp-project

An end-to-end modern data engineering project, including deployment of ETL pipeline on Google Cloud Platform, using BigQuery for data analysis and leveraging Looker to generate an insight dashboard.

bigquery data data-science data-visualization databases dataengineering-a engineering etl-pipeline looker-studio powerbi

Last synced: 19 Dec 2024

https://github.com/smohanta23/uber_data-engineering_etl-project

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.

big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api

Last synced: 21 Jan 2025

https://github.com/scraly/flume-bigquery-sink

An Apache Flume Sink implementation to publish data to Google BigQuery

bigquery flume sink

Last synced: 25 Dec 2024

https://github.com/francois-lenne/elt-mp4-quiberon

the goal of this project is to retrieve the video of the municipality of quiberon and see if a person is in or no

bigquery cicd data-engineering docker elt google-cloud-functions google-cloud-platform google-cloud-run google-cloud-storage pipeline python sql unstructured-data

Last synced: 25 Dec 2024

https://github.com/fakhri098/project-sql-bigquery

This project aims to analyze taxi trip data with a focus on trip duration patterns, popular routes, and trip costs. The study was conducted to gain in-depth insights into taxi travel behavior based on historical data.

bigquery sql

Last synced: 17 Jan 2025

https://github.com/celiason/coffee-funnel

webpage for visualizing sales projections of a small coffee business

bigquery prophet sales-analysis streamlit-webapp

Last synced: 26 Dec 2024

https://github.com/kavyachippada/hva

Mini-Hackathon 1.0

bigquery excel pandas powerbi sql

Last synced: 13 Oct 2024

https://github.com/prashhhant213/strategic-analysis-of-retail-brand-in-south-america-using-sql

Leveraged Big Query and MySQL to analyze 100K records for sales optimization, trend identification, and enhancing customer satisfaction for a retail brand in South America and to provide insights and recommendations to improve their userbase and improve their services

bigquery database mysql-server sql

Last synced: 26 Dec 2024

https://github.com/epomatti/gcp-bigquery

Data sync via CDC from GCP Cloud SQL to Big Query using Datastream

bigquery cloud-sql datastream gcp

Last synced: 17 Jan 2025

https://github.com/spacepatcher/google-workspace-gmail-collector

👁 App for collecting Gmail logs from your Google Workspace account and sending them to Kafka

bigquery gmail google-workspace security soc

Last synced: 23 Oct 2024

https://github.com/anyesh/gbq-helpers

GBQ related helper functions and snippets.

bigquery google

Last synced: 10 Jan 2025

https://github.com/entur/terraform-aiven-kafka-connect-bigquery-sink

Terraform module for BigQuery sink connector on Aiven KafkaConnect cluster

aiven bigquery kafka-connect sink-connector terraform terraform-modules

Last synced: 17 Jan 2025

https://github.com/ngangawairimu/clv-rfm-and-customer-segmentation-analysis

This project performs cohort analysis to estimate Customer Lifetime Value (CLV) by analyzing weekly revenue and user registrations over 12 weeks, forecasting future revenue, and providing actionable insights for marketing and business strategy.

bigquery clv-analysis cohort-analysis customer-segmentation excel rfm-analysis

Last synced: 03 Jan 2025

https://github.com/shikanime/seeker

Data platform based on BigQuery

bigquery dataform google-cloud

Last synced: 04 Jan 2025

https://github.com/marceloneppel/map-to-bigquery-structs

Tool to convert a Golang map to a struct containing fields with types like bigquery.Null*.

bigquery golang map struct

Last synced: 30 Jan 2025

https://github.com/yasarsultan/taxi-trip-analysis

The NYC Taxi Trip Batch Data Pipeline automates processing of large-scale trip data using Apache Spark and Airflow, integrating AWS S3 and Google BigQuery for storage and analytics. It features scalable, containerized workflows with robust data validation.

airflow aws-s3 bash-script batch-processing bigquery data-lake data-warehouse docker python3 spark

Last synced: 11 Jan 2025

https://github.com/santiago-giordano/aws-gcp-pipeline

Simple pipeline, downloads csv from aws bucket, does some transformations, creates tables in gcp bq, loads data, and runs queries

aws bigquery etl gcp jupyter pipeline python

Last synced: 12 Jan 2025

https://github.com/denisogr/kaggle-notebook-to-production

This is a study project. I get analytics/ML examples from Kaggle and use different technologies to re-implement them.

bigquery data-engineering gcp kaggle-competition kaggle-dataset python spark

Last synced: 12 Jan 2025

https://github.com/justinjsd/analytics-engineering

📊 A repository focusing on analytics engineering, particularly using dbt on the Northwind Sample dataset

analytics bigquery dbt engineering sql

Last synced: 12 Jan 2025

https://github.com/shvetsihorr/sql-projects

SQL and Google BigQuery-Portfolio Projects

azuredatastudio bigquery mssql postgresql sql

Last synced: 18 Jan 2025

https://github.com/rolandbende/python-bigquery-migrations

Python bigquery-migrations package is for creating and manipulating BigQuery databases easily.

bigquery google migration-automation migration-scripts migration-tool migrations python

Last synced: 24 Jan 2025

https://github.com/ddzikri/analisis-data-kimia-farma

Project Based Internship Kimia Farma Rakamin Academy

bigquery dataset sql

Last synced: 24 Jan 2025

https://github.com/stoqey/rasputia

Rasputia Latimore - The Big Data Bitch 💋

bigquery

Last synced: 19 Jan 2025

https://github.com/janaom/gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.

airflow bigquery bigqueryml cloud-functions cloud-run-functions composer data-engineering-project google-cloud-platform

Last synced: 26 Jan 2025

https://github.com/jasontanx/ridership-headline-project

This end to end data engineering / data analytics project will be about the Malaysian public transport ridership data.

bigquery data-engineering minio-server public-transport-ridership terraform

Last synced: 01 Feb 2025

https://github.com/richardbnk/data_tools

Python Library to Accelerate Creation of Data ETL Processes on multiple database systems.

bigquery etl gcp sql

Last synced: 02 Feb 2025

https://github.com/knands42/data-ingestion

Data Ingestion project to evaluate my Kotlin skill using concurrency

bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow

Last synced: 25 Jan 2025

https://github.com/oguzgn/firebase-ab-test-analysis-for-a-mobile-race-game

This repository showcases an infrastructure designed for analyzing A/B tests in mobile games. It leverages BigQuery to process Firebase and GA4-based event data and uses Looker Studio for dynamic visualization. The project simplifies A/B test comparisons, enabling stakeholders to view results directly through interactive dashboards.

ab-testing ab-testing-analysis bigquery event-based-tracking firebase looker-studio mobile-game-analytics race-game sql

Last synced: 26 Jan 2025

https://github.com/alessio-siciliano/google-cloud-python-class-wrapper

An example of several classes written in Python to interact with GCP

bigquery datatransfer gcp google-cloud

Last synced: 26 Jan 2025

https://github.com/sejalmankar1012/product_data_analyst_assessement

Analyzing the Impact of Business Hour Mismatch on Order Volume in the Food Delivery Industry: A Case Study of UEats and Ghub

assessment-project bigquery loop product-analyst sql-query

Last synced: 26 Jan 2025

https://github.com/lucashomuniz/project-22

[Dashboard] Data and Sustainability: Optimizing Green Flow's Fertilizer Portfolio

agrotech bigquery data-analytics data-structures data-visualization google-cloud-platform powerbi powerbi-visuals powerquery sql sustainability

Last synced: 25 Jan 2025

https://github.com/minhajuddin2510/bigquery_alerts

In today’s data-driven world, organisations heavily rely on timely alerts to monitor critical systems and make informed decisions. However, when working with BigQuery, a popular cloud-based data warehouse, there is no built-in functionality to generate alerts. In this article, we will explore how I recently built a cloud function to address this

alerting bigquery cloudfunctions monitoring-tool slack

Last synced: 31 Jan 2025

https://github.com/jasontanx/terraform-practice

Creating datasets and tables in Google BigQuery via Terraform

bigquery iac-terraform infrastructure-as-code terraform

Last synced: 01 Feb 2025

https://github.com/sintef/bigquery-postgresql-wire-proxy

A PostgreSQL wire protocol proxy server for BigQuery.

bigquery postgresql proxy

Last synced: 12 Jan 2025

https://github.com/alessio-siciliano/bigquery-advanced-utils

BigQuery-advanced-utils is a lightweight utility library that extends the official Google BigQuery Python client. It simplifies tasks like query management, data processing, and automation. Aimed at developers and data scientists, the project is open to contributions to improve and enhance its functionality.

bigquery datatransfer google-cloud python

Last synced: 01 Feb 2025

https://github.com/ansh-info/stockpulse

Real-time stock market analytics pipeline with live visualization dashboard. Built with Python and GCP, featuring automated data processing and interactive Streamlit analytics.

api big-data bigquery cloud cloud-computing cloud-native data-engineering data-pipeline docker docker-compose gcp gcp-automation-gitops gcp-cloud-run gcp-pubsub google-cloud-platform real-time realtime stock-market stocks streamlit

Last synced: 27 Dec 2024

https://github.com/sangnandar/insert-unique-record

This is Cloud Functions script to insert only unique records into BigQuery.

bigquery digital-marketing-analytics google-cloud-functions

Last synced: 29 Dec 2024

https://github.com/ivanildobarauna/pypi-package-stats

Project for ingest pypi packages data from BigQuery and send to DataDog for analysis and insights with dashboards, monitors and more

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 29 Dec 2024

https://github.com/iht/bigquery-dataflow-cdc-example

A Dataflow streaming pipeline written in Java, reading data from Pubsub and recovering the sessions from potentially unordered data, and upserting the session data into BigQuery with no duplicates

apache-beam bigquery cdc dataflow google-cloud pubsub

Last synced: 29 Dec 2024

https://github.com/oliveroneill/wilt-cloud-functions

Wilt Google Cloud Functions

bigquery google-cloud-functions

Last synced: 07 Jan 2025

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 27 Jan 2025

BigQuery Awesome Lists
BigQuery Categories