Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

BigQuery

Google BigQuery lets companies analyze large amounts of data without managing infrastructure. Google’s documentation describes it as a « serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources without loading it first.
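As a rough illustration of the client-library access mentioned above, here is a minimal Python sketch. It assumes the `google-cloud-bigquery` package is installed and Application Default Credentials are configured. The public table `bigquery-public-data.usa_names.usa_1910_2013` and the helper names `build_top_names_sql` and `top_names` are chosen purely for illustration and do not come from any project listed on this page.

```python
"""Sketch: querying a BigQuery public dataset with the Python client library."""
from __future__ import annotations


def build_top_names_sql(limit: int) -> str:
    """Return a query for the most common names in one US state.

    The table is a public dataset picked only as an example (an assumption,
    not something referenced by the projects on this page).
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total_people\n"
        "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
        "WHERE state = @state\n"  # named parameter, bound at run time
        "GROUP BY name\n"
        "ORDER BY total_people DESC\n"
        f"LIMIT {int(limit)}"
    )


def top_names(state: str = "TX", limit: int = 5) -> list[tuple[str, int]]:
    """Run the query and return (name, total_people) pairs."""
    # Imported lazily so the SQL builder works without the client installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # picks up Application Default Credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("state", "STRING", state)
        ]
    )
    query_job = client.query(build_top_names_sql(limit), job_config=job_config)
    # result() blocks until the job finishes and yields Row objects.
    return [(row["name"], row["total_people"]) for row in query_job.result()]
```

Calling `top_names("TX", 5)` from a machine with valid Google Cloud credentials should return five name/count pairs. The state is passed as a query parameter rather than formatted into the SQL string, which avoids quoting problems with user-supplied values.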

📖 A highly rated reference is « Google BigQuery: The Definitive Guide », a comprehensive book on the subject. Another worthwhile read is the inside story written by BigQuery's founding product manager to mark the service's 10th anniversary.

https://github.com/mikeghen/metadata

Pulls data from Socrata open data portals

bigquery python socrata

Last synced: 18 Feb 2025

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 27 Jan 2025

https://github.com/oliveroneill/wilt-cloud-functions

Wilt Google Cloud Functions

bigquery google-cloud-functions

Last synced: 07 Jan 2025

https://github.com/edumoraes1/spam_count_sfmc

SQL query that counts email sends and spam reports over the last 365 days

bigquery marketing-cloud salesforce sql

Last synced: 20 Feb 2025

https://github.com/yu-iskw/bigquery-lineage

Visualize BigQuery data lineage graph

bigquery data-governance data-management visualization

Last synced: 10 Feb 2025

https://github.com/iht/bigquery-dataflow-cdc-example

A Dataflow streaming pipeline written in Java, reading data from Pubsub and recovering the sessions from potentially unordered data, and upserting the session data into BigQuery with no duplicates

apache-beam bigquery cdc dataflow google-cloud pubsub

Last synced: 19 Feb 2025

https://github.com/zaynabbug/end-to-end-fitness-data-pipeline-on-gcp

A cloud-native data pipeline that ingests, processes, and visualizes real-time and batch fitness data. Built with Pub/Sub, Airflow, BigQuery, dbt, Looker Studio, Terraform, and Docker to automate data workflows and provide actionable insights.

airflow bigquery dbt docker gcp looker-studio pubsub terraform

Last synced: 21 Feb 2025

https://github.com/andrewm4894/gcp-telemetry-example

Simple HTTP endpoint for telemetry data type events in GCP.

bigquery gcp-cloud-functions gcp-storage python terraform

Last synced: 01 Feb 2025

https://github.com/isaacmg/mimic_iv_bq_queries

Queries needed to recreate time series features for model training

bigquery mimic-iv sql

Last synced: 21 Jan 2025

https://github.com/raqssoriano/hha504_assignment_nosql_dbs

This task is part of my assignment focused on creating and configuring databases in different platforms, such as GCP's BigQuery, MongoDB Atlas, and Redis Cloud.

bigquery mongodb-atlas mongodbcompass redis redisinsight

Last synced: 10 Feb 2025

https://github.com/simhayn/genomics-cannabis-bigquery

BigQuery's Cannabis_Genomics Dataset Exploration using SQL in a Python Environment

big-data bigquery bioinformatics exploratory-data-analysis genomics python sql

Last synced: 22 Jan 2025

https://github.com/ansh-info/stockpulse

Real-time stock market analytics pipeline with live visualization dashboard. Built with Python and GCP, featuring automated data processing and interactive Streamlit analytics.

api big-data bigquery cloud cloud-computing cloud-native data-engineering data-pipeline docker docker-compose gcp gcp-automation-gitops gcp-cloud-run gcp-pubsub google-cloud-platform real-time realtime stock-market stocks streamlit

Last synced: 18 Feb 2025

https://github.com/smohanta23/uber_data-engineering_etl-project

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.

big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api

Last synced: 21 Jan 2025

https://github.com/rifa8/extract-load-demo

Learning Google Cloud Platform (GCP)

airbyte bigquery bucket gcp

Last synced: 27 Jan 2025

https://github.com/mchmarny/stocker

Using the correlation between Twitter sentiment and stock market price signals to predict the next day's closing price

bigquery ml prediction regression-models

Last synced: 21 Feb 2025

https://github.com/sayed-ashfaq/target-sql

In this project, I analyzed Target company's data using SQL in BigQuery, focusing on data extraction, manipulation, and performing various analytical queries to derive insights.

aggregation bigquery cte joins sql

Last synced: 15 Feb 2025

https://github.com/scraly/flume-bigquery-sink

An Apache Flume Sink implementation to publish data to Google BigQuery

bigquery flume sink

Last synced: 16 Feb 2025

https://github.com/valenthr/purchase_funnel

Google merch store sales analysis

bigquery product-analysis sql

Last synced: 27 Jan 2025

https://github.com/chiamakaukwuoma/portfolio

This repository contains various projects I've been privileged to work on outside of work.

aws-rds azure-fabric bigquery data-analysis docker-container elasticsearch excel grafana hadoop looker-studio mssql mysql postgresql powerbi python sql tableau

Last synced: 03 Feb 2025

https://github.com/xxmadkillerx10/data-engineering-zoomcamp

The Data Engineering Zoomcamp covers essential skills in containerization, workflow orchestration, data warehousing, analytics engineering, batch, and streaming processing. It includes tools like Docker, Terraform, BigQuery, dbt, Spark, Kafka, Kestra, Postgres, Google Data Studio, and Metabase.

airflow bigquery data-visualization dbt dbt-clickhouse docker-compose etl gcs google-cloud kafka postgresql spark sql streaming

Last synced: 03 Feb 2025

https://github.com/spacepatcher/google-workspace-gmail-collector

👁 App for collecting Gmail logs from your Google Workspace account and sending them to Kafka

bigquery gmail google-workspace security soc

Last synced: 23 Oct 2024

https://github.com/fakhri098/project-sql-bigquery

This project aims to analyze taxi trip data with a focus on trip duration patterns, popular routes, and trip costs. The study was conducted to gain in-depth insights into taxi travel behavior based on historical data.

bigquery sql

Last synced: 17 Jan 2025

https://github.com/topefolorunso/musicaly-project

An end-to-end data pipeline that ingests simulated music stream data, structures, cleans and models the raw data, and visualizes clean data.

airflow bigquery data-pipeline dbt google-cloud-platform kafka python spark-streaming

Last synced: 17 Feb 2025

https://github.com/francois-lenne/play-bq-gcp

Data pipeline that loads data from the PlayStation API into BigQuery

bigquery cicd data-engineering google-cloud python

Last synced: 13 Jan 2025

https://github.com/antbit96/dataform_poc

Template for basic data preparation

bigquery bigquery-dataform data-preparation

Last synced: 14 Dec 2024

https://github.com/prashhhant213/strategic-analysis-of-retail-brand-in-south-america-using-sql

Leveraged BigQuery and MySQL to analyze 100K records for a retail brand in South America, covering sales optimization, trend identification, and customer satisfaction, and to provide insights and recommendations for growing its user base and improving its services

bigquery database mysql-server sql

Last synced: 17 Feb 2025

https://github.com/lupusruber/music_analytics

This project processes real-time music event data using Kafka, Apache Spark on Google Cloud Dataproc, and stores the transformed data in BigQuery for analytics, all orchestrated by Airflow and managed with Terraform.

bigquery data-proc dimensional-modeling gcp-project kafka spark-structured-streaming

Last synced: 02 Feb 2025

https://github.com/epomatti/gcp-bigquery

Data sync via CDC from GCP Cloud SQL to BigQuery using Datastream

bigquery cloud-sql datastream gcp

Last synced: 17 Jan 2025

https://github.com/scraly/bigquery

Google BigQuery AaaS tools, tips and fun

bigquery java

Last synced: 16 Feb 2025

https://github.com/tosh2230/cdc-rds-bq

Change data capture from Amazon RDS to Google BigQuery

bigquery changedatacapture rds

Last synced: 21 Jan 2025

https://github.com/janaom/gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.

airflow bigquery bigqueryml cloud-functions cloud-run-functions composer data-engineering-project google-cloud-platform

Last synced: 26 Jan 2025

https://github.com/adindasarianti/rakamin_kf_analytics

This repository contains my project as a Big Data Analytics intern at Kimia Farma, where I analyzed the performance of Kimia Farma from 2020 to 2023

bigquery dataanalytics lookerstudio

Last synced: 02 Jan 2025

https://github.com/anyesh/gbq-helpers

GBQ related helper functions and snippets.

bigquery google

Last synced: 10 Jan 2025

https://github.com/newtonmunene99/sec-filings

Simple Go app that crawls SEC EDGAR filings and loads indices into Google BigQuery

bigquery cloudstorage gcp golang

Last synced: 21 Jan 2025

https://github.com/entur/terraform-aiven-kafka-connect-bigquery-sink

Terraform module for BigQuery sink connector on Aiven KafkaConnect cluster

aiven bigquery kafka-connect sink-connector terraform terraform-modules

Last synced: 17 Jan 2025

https://github.com/ngangawairimu/clv-rfm-and-customer-segmentation-analysis

This project performs cohort analysis to estimate Customer Lifetime Value (CLV) by analyzing weekly revenue and user registrations over 12 weeks, forecasting future revenue, and providing actionable insights for marketing and business strategy.

bigquery clv-analysis cohort-analysis customer-segmentation excel rfm-analysis

Last synced: 03 Jan 2025

https://github.com/shikanime/seeker

Data platform based on BigQuery

bigquery dataform google-cloud

Last synced: 04 Jan 2025

https://github.com/ivdatahub/pypi-package-stats

Project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 21 Nov 2024

https://github.com/marceloneppel/map-to-bigquery-structs

Tool to convert a Golang map to a struct containing fields with types like bigquery.Null*.

bigquery golang map struct

Last synced: 30 Jan 2025

https://github.com/push-protocol/push-google-bigquery

The Power of Web3 Big Data: A Guide to Using Google BigQuery and Push Protocol for Data Communication and Analysis

bigquery data push push-notifications web3

Last synced: 31 Jan 2025

https://github.com/yasarsultan/taxi-trip-analysis

The NYC Taxi Trip Batch Data Pipeline automates processing of large-scale trip data using Apache Spark and Airflow, integrating AWS S3 and Google BigQuery for storage and analytics. It features scalable, containerized workflows with robust data validation.

airflow aws-s3 bash-script batch-processing bigquery data-lake data-warehouse docker python3 spark

Last synced: 11 Jan 2025

https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer

Mackerel plugin that posts BigQuery query results

bigquery mackerel-plugin

Last synced: 16 Feb 2025

https://github.com/yu-iskw/homebrew-bigquery-to-datastore

A homebrew tap for bigquery-to-datastore

bigquery google-datastore homebrew

Last synced: 10 Feb 2025

https://github.com/paulveillard/cybersecurity-analytics

An ongoing collection of awesome software, libraries, learning tutorials, documents and books, technical resources and cool stuff about Analytics Engineering in Cybersecurity.

analytics bigdata bigquery cybernetics cybersecurity data data-engineering data-science encryption encryption-decryption seo seo-friendly seo-optimization

Last synced: 02 Feb 2025

https://github.com/santiago-giordano/aws-gcp-pipeline

Simple pipeline that downloads a CSV from an AWS bucket, applies some transformations, creates tables in GCP BigQuery, loads the data, and runs queries

aws bigquery etl gcp jupyter pipeline python

Last synced: 12 Jan 2025

https://github.com/denisogr/kaggle-notebook-to-production

This is a study project. I get analytics/ML examples from Kaggle and use different technologies to re-implement them.

bigquery data-engineering gcp kaggle-competition kaggle-dataset python spark

Last synced: 12 Jan 2025

https://github.com/justinjsd/analytics-engineering

📊 A repository focusing on analytics engineering, particularly using dbt on the Northwind Sample dataset

analytics bigquery dbt engineering sql

Last synced: 12 Jan 2025

https://github.com/jasontanx/ridership-headline-project

An end-to-end data engineering and data analytics project on Malaysian public transport ridership data.

bigquery data-engineering minio-server public-transport-ridership terraform

Last synced: 01 Feb 2025

https://github.com/itsubaki/hermes-lambda

Transfers AWS cost data to BigQuery

aws bigquery

Last synced: 07 Feb 2025

https://github.com/tuancamtbtx/dataform-utils

Bigquery Dataform Javascript Utils Package - Support Ads, Query Common, ...

bigquery dataform datawarehouse

Last synced: 17 Feb 2025

https://github.com/shvetsihorr/sql-projects

SQL and Google BigQuery-Portfolio Projects

azuredatastudio bigquery mssql postgresql sql

Last synced: 18 Jan 2025

https://github.com/richardbnk/data_tools

Python Library to Accelerate Creation of Data ETL Processes on multiple database systems.

bigquery etl gcp sql

Last synced: 02 Feb 2025

https://github.com/vigneshss-07/cloud-bigquery-and-sql---the-interview-guide

Covers SQL commands, interview preparation, and query questions with solutions in BigQuery

azuresql bigquery gcp sql sql-query sql-server sqlalchemy

Last synced: 15 Feb 2025

https://github.com/rrmcguinness/protoc-gen-bq-schema

A protocol buffer compiler (protoc) plugin for generating Google BigQuery JSON table definitions.

bigquery bigquery-schema protobuf

Last synced: 13 Jan 2025

https://github.com/rolandbende/python-bigquery-migrations

The Python bigquery-migrations package makes it easy to create and manipulate BigQuery databases.

bigquery google migration-automation migration-scripts migration-tool migrations python

Last synced: 24 Jan 2025

https://github.com/ddzikri/analisis-data-kimia-farma

Project Based Internship Kimia Farma Rakamin Academy

bigquery dataset sql

Last synced: 24 Jan 2025

https://github.com/stoqey/rasputia

Rasputia Latimore - The Big Data Bitch 💋

bigquery

Last synced: 19 Jan 2025

https://github.com/oguzgn/firebase-ab-test-analysis-for-a-mobile-race-game

This repository showcases an infrastructure designed for analyzing A/B tests in mobile games. It leverages BigQuery to process Firebase and GA4-based event data and uses Looker Studio for dynamic visualization. The project simplifies A/B test comparisons, enabling stakeholders to view results directly through interactive dashboards.

ab-testing ab-testing-analysis bigquery event-based-tracking firebase looker-studio mobile-game-analytics race-game sql

Last synced: 26 Jan 2025

https://github.com/sangnandar/load-csvs-from-gcs-to-bigquery

Google Apps Script to streamline loading CSV data from Google Cloud Storage (GCS) into BigQuery.

bigquery csv-import google-apps-script google-cloud-storage

Last synced: 13 Jan 2025

https://github.com/siriospa/gcp-helpers-bigquery

Helpers for Google Cloud BigQuery.

bigquery gcp google-cloud-platform sirio

Last synced: 16 Feb 2025

https://github.com/knands42/data-ingestion

Data Ingestion project to evaluate my Kotlin skill using concurrency

bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow

Last synced: 25 Jan 2025

https://github.com/chdl17/nyc_green_taxis_peak_hour_analysis

This project analyzes GCP BigQuery data and uses Looker Studio to build a Peak Hour Analysis.

bigquery gcp google-cloud-platform looker-studio sql

Last synced: 21 Nov 2024

https://github.com/jmfeck/bigquery-local-framework

This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified interface for local BigQuery interactions, enabling more efficient interaction with it.

bigquery data-engineering ingestion pandas python

Last synced: 18 Jan 2025

https://github.com/nghiant3110/google_fiber_bi_5

This is a BI Capstone project based on the Google Fiber dataset from Google BI Course

bigquery google-sheets looker-studio sql

Last synced: 15 Feb 2025

https://github.com/amitkumarj441/mysql2bigquery

A script to load a MySQL table in BigQuery. Extracts schema and data as JSON.

bigquery docker mysql scala

Last synced: 26 Jan 2025

https://github.com/noospheracr/twilio-segment-configs

Integration of Twilio Segment with Google BigQuery, Looker/PowerBI, and Google VertexAI to create a data-driven marketing platform

bigquery google-cloud-platform looker-studio marketing noosphera power-bi twilio-segment vertex-ai

Last synced: 22 Jan 2025

https://github.com/thanhloc81/customer-segmentation

✨ Analyze customer segments of Adventure World dataset

bigquery google-cloud powerbi sql

Last synced: 22 Jan 2025

https://github.com/jasontanx/mas-international-arrivals

Code repository about international arrivals into Malaysia

bigquery data-analytics data-engineering etl-pipeline international-arrivals

Last synced: 22 Jan 2025

https://github.com/hanif-syazul/analyzing-kimia-farma-sales-performance-with-gcp

This repository contains the final project for the Rakamin Big Data Analytics Internship. It includes a complete dashboard of Kimia Farma's sales performance analysis from 2020 to 2023.

big-data-analytics bigquery internship-project kimia-farma looker-studio rakamin sql

Last synced: 22 Jan 2025

https://github.com/zeinhasan/etl-using-airflow

Extract Transform Load Using Airflow

airflow bigquery etl

Last synced: 22 Jan 2025

https://github.com/loinguyen3108/sportify-music-analysis

Engineered the streaming crawler pipeline using Kafka to extract, transform, and load Spotify data into PostgreSQL and ClickHouse for real-time analytics. Additionally, developed an automated batching pipeline using Airflow and Spark to efficiently ETL crawled data into BigQuery.

airflow bigquery clickhouse kafka pyspark spotify

Last synced: 14 Feb 2025

https://github.com/yasarsultan/olist_datawarehouse

An end-to-end data pipeline that extracts data, processes it, and then loads it into the BigQuery data warehouse.

airflow bigquery data-warehouse docker

Last synced: 22 Jan 2025

https://github.com/thecodersstudio/node-native-test-runner

Code samples and test cases showcasing the power of Node.js's native test runner for streamlined and efficient testing.

bigquery mock nodejs nodejs-test nodenativetestrunner test

Last synced: 22 Jan 2025

https://github.com/ahbiels/chatbot_analize_avaliation

A bot built in Dialogflow CX that lets users rate a given company product. After the rating, the bot runs sentiment analysis on the user's review and stores the result (together with the review text, user name, and product) in a BigQuery dataset

bigquery chatbot dataset dialogflow dialogflow-cx documentation flask gcp google-cloud iterator language-model nlu nlu-chatbot python sql

Last synced: 22 Jan 2025

https://github.com/mutaharshaik/airflow_retail_project

Airflow retail project using pipeline with BigQuery, dbt, Soda

airflow astro-cli astro-sdk bigquery datamodeling dbt docker etl-pipeline gcp snowflake soda

Last synced: 22 Jan 2025

https://github.com/victorcezeh/end-to-end-elt-pipeline

An end-to-end ELT project using the Brazilian E-Commerce dataset from Kaggle. This project demonstrates the use of Python, PostgreSQL, Docker, Docker Compose, Airflow, dbt, and BigQuery to ingest, transform, and analyze data, providing insights into sales, delivery times, and order distributions.

airflow bigquery dbt-core docker docker-compose postgresql python

Last synced: 22 Jan 2025

https://github.com/lisabensoussan/bigdata_midterm

This project focuses on analyzing Stack Overflow data related to JavaScript and Python questions using a combination of SQL queries (Google BigQuery) and Unix shell commands. The aim is to explore trends, activity patterns, and user behavior around these popular programming languages through data wrangling and querying techniques.

bigquery data-cleaning sql unix-command unix-shell

Last synced: 22 Jan 2025

https://github.com/mattwelke/charter-challenge-for-fair-voting-bot

Bot that web scrapes and logs in BigQuery the donations so far of the Charter Challenge for Fair Voting.

bigquery bot go openwhisk public-data

Last synced: 22 Jan 2025

https://github.com/fahmiaziz98/sql_agent

Builds a SQL agent using different patterns: RAG, self-correction, and optimization

agent bigquery langchain sql sql-agent sqlite toolkit

Last synced: 22 Jan 2025

https://github.com/acardosolima/crypto-ethereum-tokens

This project aims to create a data pipeline using Airflow to ingest a dataset from Google BigQuery into a PostgreSQL database. The stack runs in a local environment using Kubernetes.

airflow bigquery postgresql python

Last synced: 22 Jan 2025

https://github.com/nszoni/dbtgen

dbt: write nothing, generate (almost) everything.

analytics bigquery dbt documentation generative-ai github tooling

Last synced: 31 Jan 2025

https://github.com/aafaf-arharas/de-zoomcamp2025

This repository contains my work completed during the Data Engineering Zoomcamp 2025

bigquery docker gcp kestra python sql terraform

Last synced: 14 Feb 2025

https://github.com/robinnoiret/importcsv_zendeskbigquery

This project is a Python script that imports CSV exports from Zendesk into BigQuery. It is not intended for recurring use, but to enable an initial dump of historical data.

bigquery connector export-csvfile json zendesk

Last synced: 22 Jan 2025
