Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


BigQuery

Google BigQuery lets companies analyze large amounts of data without managing infrastructure. Google's documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.

📖 A highly rated reference on the subject is the book "Google BigQuery: The Definitive Guide". Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article marking the service's 10th anniversary.

https://github.com/vigneshss-07/mastering-sql-and-bigquery-on-google-cloud-platform

Take your Data Analytics skills to the next level with this comprehensive playlist. Learn SQL from the basics to advanced techniques while mastering BigQuery on Google Cloud.

analytics bigquery gcp sql

Last synced: 05 Jan 2025

https://github.com/eddieatgoogle/sql-based-genai-data-pipeline

GenAI data pipeline that performs data preparation, management and performance evaluation tasks for RAG systems using SQL as the primary development language. Please feel free to use this as a starting point for your own projects.

bigquery bqml dataform embeddings gemini google-cloud-platform sql vector-search vertex-ai

Last synced: 08 Jan 2025

https://github.com/moeabbas6/bq_data_loader

A Python script for executing and logging batch SQL commands in Google BigQuery. Includes tracking of execution times, unique job and statement IDs, and automated logging to a specified BigQuery table.

bigquery data python

Last synced: 29 Jan 2025

https://github.com/newtonmunene99/sec-filings

Simple Go app that crawls SEC EDGAR filings and loads the indices into Google BigQuery.

bigquery cloudstorage gcp golang

Last synced: 21 Jan 2025

https://github.com/giorgishengelia/bike-share-analysis-report

Uses data analytics to help develop a marketing strategy for converting casual riders into members.

bigquery sql tableau

Last synced: 05 Feb 2025

https://github.com/khanovico/energy-data-analysis

A cloud-based model that analyzes a real-world dataset with BigQuery and other big-data analysis tools. Includes a Docker image for running the app across platforms.

big-data-processing bigquery docker google-app-engine jupyter-notebook mlflow python scikit-learn seaborn xgboost

Last synced: 09 Feb 2025

https://github.com/jmfeck/bigquery-local-framework

This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified interface for working with BigQuery from a local machine.

bigquery data-engineering ingestion pandas python

Last synced: 18 Jan 2025

https://github.com/marceloneppel/gcs-to-bigquery

WIP: Moving data from GCS to BigQuery.

bigquery gcs scala scio

Last synced: 30 Jan 2025

https://github.com/bsrikanth24/etl-pipeline-project-sales

This project implements an ETL pipeline to process fictitious sales data.

bigquery pandas-dataframe python

Last synced: 06 Feb 2025

https://github.com/chdl17/nyc_green_taxis_peak_hour_analysis

This project analyzes GCP BigQuery data and uses Looker Studio to build a Peak Hour Analysis.

bigquery gcp google-cloud-platform looker-studio sql

Last synced: 21 Nov 2024

https://github.com/simhayn/genomics-cannabis-bigquery

BigQuery's Cannabis_Genomics Dataset Exploration using SQL in a Python Environment

big-data bigquery bioinformatics exploratory-data-analysis genomics python sql

Last synced: 22 Jan 2025

https://github.com/riju18/airflow-data-engineering-with-bigquery-and-dbt

Fetches data from a simple CSV file, loads it into a GCP BigQuery table, runs dbt to automate the data warehouse (DWH), and runs Soda to check data quality.

apache-airflow bigquery csv dbt python3 soda

Last synced: 28 Jan 2025

https://github.com/rafal-kowalski-dev/selling-cars-analize

Hobby project for learning PySpark, Airflow, and BigQuery.

airflow bigquery gcp pyspark python sqlalchemy

Last synced: 30 Jan 2025

https://github.com/seahrh/nyc-taxi-trips

REST API for the New York City Taxi Trips public dataset, implemented in Scala and Play Framework 2.7

bigquery nyc-taxi-dataset play-framework rest-api scala

Last synced: 03 Feb 2025

https://github.com/quipper/send-ci-result-to-bigquery-action

Send test results to BigQuery in GitHub Actions

bigquery github-actions google-bigquery junit-xml

Last synced: 09 Jan 2025

https://github.com/sintef/bigquery-postgresql-wire-proxy

A PostgreSQL wire protocol proxy server for BigQuery.

bigquery postgresql proxy

Last synced: 12 Jan 2025

https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer

Mackerel plugin that posts BigQuery query results.

bigquery mackerel-plugin

Last synced: 12 Oct 2024

https://github.com/francois-lenne/play-bq-gcp

Data pipeline that retrieves data from the PlayStation API and loads it into BigQuery.

bigquery cicd data-engineering google-cloud python

Last synced: 13 Jan 2025

https://github.com/ackeecz/terraform-gcp-cloud-function_pubsub_to_bq

A Cloud Function that subscribes to a given Pub/Sub topic and inserts each message into a BigQuery table.

bigquery cloud-functions pubsub terraform-module

Last synced: 07 Jan 2025

https://github.com/alessio-siciliano/bigquery-advanced-utils

BigQuery-advanced-utils is a lightweight utility library that extends the official Google BigQuery Python client. It simplifies tasks like query management, data processing, and automation. Aimed at developers and data scientists, the project is open to contributions to improve and enhance its functionality.

bigquery datatransfer google-cloud python

Last synced: 01 Feb 2025

https://github.com/ivdatahub/pypi-package-stats

Project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more.

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 21 Nov 2024

https://github.com/sayed-ashfaq/target-sql

In this project, I analyzed Target company's data using SQL in BigQuery, focusing on data extraction, manipulation, and performing various analytical queries to derive insights.

aggregation bigquery cte joins sql

Last synced: 23 Dec 2024

https://github.com/ackeecz/terraform-gcp-cloud-run_pubsub_to_bq

A Cloud Run service that subscribes to a given Pub/Sub topic and inserts each message into a BigQuery table.

bigquery gcp pubsub terraform

Last synced: 07 Jan 2025

https://github.com/allanreda/share-of-search-retrieval-and-visualization

Share of search analysis including data retrieval from Google Ads API, storing data in BigQuery and visualizing it in Looker Studio

bigquery google-ads-api looker-studio python share-of-search

Last synced: 28 Dec 2024

https://github.com/gabrieladados/analise-ecommerce

SQL Analysis for E-commerce: Growth Strategies to Boost Sales

bigquery data-analysis ecommerce sql

Last synced: 06 Feb 2025

https://github.com/siriospa/gcp-helpers-bigquery

Helpers for Google Cloud BigQuery.

bigquery gcp google-cloud-platform sirio

Last synced: 12 Oct 2024

https://github.com/vaibhavs10/ml-on-gcp

This repository walks through a data-scientist-focused approach to building and deploying machine learning models on Google Cloud.

aiplatform bigquery googlecloudplatform ml

Last synced: 06 Feb 2025

https://github.com/smohanta23/uber_data-engineering_etl-project

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.

big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api

Last synced: 21 Jan 2025

https://github.com/machinelearningzuu/data-engineering-projects

This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.

airflow bigquery data-engineering data-science data-visualization data-warehouse

Last synced: 04 Feb 2025

https://github.com/ivanildobarauna-dev/pypi-package-stats

Project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more.

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 11 Dec 2024

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 27 Jan 2025
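
The "top 5 users per region" analysis described for the livestreaming case study above is a top-N-per-group problem. In BigQuery SQL this is commonly written with a `ROW_NUMBER()` window function partitioned by region; the plain-Python sketch below only illustrates the same logic and is not taken from the repository. The function name, the `(region, user, seconds)` event layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def top_users_by_region(events, n=5):
    """Return {region: [(user, total_seconds), ...]} with the n largest totals per region.

    `events` is an iterable of (region, user, seconds) tuples (hypothetical layout).
    """
    # Step 1: aggregate watch time per (region, user), like GROUP BY region, user.
    totals = defaultdict(int)
    for region, user, seconds in events:
        totals[(region, user)] += seconds

    # Step 2: collect the per-user totals under each region.
    by_region = defaultdict(list)
    for (region, user), total in totals.items():
        by_region[region].append((user, total))

    # Step 3: keep the n largest totals per region, in descending order.
    return {
        region: heapq.nlargest(n, users, key=lambda pair: pair[1])
        for region, users in by_region.items()
    }


# Made-up sample data, for illustration only.
sample_events = [
    ("EU", "ana", 120), ("EU", "ben", 300), ("EU", "ana", 200),
    ("US", "cy", 50), ("US", "dee", 75), ("EU", "eli", 10),
]
result = top_users_by_region(sample_events, n=2)
```

With this sample input, `result["EU"]` holds the two EU users with the highest summed watch time. Ties are not given any special treatment here; a real analysis would need to decide how to break them.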

https://github.com/mchmarny/stocker

Uses the correlation between Twitter sentiment and stock market price signals to predict the next day's closing price.

bigquery ml prediction regression-models

Last synced: 31 Dec 2024

https://github.com/antbit96/dataform_poc

Template for basic data preparation

bigquery bigquery-dataform data-preparation

Last synced: 14 Dec 2024

https://github.com/oliveroneill/wilt-cloud-functions

Wilt Google Cloud Functions

bigquery google-cloud-functions

Last synced: 07 Jan 2025

https://github.com/davidkhala/dwh-migration-tools

dwh-migration-tools: contribution fork

bigquery bq gcp

Last synced: 23 Jan 2025

https://github.com/kavyachippada/hva

Mini-Hackathon 1.0

bigquery excel pandas powerbi sql

Last synced: 13 Oct 2024

https://github.com/sangnandar/insert-unique-record

A Cloud Functions script that inserts only unique records into BigQuery.

bigquery digital-marketing-analytics google-cloud-functions

Last synced: 29 Dec 2024

https://github.com/ivanildobarauna/pypi-package-stats

Project that ingests PyPI package data from BigQuery and sends it to Datadog for analysis and insights with dashboards, monitors, and more.

bigquery cloud data-engineering data-warehouse gcp software-engineering

Last synced: 29 Dec 2024

https://github.com/iht/bigquery-dataflow-cdc-example

A Dataflow streaming pipeline written in Java, reading data from Pubsub and recovering the sessions from potentially unordered data, and upserting the session data into BigQuery with no duplicates

apache-beam bigquery cdc dataflow google-cloud pubsub

Last synced: 29 Dec 2024
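
The Dataflow CDC example above describes recovering sessions from potentially unordered events and upserting them without duplicates. The sketch below shows that idea in plain Python and is not the repository's Java/Beam code: it assumes events are `(user, timestamp)` pairs, that a session ends after a gap longer than `gap_seconds`, and that a dictionary keyed by `(user, session_start)` stands in for the destination table. All of these names and choices are hypothetical.

```python
def sessionize(events, gap_seconds=1800):
    """Group (user, timestamp) events into sessions, tolerating unordered input.

    Returns {(user, session_start): session_end}. Writing by key acts as an
    upsert, so replaying the same events never creates duplicate sessions.
    """
    sessions = {}  # (user, session_start) -> session_end
    current = {}   # user -> key of that user's most recent session

    # set() drops exact duplicate events; sorted() restores per-user time order.
    for user, ts in sorted(set(events)):
        key = current.get(user)
        if key is None or ts - sessions[key] > gap_seconds:
            # Gap exceeded (or first event for this user): start a new session.
            key = (user, ts)
            current[user] = key
        # Upsert: create the session or extend its end time.
        sessions[key] = ts
    return sessions


# Made-up, deliberately unordered events with one exact duplicate.
events = [("u1", 4000), ("u1", 100), ("u2", 50), ("u1", 160), ("u1", 100)]
sessions = sessionize(events, gap_seconds=600)
```

This batch version sorts everything up front. A streaming pipeline cannot do that and would instead rely on event-time windowing and allowed lateness, which this sketch does not attempt to model.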

https://github.com/marielachirinosr/nyc-taxi-trip-exploration-2019-2020

Explores passenger behavior & impact of COVID-19 on NYC taxi industry (Q1 2019-2020).

bigquery data data-analysis data-visualization python sql tableau

Last synced: 29 Dec 2024

https://github.com/sangnandar/load-csvs-from-gcs-to-bigquery

Google Apps Script to streamline loading CSV data from Google Cloud Storage (GCS) into BigQuery.

bigquery csv-import google-apps-script google-cloud-storage

Last synced: 13 Jan 2025

https://github.com/yandex-cloud-examples/yc-bigquery-to-object-storage

Export data from Google BigQuery via Google Storage to Yandex Cloud Object Storage.

bigquery object-storage python3 yandex-cloud yandexcloud

Last synced: 29 Dec 2024

https://github.com/rrmcguinness/protoc-gen-bq-schema

A protocol buffer compiler (protoc) plugin for generating Google BigQuery JSON table definitions.

bigquery bigquery-schema protobuf

Last synced: 13 Jan 2025

https://github.com/kevin-rsj/real-estate-investments

Scoring system that ranks French cities for second-home investment by risk profile (high, moderate, and low). It evaluates key ratios in areas such as demand, availability, infrastructure, demographics, and prices.

bigquery data-analytics looker-studio numpy pandas python sklearn-library sql visualization

Last synced: 09 Feb 2025

https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds

Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow

apache-arrow apache-beam bigquery dataflow neo4j python

Last synced: 23 Dec 2024

https://github.com/isaacmg/mimic_iv_bq_queries

Queries needed to recreate time series features for model training

bigquery mimic-iv sql

Last synced: 21 Jan 2025

https://github.com/yu-iskw/bigquery-lineage

Visualize BigQuery data lineage graph

bigquery data-governance data-management visualization

Last synced: 10 Feb 2025

https://github.com/ansh-info/stockpulse

Real-time stock market analytics pipeline with live visualization dashboard. Built with Python and GCP, featuring automated data processing and interactive Streamlit analytics.

api big-data bigquery cloud cloud-computing cloud-native data-engineering data-pipeline docker docker-compose gcp gcp-automation-gitops gcp-cloud-run gcp-pubsub google-cloud-platform real-time realtime stock-market stocks streamlit

Last synced: 27 Dec 2024

https://github.com/nghiant3110/e_com_1

This is a DA project based on the e-commerce dataset (Thelook_ecom) in Google BigQuery.

bigquery looker-studio sql

Last synced: 24 Dec 2024

https://github.com/nghiant3110/google_fiber_bi_5

This is a BI Capstone project based on the Google Fiber dataset from Google BI Course

bigquery google-sheets looker-studio sql

Last synced: 24 Dec 2024

https://github.com/nghiant3110/b2b_crm_3

This is a DA project based on the B2B Sales CRM dataset from Maven Analytics

bigquery google-sheets looker-studio sql

Last synced: 24 Dec 2024

https://github.com/davelester/gharchive-bigquery-examples

Examples Using BigQuery to Analyze GH Archive Data

bigquery gharchive

Last synced: 01 Feb 2025

https://github.com/andrewm4894/gcp-telemetry-example

Simple HTTP endpoint for telemetry data type events in GCP.

bigquery gcp-cloud-functions gcp-storage python terraform

Last synced: 01 Feb 2025

https://github.com/raqssoriano/hha504_assignment_nosql_dbs

This task is part of my assignment focused on creating and configuring databases in different platforms, such as GCP's BigQuery, MongoDB Atlas, and Redis Cloud.

bigquery mongodb-atlas mongodbcompass redis redisinsight

Last synced: 18 Dec 2024

https://github.com/adadalshabab/data-engineering-gcp-project

An end-to-end modern data engineering project, including deployment of ETL pipeline on Google Cloud Platform, using BigQuery for data analysis and leveraging Looker to generate an insight dashboard.

bigquery data data-science data-visualization databases dataengineering-a engineering etl-pipeline looker-studio powerbi

Last synced: 19 Dec 2024

https://github.com/scraly/flume-bigquery-sink

An Apache Flume Sink implementation to publish data to Google BigQuery

bigquery flume sink

Last synced: 25 Dec 2024

https://github.com/francois-lenne/elt-mp4-quiberon

The goal of this project is to retrieve video from the municipality of Quiberon and determine whether a person appears in it.

bigquery cicd data-engineering docker elt google-cloud-functions google-cloud-platform google-cloud-run google-cloud-storage pipeline python sql unstructured-data

Last synced: 25 Dec 2024

https://github.com/fakhri098/project-sql-bigquery

This project aims to analyze taxi trip data with a focus on trip duration patterns, popular routes, and trip costs. The study was conducted to gain in-depth insights into taxi travel behavior based on historical data.

bigquery sql

Last synced: 17 Jan 2025

https://github.com/celiason/coffee-funnel

Web page for visualizing sales projections of a small coffee business.

bigquery prophet sales-analysis streamlit-webapp

Last synced: 26 Dec 2024

https://github.com/lupusruber/music_analytics

This project processes real-time music event data using Kafka, Apache Spark on Google Cloud Dataproc, and stores the transformed data in BigQuery for analytics, all orchestrated by Airflow and managed with Terraform.

bigquery data-proc dimensional-modeling gcp-project kafka spark-structured-streaming

Last synced: 02 Feb 2025

https://github.com/paulveillard/cybersecurity-analytics

An ongoing collection of awesome software, libraries, learning tutorials, documents and books, technical resources and cool stuff about Analytics Engineering in Cybersecurity.

analytics bigdata bigquery cybernetics cybersecurity data data-engineering data-science encryption encryption-decryption seo seo-friendly seo-optimization

Last synced: 02 Feb 2025

https://github.com/spacepatcher/google-workspace-gmail-collector

👁 App for collecting Gmail logs from your Google Workspace account and sending them to Kafka

bigquery gmail google-workspace security soc

Last synced: 23 Oct 2024

https://github.com/prashhhant213/strategic-analysis-of-retail-brand-in-south-america-using-sql

Uses BigQuery and MySQL to analyze 100K records for a retail brand in South America, covering sales optimization, trend identification, and customer satisfaction, and provides insights and recommendations for growing the user base and improving services.

bigquery database mysql-server sql

Last synced: 26 Dec 2024

https://github.com/rifa8/extract-load-demo

Learning Google Cloud Platform (GCP)

airbyte bigquery bucket gcp

Last synced: 27 Jan 2025

https://github.com/epomatti/gcp-bigquery

Data sync via CDC from GCP Cloud SQL to BigQuery using Datastream.

bigquery cloud-sql datastream gcp

Last synced: 17 Jan 2025

https://github.com/scraly/bigquery

Google BigQuery AaaS tools, tips and fun

bigquery java

Last synced: 25 Dec 2024

https://github.com/anyesh/gbq-helpers

GBQ related helper functions and snippets.

bigquery google

Last synced: 10 Jan 2025

https://github.com/owox/sgtm-owox-ga4-bigquery

OWOX BI Streaming provides advanced tracking to get the most from the Google Analytics 4 setup already installed on your website.

analytics bigquery ga4

Last synced: 20 Dec 2024

https://github.com/entur/terraform-aiven-kafka-connect-bigquery-sink

Terraform module for BigQuery sink connector on Aiven KafkaConnect cluster

aiven bigquery kafka-connect sink-connector terraform terraform-modules

Last synced: 17 Jan 2025

https://github.com/ngangawairimu/clv-rfm-and-customer-segmentation-analysis

This project performs cohort analysis to estimate Customer Lifetime Value (CLV) by analyzing weekly revenue and user registrations over 12 weeks, forecasting future revenue, and providing actionable insights for marketing and business strategy.

bigquery clv-analysis cohort-analysis customer-segmentation excel rfm-analysis

Last synced: 03 Jan 2025

https://github.com/shikanime/seeker

Data platform based on BigQuery

bigquery dataform google-cloud

Last synced: 04 Jan 2025

https://github.com/marceloneppel/map-to-bigquery-structs

Tool to convert a Golang map to a struct containing fields with types like bigquery.Null*.

bigquery golang map struct

Last synced: 30 Jan 2025

https://github.com/tosh2230/cdc-rds-bq

Change data capture from Amazon RDS to Google BigQuery

bigquery changedatacapture rds

Last synced: 21 Jan 2025

https://github.com/janaom/gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.

airflow bigquery bigqueryml cloud-functions cloud-run-functions composer data-engineering-project google-cloud-platform

Last synced: 26 Jan 2025

https://github.com/yasarsultan/taxi-trip-analysis

The NYC Taxi Trip Batch Data Pipeline automates processing of large-scale trip data using Apache Spark and Airflow, integrating AWS S3 and Google BigQuery for storage and analytics. It features scalable, containerized workflows with robust data validation.

airflow aws-s3 bash-script batch-processing bigquery data-lake data-warehouse docker python3 spark

Last synced: 11 Jan 2025

https://github.com/codingsancho/fastapi-bigquery

Learning exercise: a Python backend built with FastAPI and BigQuery, plus a React frontend.

bigquery fastapi javascript python react

Last synced: 20 Dec 2024

https://github.com/valenthr/purchase_funnel

Google merch store sales analysis

bigquery product-analysis sql

Last synced: 27 Jan 2025

https://github.com/santiago-giordano/aws-gcp-pipeline

Simple pipeline that downloads a CSV from an AWS bucket, applies some transformations, creates tables in GCP BigQuery, loads the data, and runs queries.

aws bigquery etl gcp jupyter pipeline python

Last synced: 12 Jan 2025

https://github.com/denisogr/kaggle-notebook-to-production

This is a study project. I get analytics/ML examples from Kaggle and use different technologies to re-implement them.

bigquery data-engineering gcp kaggle-competition kaggle-dataset python spark

Last synced: 12 Jan 2025

https://github.com/justinjsd/analytics-engineering

📊 A repository focusing on analytics engineering, particularly using dbt on the Northwind Sample dataset

analytics bigquery dbt engineering sql

Last synced: 12 Jan 2025

https://github.com/itsubaki/hermes-lambda

Transfers AWS cost data to BigQuery

aws bigquery

Last synced: 07 Feb 2025

https://github.com/shvetsihorr/sql-projects

SQL and Google BigQuery-Portfolio Projects

azuredatastudio bigquery mssql postgresql sql

Last synced: 18 Jan 2025

https://github.com/jasontanx/ridership-headline-project

An end-to-end data engineering and data analytics project on Malaysian public transport ridership data.

bigquery data-engineering minio-server public-transport-ridership terraform

Last synced: 01 Feb 2025

https://github.com/adindasarianti/rakamin_kf_analytics

This repository contains my project as a Big Data Analytics intern at Kimia Farma, where I analyzed the performance of Kimia Farma from 2020 to 2023

bigquery dataanalytics lookerstudio

Last synced: 02 Jan 2025

https://github.com/rolandbende/python-bigquery-migrations

The Python bigquery-migrations package makes it easy to create and manipulate BigQuery databases.

bigquery google migration-automation migration-scripts migration-tool migrations python

Last synced: 24 Jan 2025
