Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


BigQuery

Google BigQuery lets companies handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a "serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. It also supports federated queries, which let it read data directly from external sources.
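As a rough illustration of a federated query: for Cloud SQL and similar sources, BigQuery provides the EXTERNAL_QUERY function, which takes a connection ID and a query written in the external database's own SQL dialect. The Python sketch below only assembles such a statement as a string. The connection ID and the customers table are hypothetical, and actually running the query would require a configured BigQuery connection and a client library.

```python
from typing import Optional


def quote_googlesql_string(text: str) -> str:
    """Return `text` as a single-quoted GoogleSQL string literal."""
    # Escape backslashes first, then single quotes, so the literal round-trips.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_federated_query(connection_id: str, external_sql: str,
                          limit: Optional[int] = None) -> str:
    """Wrap a query in the external database's dialect inside EXTERNAL_QUERY."""
    sql = (
        "SELECT * FROM EXTERNAL_QUERY("
        f"{quote_googlesql_string(connection_id)}, "
        f"{quote_googlesql_string(external_sql)})"
    )
    if limit is not None:
        # This LIMIT is applied by BigQuery to the rows returned by the external query.
        sql += f" LIMIT {int(limit)}"
    return sql


if __name__ == "__main__":
    # Hypothetical connection ID (project.location.connection) and table name.
    print(build_federated_query(
        "my-project.us.my-cloudsql-connection",
        "SELECT id, name FROM customers WHERE country = 'FR'",
        limit=10,
    ))
```

Quoting both arguments through one helper keeps the inner query's own quotes (such as 'FR') from terminating the outer string literal.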

📖 A highly rated reference book on BigQuery is "Google BigQuery: The Definitive Guide". Another worthwhile read is the article by BigQuery's founding product manager, written for its 10th anniversary, which tells the inside story of the product.

https://github.com/unytics/airbyte_serverless

Airbyte made simple (no UI, no database, no cluster)

airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline

Last synced: 19 Jan 2025

https://github.com/google/vscode-bigquery

A Visual Studio Code plugin for running BigQuery queries.

bigquery extension sql vscode

Last synced: 23 Jan 2025

https://github.com/sambacha/dune-snippets

dune snippets is a collection of SQL queries for duneanalytics.com and Google BigQuery

analytics bigquery crypto defi dune eth ethereum orderbok solidity sql tick-data

Last synced: 22 Jan 2025

https://github.com/gojekfarm/beast

[Deprecated] Load data from Kafka to any data warehouse. The BigQuery sink is now supported in Firehose: https://github.com/odpf/firehose

beast bigquery dataops kafka warehouse

Last synced: 23 Jan 2025

https://github.com/scalefreecom/datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.

azure-synapse bigquery datavault dbt dbt-packages exasol google-bigquery hubs links pits postgresql redshift satellites scalefree snapshots snowflake sourcemarts stagingarea

Last synced: 24 Jan 2025

https://github.com/cata-network/cata_database

CATA.Search. Blockchain database, cata metadata query

bigquery blockchain database drill

Last synced: 12 Oct 2024

https://github.com/google/megalista

First-party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).

audience-targeting audiences bigquery conversions customermatch data-integration dataflow google googleads googleanalytics python

Last synced: 15 Jan 2025

https://github.com/kikinteractive/go-bqstreamer

Stream data into Google BigQuery concurrently using InsertAll()

bigquery go golang

Last synced: 23 Jan 2025
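go-bqstreamer itself is a Go library; the Python sketch below only illustrates the general pattern its description names: split rows into batches and send the batches concurrently. The send_batch callable is a hypothetical stand-in for one insertAll (tabledata.insertAll) request, and the default of 500 rows per batch is an assumption based on the commonly cited per-request recommendation, not a value taken from go-bqstreamer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive batches of at most `size` rows."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def stream_concurrently(rows: Iterable[Row],
                        send_batch: Callable[[List[Row]], int],
                        batch_size: int = 500,
                        workers: int = 4) -> int:
    """Send batches on a thread pool and return the total number of failed rows.

    `send_batch` is a hypothetical callable that performs one insert request
    and returns how many rows of that batch were rejected.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batched(rows, batch_size)))


if __name__ == "__main__":
    sizes: List[int] = []

    def fake_send(batch: List[Row]) -> int:
        sizes.append(len(batch))  # record the batch size instead of calling BigQuery
        return 0

    failed = stream_concurrently(({"n": i} for i in range(1234)), fake_send)
    print(failed, sorted(sizes))  # 0 [234, 500, 500]
```

Returning a failure count per batch lets the caller decide whether to retry, which matters because streaming inserts can partially succeed.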

https://github.com/googlecloudplatform/bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

bigdata bigquery data-catalog data-governance data-lineage data-management dataflow zetasql

Last synced: 07 Oct 2024

https://github.com/embulk/embulk-output-bigquery

Embulk output plugin to load/insert data into Google BigQuery

bigquery embulk jruby

Last synced: 24 Jan 2025

https://github.com/googlecloudplatform/dataproc-templates

Dataproc templates and pipelines for solving simple in-cloud data tasks

apache-spark bigquery gcp google-cloud google-cloud-platform jupyter-notebook pyspark

Last synced: 25 Jan 2025

https://github.com/allegro/bigflow

A Python framework for data processing on GCP.

airflow-dag beam bigquery composer dag dataflow dataproc gcp python python-framework workflows

Last synced: 25 Jan 2025

https://github.com/googlecloudplatform/bigquery-geo-viz

Visualize Google BigQuery geospatial data using Google Maps Platform APIs

bigquery data-visualization examples gis

Last synced: 19 Jan 2025

https://github.com/datainsider-co/rocket-bi

A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL and Vertica

analytics bigdata bigquery bussiness-intelligence clickhouse dashboard data etl hacktoberfest hacktoberfest2023 ingestion mysql postgresql vertica

Last synced: 20 Jan 2025

https://github.com/mikeghen/airflow-tutorial

Use Airflow to move data from multiple MySQL databases to BigQuery

airflow bigquery mysql-database

Last synced: 11 Oct 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 26 Jan 2025

https://github.com/gabfl/bigquery_fdw

BigQuery Foreign Data Wrapper for PostgreSQL

bigquery fdw postgresql postgresql-extension

Last synced: 10 Jan 2025

https://github.com/autotraderuk/dbt-dry-run

Dry run capability for dbt projects using BigQuery

bigquery dbt testing

Last synced: 22 Jan 2025

https://github.com/googlecloudplatform/dlp-dataflow-deidentification

Multi-cloud data tokenization solution using Dataflow and Cloud DLP

beam bigquery data dataflow dlp pii tokenization

Last synced: 20 Jan 2025

https://github.com/servian/bigquery-view-analyzer

A command-line tool for managing permissions and dependencies for BigQuery authorized views

bigquery google-cloud iam python

Last synced: 19 Dec 2024

https://github.com/shinichi-takii/ddlparse

Parse DDL and convert it to BigQuery JSON schema and DDL statements

bigquery ddl-parse ddl-parser maria mysql oracle postgresql python redshift sql

Last synced: 21 Jan 2025
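To show what a "BigQuery JSON schema" looks like, here is a minimal Python sketch that turns a simple CREATE TABLE statement into the list of {name, type, mode} objects BigQuery accepts. This is not ddlparse's API: the type map is a small hypothetical subset, and real DDL (comments, quoted identifiers, dialect-specific syntax) needs a proper parser such as the one the project provides.

```python
import json
import re
from typing import Dict, List

# Hypothetical, deliberately small mapping from source types to BigQuery types.
TYPE_MAP = {
    "INT": "INTEGER", "INTEGER": "INTEGER", "BIGINT": "INTEGER", "SMALLINT": "INTEGER",
    "VARCHAR": "STRING", "CHAR": "STRING", "TEXT": "STRING",
    "DECIMAL": "NUMERIC", "NUMERIC": "NUMERIC",
    "FLOAT": "FLOAT", "DOUBLE": "FLOAT", "REAL": "FLOAT",
    "BOOLEAN": "BOOLEAN", "BOOL": "BOOLEAN",
    "DATE": "DATE", "DATETIME": "DATETIME", "TIMESTAMP": "TIMESTAMP",
}
TABLE_CONSTRAINTS = {"PRIMARY", "CONSTRAINT", "UNIQUE", "FOREIGN", "CHECK"}


def split_top_level(body: str) -> List[str]:
    """Split on commas that are not nested in parentheses, e.g. DECIMAL(10, 2)."""
    parts: List[str] = []
    depth, current = 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def ddl_to_bq_schema(ddl: str) -> List[Dict[str, str]]:
    """Convert a simple CREATE TABLE statement into a BigQuery JSON schema."""
    match = re.search(r"\((.*)\)", ddl, re.DOTALL)  # outermost column list
    if not match:
        raise ValueError("no column list found")
    schema = []
    for column_def in split_top_level(match.group(1)):
        tokens = column_def.split()
        if tokens[0].upper() in TABLE_CONSTRAINTS:
            continue  # skip table-level constraints such as PRIMARY KEY (id)
        name, raw_type = tokens[0], tokens[1]
        base = re.match(r"[A-Za-z]+", raw_type)  # VARCHAR(255) -> VARCHAR
        base_type = base.group(0).upper() if base else raw_type.upper()
        mode = "REQUIRED" if "NOT NULL" in column_def.upper() else "NULLABLE"
        schema.append({"name": name,
                       "type": TYPE_MAP.get(base_type, "STRING"),  # unknown -> STRING
                       "mode": mode})
    return schema


if __name__ == "__main__":
    ddl = """CREATE TABLE users (
        id INT NOT NULL,
        name VARCHAR(255),
        price DECIMAL(10, 2),
        PRIMARY KEY (id)
    );"""
    print(json.dumps(ddl_to_bq_schema(ddl), indent=2))
```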

https://github.com/ExpediaGroup/circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.

big-data bigquery hive hive-metastore hive-table replicate-data replication s3

Last synced: 18 Nov 2024

https://github.com/google/fhir-py

Python utilities for working with FHIR, including libraries to build simple, flat FHIR views in BigQuery.

bigquery fhir python

Last synced: 22 Jan 2025

https://github.com/snowplow/sql-runner

Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake

bigquery postgresql redshift snowflake snowplow sql-runner

Last synced: 09 Nov 2024

https://github.com/rupurt/odbc-scanner-duckdb-extension

A DuckDB extension to read data directly from databases supporting the ODBC interface

analytics bigquery columnar-database cpp data-engineering db2 duckdb mariadb mssql mysql nix odbc olap oracle postgres snowflake vector-engine

Last synced: 12 Oct 2024

https://github.com/starlake-ai/starlake

Declarative, text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

bigquery data-engineering data-integration data-pipeline etl hdfs redshift snowflake spark synapse

Last synced: 22 Jan 2025

https://github.com/samelamin/spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

bigquery data-frame schema spark

Last synced: 12 Oct 2024

https://github.com/empower-ai/sql-agent

AI agent that helps you do data analytics with natural language.

analytics bigquery chatgpt chatgpt-bot data data-analytics data-science mysql postgresql slack slack-bot slackbot

Last synced: 14 Nov 2024

https://github.com/nevillelyh/shapeless-datatype

Shapeless utilities for common data types

avro bigquery datastore google-cloud scala shapeless tensorflow

Last synced: 20 Jan 2025

https://github.com/bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

analytics bigquery data-analysis data-modeling data-pipelines data-transformation python snowflake sql

Last synced: 23 Jan 2025

https://github.com/HTTPArchive/bigquery

BigQuery import and processing pipelines

bigquery

Last synced: 02 Nov 2024

https://github.com/stanford-esrg/gps

GPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.

bigquery internet-wide-scanning ipv4 network port-scan port-scanner port-scanning scanning security security-scanner security-tools zgrab zmap

Last synced: 30 Nov 2024

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 12 Oct 2024

https://github.com/tiboun/python-bigquery-test-kit

BigQuery test kit is a framework written in Python that lets you be more confident in your SQL and check that it is ready for production. It also renders SQL templates, which helps if you rely, for instance, on Airflow to orchestrate your jobs and their macros.

bigquery bq-test-kit framework integration-testing templates testing testing-tools tests

Last synced: 29 Oct 2024

https://github.com/googlecloudplatform/dlp-pdf-redaction

This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud Services like Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.

bigquery cloud cloudfunctions cloudrun cloudstorage cloudworkflows datalossprevention dlp documents gcp mask ocr pdf redaction serverless terraform tesseract workflows

Last synced: 23 Jan 2025

https://github.com/vigneshss-07/cloud-ai-analytics

This repo contains details related to data engineering tech stacks in GCP.

apachebeam bigdata bigquery clouddataflow cloudsql datalab google-cloud-platform spark

Last synced: 20 Jan 2025

https://github.com/squashql/squashql

Official repository of SquashQL, the SQL query engine for multi-dimensional and hierarchical analysis that empowers your SQL database

bigquery clickhouse database duckdb java jdbc query querybuilder snowflake spark sql typescript

Last synced: 14 Dec 2024

https://github.com/googlecloudplatform/datacatalog-tag-engine

Tag Engine automates the process of creating, updating, deleting, and populating metadata in bulk with the Google Cloud services Data Catalog and Dataplex. Tag Engine is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.

bigquery cloud-run cloud-storage data-catalog dataplex firestore

Last synced: 23 Jan 2025

https://github.com/einride/protobuf-bigquery-go

Seamlessly save and load protocol buffers to and from BigQuery using Go.

bigquery go golang google-cloud protobuf protobufs protocol-buffers

Last synced: 23 Jan 2025

https://github.com/urish/bigtsquery

Search Engine for TypeScript Code using AST Queries

angular bigquery search tsquery typescript

Last synced: 02 Nov 2024

https://github.com/alvarowolfx/weather-station-gcp-mongoose-os

A weather station built with an ESP32 that sends data through Google Cloud IoT Core and stores it in BigQuery

bigquery cloud-iot firebase google-cloud iot iot-core mongoose-os serverless

Last synced: 10 Jan 2025

https://github.com/openbridge/ob_google-bigquery

This service is meant to simplify running Google Cloud operations, especially BigQuery tasks. This means you do not have to worry about installation, configuration or ongoing maintenance related to an SDK environment. This can be helpful to those who would prefer not to be responsible for those activities.

bigquery database google-analytics google-cloud google-cloud-platform google-cloud-sdk sql

Last synced: 14 Nov 2024

https://github.com/mchmarny/github-activity-counter

Cloud Run service for GitHub event Webhook to monitor repo or org activity in real-time in Stackdriver and analyze activity through ad-hoc SQL queries in BigQuery

bigquery cloudrun dataflow github pubsub stackdriver webhook

Last synced: 08 Nov 2024

https://github.com/mundipagg/amora-data-build-tool

Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.

analytics analytics-dashboard analytics-engineering bigquery business-intelligence data-engineering data-modeling datacleaning dataquality elt machine-learning python transformation

Last synced: 18 Nov 2024

https://github.com/kbhattac/coolretailer

Microservices with Istio, gRPC, Redis, BigQuery, Spring Boot, Spring Cloud and Stackdriver

bigquery google-cloud google-kubernetes-engine grafana grpc istio kiali locust microservices redis spring-boot spring-cloud zipkin

Last synced: 30 Oct 2024

https://github.com/m-lab/prometheus-bigquery-exporter

An exporter for converting BigQuery results into Prometheus metrics

bigquery exporter monitoring prometheus

Last synced: 17 Dec 2024
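A sketch of the kind of conversion such an exporter performs: each result row becomes one sample in the Prometheus text exposition format. The convention that a column named value holds the sample and every other column becomes a label is an assumption for this example; the exporter's own documentation defines its actual query conventions.

```python
from typing import Dict, Iterable, List


def escape_label_value(value: str) -> str:
    """Escape backslash, newline and double quote, as the text format requires."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def rows_to_prometheus(metric: str, rows: Iterable[Dict[str, object]],
                       value_column: str = "value") -> str:
    """Render result rows as one gauge in the Prometheus text exposition format."""
    lines: List[str] = [f"# TYPE {metric} gauge"]
    for row in rows:
        # Every column except the value column becomes a label (assumed convention).
        labels = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(row.items())
            if key != value_column
        )
        sample = float(row[value_column])
        lines.append(f"{metric}{{{labels}}} {sample}" if labels else f"{metric} {sample}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical rows, as if returned by a query counting rows per dataset.
    rows = [{"dataset": "sales", "value": 42}, {"dataset": "ads", "value": 7}]
    print(rows_to_prometheus("bq_table_rows", rows), end="")
```

Sorting label keys keeps the output stable across runs, which makes diffs and tests easier.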

https://github.com/winwiz1/crisp-bigquery

Starter project with full-stack BigQuery. It lets you overcome the customisation restrictions imposed by pre-built dashboards and control data usage. Deploy your own cloud website hydrated by sample BigQuery data in 15 minutes without installing any development software.

bigquery boilerplate containerization docker express fullstack google-bigquery nodejs react typescript

Last synced: 26 Oct 2024

https://github.com/minodisk/bigquery-runner

An extension to query BigQuery directly and view the results in VSCode.

bigquery extension query sql standard-sql vscode vscode-extension

Last synced: 19 Oct 2024

https://github.com/tharwaninitin/etlflow

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems and more.

aws bigquery dataproc etl etl-framework etl-pipeline gcp gcs redis s3 scala spark zio

Last synced: 22 Jan 2025

https://github.com/data-integrations/google-cloud

A collection of Google Cloud Platform (GCP) plugins

bigquery cdap cdap-plugin gcs google pubsub

Last synced: 06 Nov 2024

https://github.com/jroakes/gsc-logger

Google Search Console Logger for Google App Engine

bigquery google google-appengine search-engine

Last synced: 23 Dec 2024

https://github.com/cloudyr/bigQueryR

R Interface with Google BigQuery

api bigquery cloudyr google googleauthr r

Last synced: 04 Dec 2024

https://github.com/snowplow/data-models

⚠️ MAINTENANCE-ONLY MODE: Snowplow maintained SQL data models for working with Snowplow web and mobile behavioral data.

bigquery redshift snowflake snowplow sql

Last synced: 09 Nov 2024

https://github.com/qnighy/bqpb

BigQuery UDF to parse protobuf messages

bigquery javascript protobuf protocol-buffers

Last synced: 22 Oct 2024

https://github.com/googlecloudplatform/spark-on-k8s-gcp-examples

Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud Pub/Sub

bigquery cloud-pubsub gcs gcs-connector kubernetes spark

Last synced: 22 Jan 2025

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain 'ol Postgres, even!) with definitions imported from Collibra, DataHub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 12 Oct 2024

https://github.com/alexolivier/flight2bq

RTLSDR ADS-B dump1090 to Google BigQuery

ads-b aircraft bigquery google-bigquery raspberry-pi rtl-sdr rtlsdr

Last synced: 08 Nov 2024

https://github.com/z3z1ma/target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.

bigquery data meltano pipelines singer

Last synced: 19 Jan 2025
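Singer targets read newline-delimited JSON messages (SCHEMA, RECORD and STATE) from standard input. The minimal Python sketch below groups RECORD messages by stream before they would be loaded into BigQuery. It is a simplified illustration of the Singer message flow, not target-bigquery's implementation, and the load step itself is left out.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def read_singer_messages(
    lines: Iterable[str],
) -> Tuple[Dict[str, List[dict]], Dict[str, dict], Optional[dict]]:
    """Group RECORD messages by stream, keep schemas, and remember the last STATE."""
    records: Dict[str, List[dict]] = defaultdict(list)
    schemas: Dict[str, dict] = {}
    state: Optional[dict] = None
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        message = json.loads(line)
        kind = message.get("type")
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for {stream!r} arrived before its SCHEMA")
            records[stream].append(message["record"])
        elif kind == "STATE":
            # A real target emits the state only after the preceding records are loaded.
            state = message["value"]
    return dict(records), schemas, state


if __name__ == "__main__":
    sample = [
        '{"type": "SCHEMA", "stream": "users", "schema": {"type": "object"}, "key_properties": ["id"]}',
        '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
        '{"type": "STATE", "value": {"bookmark": 1}}',
    ]
    batches, schemas, state = read_singer_messages(sample)
    print(batches, state)
```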

https://github.com/starlake-ai/jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

bigquery databricks duckdb java query redshift rewrite snowflake transpiler

Last synced: 19 Nov 2024

https://github.com/debussy-labs/debussy_concert

Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.

airflow airflow-operators airflow-plugin big-data-platform bigquery data-architecture data-engineering data-pipeline dataform dataproc dbt gcp google-cloud mssql mysql postgresql spark sql workflow

Last synced: 10 Jan 2025

https://github.com/google/data-quality-monitor

Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.

bigquery cloudstorage data-quality-checks gcp google-cloud-platform python terraform

Last synced: 04 Dec 2024

https://github.com/thdk/team-timesheets

Time tracking web app built as a replacement for old school timesheets.

bigquery firebase firebase-firestore mobx mobx-react time-registration time-tracker time-tracking timesheets typescript

Last synced: 07 Nov 2024

https://github.com/bolcom/hive_compared_bq

hive_compared_bq compares/validates two SQL-like tables and graphically shows the rows/columns that differ.

bigquery data-quality hive python validation

Last synced: 15 Dec 2024

https://github.com/tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

bigquery data-catalog data-discovery data-engineering data-governance data-lineage data-management data-ops dbt gcs lineage redash s3 sql

Last synced: 19 Nov 2024

https://github.com/johannes-berggren/firestore-to-bigquery-export

NPM package for copying and converting Cloud Firestore data to BigQuery.

bigquery bigquery-export datasets firestore firestore-collections schema

Last synced: 28 Oct 2024

https://github.com/sungchun12/serverless-data-pipeline-gcp

Schedule a data pipeline in Google Cloud using Cloud Functions, BigQuery, Cloud Storage, Cloud Scheduler, Stackdriver Trace, Cloud Build, and Pub/Sub

bigquery bigquery-schema cicd-promote-to-production cloud-build cloud-functions cloud-scheduler etl-pipeline google-cloud-platform python3 sql stackdriver-trace

Last synced: 28 Oct 2024

https://github.com/mercari/dataflowtemplates

Convenient Dataflow pipelines for transforming data between cloud data sources

apache-beam bigquery dataflow dataflow-templates spanner

Last synced: 09 Nov 2024

https://github.com/googlecloudplatform/deviceconnect

https://deviceconnect.readthedocs.io/

bigquery cloudrun fitbit gcp

Last synced: 07 Oct 2024

https://github.com/bankyadam/not-so-bigquery

An emulator for Google BigQuery that can be run locally, backed by PostgreSQL.

bigquery development devtool emulator sql

Last synced: 15 Nov 2024

https://github.com/banditml/faucetml

High speed mini-batch data reading & preprocessing from BigQuery.

bigquery feature-engineering features machine-learning ml preprocessing pytorch

Last synced: 03 Dec 2024

https://github.com/nownabe/go-bqloader

bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

bigquery etl golang google-cloud google-cloud-functions google-cloud-storage

Last synced: 01 Dec 2024

https://github.com/mlr-org/mlr3db

Data backends to let mlr3 work transparently with (remote) databases

bigquery data-backend database duckdb machine-learning mariadb mlr3 mysql odbc postgresql r r-package spark sqlite

Last synced: 14 Oct 2024

https://github.com/googlecloudplatform/bq-utilization-alerts

A serverless bot which periodically checks configured BigQuery capacity commitments, reservations and assignments against actual slot consumption of running jobs and reports findings to Slack/Google Chat.

bigquery bot chat-ops cloud-run cloud-scheduler google-chat google-cloud serverless slack slots

Last synced: 07 Oct 2024
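The check described in this entry boils down to comparing slot consumption against configured capacity. In this Python sketch the RunningJob record, the reservation-to-capacity map and the 90% threshold are all hypothetical; the real bot reads commitments, reservations and assignments from Google Cloud APIs and reports to Slack or Google Chat.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunningJob:
    job_id: str
    reservation: str  # reservation the job runs in (hypothetical field)
    slots: float      # slots the job is currently consuming


def find_overloaded(capacity: Dict[str, int], jobs: List[RunningJob],
                    threshold: float = 0.9) -> Dict[str, float]:
    """Return {reservation: usage ratio} for reservations at or above `threshold`."""
    used: Dict[str, float] = defaultdict(float)
    for job in jobs:
        used[job.reservation] += job.slots
    findings: Dict[str, float] = {}
    for reservation, slots in used.items():
        cap = capacity.get(reservation, 0)
        # Jobs in a reservation with no known capacity are always reported.
        ratio = slots / cap if cap else float("inf")
        if ratio >= threshold:
            findings[reservation] = ratio
    return findings


if __name__ == "__main__":
    jobs = [RunningJob("job-1", "etl", 60), RunningJob("job-2", "etl", 35),
            RunningJob("job-3", "bi", 20)]
    for name, ratio in find_overloaded({"etl": 100, "bi": 100}, jobs).items():
        # In the real bot a message like this would be posted to a chat channel.
        print(f"reservation {name} at {ratio:.0%} of its slots")
```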

https://github.com/ronoaldo/aetools

Utilities to build and manage Google App Engine apps

bigquery datastore go

Last synced: 23 Jan 2025

https://github.com/googleclouddataproc/hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

apache bigquery gcp google hadoop hive

Last synced: 05 Nov 2024

https://github.com/miraisolutions/sparkbq

Sparklyr extension package to connect to Google BigQuery

bigquery r spark sparklyr

Last synced: 18 Nov 2024

https://github.com/nodefluent/bigquery-kafka-connect

Node.js Kafka Connect connector for Google BigQuery

big-data bigquery connect etl google-cloud kafka kafka-connect nodejs

Last synced: 11 Nov 2024

https://github.com/googlecloudplatform/bigquery-dlp-remote-function

Use Remote Functions to tokenize data with DLP in BigQuery using SQL

bigquery cloud-run data-loss-prevention dlp google-cloud

Last synced: 07 Oct 2024

https://github.com/ocadaruma/scalikejdbc-bigquery

ScalikeJDBC extension for Google BigQuery

bigquery scala scalikejdbc

Last synced: 12 Oct 2024
