Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.

📖 A highly rated, comprehensive reference is the book "Google BigQuery: The Definitive Guide". Another worthwhile read is the article by BigQuery's founding product manager, written for the product's 10th anniversary, which tells the inside story.

https://github.com/googlecloudplatform/cortex-data-foundation

Data Foundation - Google Cloud Cortex Framework

airflow bigquery cloud google googlecloud salesforce sap

Last synced: 07 Oct 2024

https://github.com/gojekfarm/beast

[Deprecated] Load data from Kafka to any data warehouse. The BigQuery sink is now supported in Firehose: https://github.com/odpf/firehose

beast bigquery dataops kafka warehouse

Last synced: 29 Sep 2024

https://github.com/sambacha/dune-snippets

dune snippets is a collection of SQL queries for duneanalytics.com and Google BigQuery

analytics bigquery crypto defi dune eth ethereum orderbok solidity sql tick-data

Last synced: 12 Oct 2024

https://github.com/unytics/airbyte_serverless

Airbyte made simple (no UI, no database, no cluster)

airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline

Last synced: 12 Nov 2024

https://github.com/cata-network/cata_database

CATA.Search. Blockchain database, cata metadata query

bigquery blockchain database drill

Last synced: 12 Oct 2024

https://github.com/rounds/go-bqstreamer

Stream data into Google BigQuery concurrently using InsertAll()

bigquery go golang

Last synced: 06 Aug 2024

https://github.com/kikinteractive/go-bqstreamer

Stream data into Google BigQuery concurrently using InsertAll()

bigquery go golang

Last synced: 29 Sep 2024

https://github.com/googlecloudplatform/bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

bigdata bigquery data-catalog data-governance data-lineage data-management dataflow zetasql

Last synced: 07 Oct 2024

https://github.com/google/megalista

First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).

audience-targeting audiences bigquery conversions customermatch data-integration dataflow google googleads googleanalytics python

Last synced: 30 Oct 2024

https://github.com/embulk/embulk-output-bigquery

Embulk output plugin to load/insert data into Google BigQuery

bigquery embulk jruby

Last synced: 12 Oct 2024

https://github.com/ScalefreeCOM/datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.

azure-synapse bigquery datavault dbt dbt-packages exasol google-bigquery hubs links pits postgresql redshift satellites scalefree snapshots snowflake sourcemarts stagingarea

Last synced: 02 Aug 2024

https://github.com/scalefreecom/datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.

azure-synapse bigquery datavault dbt dbt-packages exasol google-bigquery hubs links pits postgresql redshift satellites scalefree snapshots snowflake sourcemarts stagingarea

Last synced: 30 Oct 2024

https://github.com/googlecloudplatform/dataproc-templates

Dataproc templates and pipelines for solving simple in-cloud data tasks

apache-spark bigquery gcp google-cloud google-cloud-platform jupyter-notebook pyspark

Last synced: 07 Oct 2024

https://github.com/allegro/bigflow

A Python framework for data processing on GCP.

airflow-dag beam bigquery composer dag dataflow dataproc gcp python python-framework workflows

Last synced: 01 Nov 2024

https://github.com/googlecloudplatform/bigquery-geo-viz

Visualize Google BigQuery geospatial data using Google Maps Platform APIs

bigquery data-visualization examples gis

Last synced: 07 Oct 2024

https://github.com/mikeghen/airflow-tutorial

Use Airflow to move data from multiple MySQL databases to BigQuery

airflow bigquery mysql-database

Last synced: 11 Oct 2024

https://github.com/datainsider-co/rocket-bi

A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica

analytics bigdata bigquery bussiness-intelligence clickhouse dashboard data etl hacktoberfest hacktoberfest2023 ingestion mysql postgresql vertica

Last synced: 13 Oct 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 13 Oct 2024

https://github.com/gabfl/bigquery_fdw

BigQuery Foreign Data Wrapper for PostgreSQL

bigquery fdw postgresql postgresql-extension

Last synced: 13 Oct 2024

https://github.com/servian/bigquery-view-analyzer

A command-line tool for managing permissions and dependencies for BigQuery authorized views

bigquery google-cloud iam python

Last synced: 13 Oct 2024

https://github.com/ExpediaGroup/circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.

big-data bigquery hive hive-metastore hive-table replicate-data replication s3

Last synced: 04 Aug 2024

https://github.com/shinichi-takii/ddlparse

DDL parse and convert to BigQuery JSON schema and DDL statements

bigquery ddl-parse ddl-parser maria mysql oracle postgresql python redshift sql

Last synced: 31 Oct 2024

https://github.com/expediagroup/circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.

big-data bigquery hive hive-metastore hive-table replicate-data replication s3

Last synced: 13 Oct 2024

https://github.com/autotraderuk/dbt-dry-run

Dry run capability for dbt projects using BigQuery

bigquery dbt testing

Last synced: 05 Nov 2024

https://github.com/googlecloudplatform/dlp-dataflow-deidentification

Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP

beam bigquery data dataflow dlp pii tokenization

Last synced: 07 Oct 2024

https://github.com/snowplow/sql-runner

Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake

bigquery postgresql redshift snowflake snowplow sql-runner

Last synced: 09 Nov 2024

https://github.com/google/fhir-py

Python utilities for working with FHIR, including libraries to build simple, flat FHIR views in BigQuery.

bigquery fhir python

Last synced: 30 Oct 2024

https://github.com/rupurt/odbc-scanner-duckdb-extension

A DuckDB extension to read data directly from databases supporting the ODBC interface

analytics bigquery columnar-database cpp data-engineering db2 duckdb mariadb mssql mysql nix odbc olap oracle postgres snowflake vector-engine

Last synced: 12 Oct 2024

https://github.com/samelamin/spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

bigquery data-frame schema spark

Last synced: 12 Oct 2024

https://github.com/nevillelyh/shapeless-datatype

Shapeless utilities for common data types

avro bigquery datastore google-cloud scala shapeless tensorflow

Last synced: 30 Oct 2024

https://github.com/HTTPArchive/bigquery

BigQuery import and processing pipelines

bigquery

Last synced: 02 Nov 2024

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 12 Oct 2024

https://github.com/tiboun/python-bigquery-test-kit

BigQuery test kit is a framework written in Python that lets you be more confident in your SQL and check that it is ready for production. It can also render SQL templates, which helps if you rely on, for instance, Airflow and its macros to orchestrate your jobs.

bigquery bq-test-kit framework integration-testing templates testing testing-tools tests

Last synced: 29 Oct 2024

https://github.com/stanford-esrg/gps

GPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.

bigquery internet-wide-scanning ipv4 network port-scan port-scanner port-scanning scanning security security-scanner security-tools zgrab zmap

Last synced: 04 Aug 2024

https://github.com/starlake-ai/starlake

Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.

bigquery data-engineering data-integration data-pipeline etl hdfs redshift snowflake spark synapse

Last synced: 29 Oct 2024

https://github.com/urish/bigtsquery

Search Engine for TypeScript Code using AST Queries

angular bigquery search tsquery typescript

Last synced: 02 Nov 2024

https://github.com/einride/protobuf-bigquery-go

Seamlessly save and load protocol buffers to and from BigQuery using Go.

bigquery go golang google-cloud protobuf protobufs protocol-buffers

Last synced: 12 Oct 2024

https://github.com/googlecloudplatform/datacatalog-tag-engine

Tag Engine automates the process of creating, updating, deleting, and populating metadata in bulk with Google Cloud's Data Catalog. Tag Engine is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.

bigquery cloud-run cloud-storage data-catalog firestore

Last synced: 07 Oct 2024

https://github.com/mchmarny/github-activity-counter

Cloud Run service for GitHub event Webhook to monitor repo or org activity in real-time in Stackdriver and analyze activity through ad-hoc SQL queries in BigQuery

bigquery cloudrun dataflow github pubsub stackdriver webhook

Last synced: 08 Nov 2024

https://github.com/winwiz1/crisp-bigquery

Full-stack BigQuery starter project. It lets you overcome the customisation restrictions imposed by pre-built dashboards and control data usage. Deploy your own cloud website hydrated by sample BigQuery data in 15 minutes without installing any development software.

bigquery boilerplate containerization docker express fullstack google-bigquery nodejs react typescript

Last synced: 26 Oct 2024

https://github.com/minodisk/bigquery-runner

An extension to query BigQuery directly and view the results in VSCode.

bigquery extension query sql standard-sql vscode vscode-extension

Last synced: 19 Oct 2024

https://github.com/kbhattac/coolretailer

Microservices with Istio, gRPC, Redis, BigQuery, Spring Boot, Spring Cloud and Stackdriver

bigquery google-cloud google-kubernetes-engine grafana grpc istio kiali locust microservices redis spring-boot spring-cloud zipkin

Last synced: 30 Oct 2024

https://github.com/tharwaninitin/etlflow

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, On-Prem Systems and more.

aws bigquery dataproc etl etl-framework etl-pipeline gcp gcs redis s3 scala spark zio

Last synced: 12 Oct 2024

https://github.com/data-integrations/google-cloud

A collection of Google Cloud Platform (GCP) plugins

bigquery cdap cdap-plugin gcs google pubsub

Last synced: 06 Nov 2024

https://github.com/cloudyr/bigQueryR

R Interface with Google BigQuery

api bigquery cloudyr google googleauthr r

Last synced: 13 Aug 2024

https://github.com/snowplow/data-models

⚠️ MAINTENANCE-ONLY MODE: Snowplow maintained SQL data models for working with Snowplow web and mobile behavioral data.

bigquery redshift snowflake snowplow sql

Last synced: 09 Nov 2024

https://github.com/qnighy/bqpb

BigQuery UDF to parse protobuf messages

bigquery javascript protobuf protocol-buffers

Last synced: 22 Oct 2024

https://github.com/googlecloudplatform/spark-on-k8s-gcp-examples

Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub

bigquery cloud-pubsub gcs gcs-connector kubernetes spark

Last synced: 28 Sep 2024

https://github.com/googlecloudplatform/dlp-pdf-redaction

This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud Services like Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.

bigquery cloud cloudfunctions cloudrun cloudstorage cloudworkflows datalossprevention dlp documents gcp mask ocr pdf redaction serverless terraform tesseract workflows

Last synced: 07 Oct 2024

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 12 Oct 2024

https://github.com/alexolivier/flight2bq

RTLSDR ADS-B dump1090 to Google BigQuery

ads-b aircraft bigquery google-bigquery raspberry-pi rtl-sdr rtlsdr

Last synced: 08 Nov 2024

https://github.com/thdk/team-timesheets

Time tracking web app built as a replacement for old school timesheets.

bigquery firebase firebase-firestore mobx mobx-react time-registration time-tracker time-tracking timesheets typescript

Last synced: 07 Nov 2024

https://github.com/z3z1ma/target-bigquery

target-bigquery is a Singer target for BigQuery. It supports storage write, GCS, streaming, and batch load methods. Built with the Meltano SDK.

bigquery data meltano pipelines singer

Last synced: 26 Oct 2024

https://github.com/google/data-quality-monitor

Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.

bigquery cloudstorage data-quality-checks gcp google-cloud-platform python terraform

Last synced: 13 Aug 2024

https://github.com/sungchun12/serverless-data-pipeline-gcp

Schedule a data pipeline in Google Cloud using Cloud Functions, BigQuery, Cloud Storage, Cloud Scheduler, Stackdriver Trace, Cloud Build, and Pub/Sub

bigquery bigquery-schema cicd-promote-to-production cloud-build cloud-functions cloud-scheduler etl-pipeline google-cloud-platform python3 sql stackdriver-trace

Last synced: 28 Oct 2024

https://github.com/johannes-berggren/firestore-to-bigquery-export

NPM package for copying and converting Cloud Firestore data to BigQuery.

bigquery bigquery-export datasets firestore firestore-collections schema

Last synced: 28 Oct 2024

https://github.com/mercari/dataflowtemplates

Convenient Dataflow pipelines for transforming data between cloud data sources

apache-beam bigquery dataflow dataflow-templates spanner

Last synced: 09 Nov 2024

https://github.com/googlecloudplatform/deviceconnect

https://deviceconnect.readthedocs.io/

bigquery cloudrun fitbit gcp

Last synced: 07 Oct 2024

https://github.com/banditml/faucetml

High speed mini-batch data reading & preprocessing from BigQuery.

bigquery feature-engineering features machine-learning ml preprocessing pytorch

Last synced: 01 Nov 2024

https://github.com/mlr-org/mlr3db

Data backends that let mlr3 work transparently with (remote) databases

bigquery data-backend database duckdb machine-learning mariadb mlr3 mysql odbc postgresql r r-package spark sqlite

Last synced: 14 Oct 2024

https://github.com/googlecloudplatform/bq-utilization-alerts

A serverless bot which periodically checks configured BigQuery capacity commitments, reservations and assignments against actual slot consumption of running jobs and reports findings to Slack/Google Chat.

bigquery bot chat-ops cloud-run cloud-scheduler google-chat google-cloud serverless slack slots

Last synced: 07 Oct 2024

https://github.com/miraisolutions/sparkbq

Sparklyr extension package to connect to Google BigQuery

bigquery r spark sparklyr

Last synced: 03 Aug 2024

https://github.com/ronoaldo/aetools

Utilities to build and manage Google App Engine apps

bigquery datastore go

Last synced: 29 Sep 2024

https://github.com/googleclouddataproc/hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

apache bigquery gcp google hadoop hive

Last synced: 05 Nov 2024

https://github.com/googlecloudplatform/bigquery-dlp-remote-function

Use Remote Functions to tokenize data with DLP in BigQuery using SQL

bigquery cloud-run data-loss-prevention dlp google-cloud

Last synced: 07 Oct 2024

https://github.com/nodefluent/bigquery-kafka-connect

Node.js Kafka Connect connector for Google BigQuery

big-data bigquery connect etl google-cloud kafka kafka-connect nodejs

Last synced: 11 Nov 2024

https://github.com/ocadaruma/scalikejdbc-bigquery

ScalikeJDBC extension for Google BigQuery

bigquery scala scalikejdbc

Last synced: 12 Oct 2024

https://github.com/hackersandslackers/bigquery-sqlalchemy-tutorial

ETL script to migrate data from BigQuery to SQL.

bigquery bigquery-sqlalchemy-tutorial databases etl mysql postgres python sql sqlalchemy tutorial

Last synced: 09 Nov 2024

https://github.com/hirosassa/bqvalid

SQL linter tool for BigQuery GoogleSQL (formerly known as StandardSQL).

bigquery google linter sql

Last synced: 02 Nov 2024

https://github.com/digitalghost-dev/stock-data-pipeline

Visualizing S&P 500 data on a webpage with Python.

bigquery google-cloud-platform python

Last synced: 06 Nov 2024

https://github.com/shinichi-takii/vscode-language-sql-bigquery

Syntax highlighting and code snippets for BigQuery SQL in Visual Studio Code

bigquery grammar snippets sql syntax-highlighting vscode vscode-extension

Last synced: 31 Oct 2024

https://github.com/naseemkullah/gcp-accountant

A tool to identify high cost resources in GCP at a granular level

bigquery cost cost-engineering cost-resources gcp gcp-accountant

Last synced: 09 Nov 2024

https://github.com/medjed/embulk-input-bigquery

BigQuery input plugin for Embulk loads records from BigQuery

bigquery embulk

Last synced: 12 Oct 2024

https://github.com/yoheimuta/dbq

CLI tool to easily decorate BigQuery table names

bigquery bq cli golang table-decorator

Last synced: 13 Oct 2024

https://github.com/mesmacosta/bq-fake-pii-table-creator

Library for creating BigQuery tables with fake PII data

bigquery fake-data faker governance-dapps metadata piidata piii

Last synced: 11 Nov 2024

https://github.com/modataconsulting/dbt_ga4_project

This project uses Google Analytics 4 BigQuery Exports as its source data, and offers useful base transformations to provide report-ready dimension & fact models that can be used for reporting purposes, blending with other data, and/or feature engineering for ML models.

bigquery bq data-build-tool dbt ga4 google-analytics-4 sql

Last synced: 12 Oct 2024

https://github.com/googlecloudplatform/datacatalog-tag-history

Historical metadata of your data warehouse is a treasure trove to discover not just insights about changing data patterns, but also quality and user behaviour. This solution creates Data Catalog Tags history in BigQuery since Data Catalog keeps only the latest version of metadata for fast searchability.

analytics bigquery data-catalog data-governance metadata-management

Last synced: 28 Sep 2024

https://github.com/googlecloudplatform/cloud-composer-mssql-dataflow-bigquery

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from a Microsoft SQL Server to BigQuery. The diagrams below demonstrate the workflow pipeline.

airflow bigquery cloud-composer dataflow microsoft-sql-server

Last synced: 07 Oct 2024

https://github.com/livebook-dev/req_bigquery

Conveniences for querying Google BigQuery with Req

bigquery req

Last synced: 11 Nov 2024

https://github.com/sungchun12/schedule-python-script-using-google-cloud

Schedules a Python script to append data into BigQuery using Google Cloud's App Engine with a cron job

appengine-python bigquery chicago-traffic cron google-cloud python-script

Last synced: 28 Oct 2024

https://github.com/oliveroneill/bigqueryswift

BigQuery client for Swift

bigquery google-cloud-platform swift

Last synced: 11 Oct 2024

https://github.com/data-tools/big-data-types

A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the rest of the types and vice versa. For example, a Spark schema can be transformed into a BigQuery table.

apache-spark bigquery bigquery-tables cassandra circe database-types scala schemas spark typeclass typeclass-derivation typesafe

Last synced: 12 Oct 2024
