Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
BigQuery
Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a "serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. BigQuery also supports federated queries, so it can read data from external sources without first loading it into BigQuery storage.
📖 A highly rated canonical book on it is "Google BigQuery: The Definitive Guide", a comprehensive reference. Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article celebrating the product's 10th anniversary.
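To make the federated-query idea concrete, here is a minimal sketch. It builds a query that uses BigQuery's `EXTERNAL_QUERY` table function to push SQL down to an external database such as Cloud SQL. The connection ID and table names are made-up placeholders, and the helper names are hypothetical. The sketch only composes the SQL string and does not run anything.

```python
# Sketch: compose a BigQuery federated query via the EXTERNAL_QUERY table function.
# The connection ID used below is a hypothetical placeholder; real IDs refer to a
# BigQuery connection resource (project.location.connection).


def quote_string_literal(text: str) -> str:
    """Return text as a single-quoted GoogleSQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("'", "\\'")        # then single quotes
        .replace("\n", "\\n")       # raw newlines are not allowed inside '...'
    )
    return f"'{escaped}'"


def federated_query(connection_id: str, external_sql: str) -> str:
    """Build a BigQuery query that runs external_sql on the connected database."""
    return (
        "SELECT * FROM EXTERNAL_QUERY("
        f"{quote_string_literal(connection_id)}, {quote_string_literal(external_sql)})"
    )


if __name__ == "__main__":
    print(federated_query(
        "my-project.us.my-cloudsql-connection",  # hypothetical connection ID
        "SELECT id, name FROM customers WHERE country = 'FR'",
    ))
```

Assuming the `google-cloud-bigquery` package is installed and credentials are configured, the resulting string could be run with `bigquery.Client().query(sql).result()`.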
- GitHub: https://github.com/topics/bigquery
- Wikipedia: https://en.wikipedia.org/wiki/BigQuery/
- Repo: https://github.com/GoogleCloudPlatform/bigquery-utils/
- Released: May 19, 2010
- Related Topics: cloud-computing
- Aliases: bq
- Last updated: 2024-11-12 00:03:12 UTC
https://github.com/googlecloudplatform/cortex-data-foundation
Data Foundation - Google Cloud Cortex Framework
airflow bigquery cloud google googlecloud salesforce sap
Last synced: 07 Oct 2024
https://github.com/unytics/airbyte_serverless
Airbyte made simple (no UI, no database, no cluster)
airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline
Last synced: 12 Nov 2024
https://github.com/cata-network/cata_database
CATA.Search. Blockchain database, cata metadata query
bigquery blockchain database drill
Last synced: 12 Oct 2024
https://github.com/googlecloudplatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
airflow bigquery cloud-composer cloud-native cloud-storage data-architecture data-engineering data-pipelines datasets google-cloud open-data
Last synced: 07 Oct 2024
https://github.com/googlegenomics/gcp-variant-transforms
GCP Variant Transforms
beam bigquery dataflow vcf-files
Last synced: 12 Oct 2024
https://github.com/rounds/go-bqstreamer
Stream data into Google BigQuery concurrently using InsertAll()
Last synced: 06 Aug 2024
https://github.com/kikinteractive/go-bqstreamer
Stream data into Google BigQuery concurrently using InsertAll()
Last synced: 29 Sep 2024
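The two go-bqstreamer entries above describe concurrent streaming through `InsertAll()`. The pattern behind that can be sketched in Python: split rows into batches and send the batches from a pool of workers. `insert_batch` is a hypothetical callable. With the `google-cloud-bigquery` package it could wrap `Client.insert_rows_json(table, rows)`, which returns a list of per-row errors. The default batch size of 500 is an assumption to tune against BigQuery's request limits.

```python
# Sketch: concurrent, batched streaming inserts (not go-bqstreamer's implementation).
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator

Row = dict[str, Any]


def batched(rows: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Group rows into lists of at most `size` rows."""
    batch: list[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def stream_concurrently(
    rows: Iterable[Row],
    insert_batch: Callable[[list[Row]], list[Any]],  # hypothetical insert callable
    batch_size: int = 500,  # assumed default, not a documented requirement
    workers: int = 4,
) -> list[Any]:
    """Send batches in parallel and collect the per-row errors each call returns."""
    errors: list[Any] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_errors in pool.map(insert_batch, batched(rows, batch_size)):
            errors.extend(batch_errors)
    return errors
```

The function returns errors instead of raising them. This keeps partial failures visible in the way `insert_rows_json` reports them, so callers can decide whether to retry individual rows.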
https://github.com/googlecloudplatform/bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
bigdata bigquery data-catalog data-governance data-lineage data-management dataflow zetasql
Last synced: 07 Oct 2024
https://github.com/google/megalista
First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).
audience-targeting audiences bigquery conversions customermatch data-integration dataflow google googleads googleanalytics python
Last synced: 30 Oct 2024
https://github.com/embulk/embulk-output-bigquery
Embulk output plugin to load/insert data into Google BigQuery
Last synced: 12 Oct 2024
https://github.com/ScalefreeCOM/datavault4dbt
Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.
azure-synapse bigquery datavault dbt dbt-packages exasol google-bigquery hubs links pits postgresql redshift satellites scalefree snapshots snowflake sourcemarts stagingarea
Last synced: 02 Aug 2024
https://github.com/googlecloudplatform/dataproc-templates
Dataproc templates and pipelines for solving simple in-cloud data tasks
apache-spark bigquery gcp google-cloud google-cloud-platform jupyter-notebook pyspark
Last synced: 07 Oct 2024
https://github.com/allegro/bigflow
A Python framework for data processing on GCP.
airflow-dag beam bigquery composer dag dataflow dataproc gcp python python-framework workflows
Last synced: 01 Nov 2024
https://github.com/googlecloudplatform/bigquery-geo-viz
Visualize Google BigQuery geospatial data using Google Maps Platform APIs
bigquery data-visualization examples gis
Last synced: 07 Oct 2024
https://github.com/mikeghen/airflow-tutorial
Use Airflow to move data from multiple MySQL databases to BigQuery
airflow bigquery mysql-database
Last synced: 11 Oct 2024
https://github.com/datainsider-co/rocket-bi
A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL and Vertica
analytics bigdata bigquery bussiness-intelligence clickhouse dashboard data etl hacktoberfest hacktoberfest2023 ingestion mysql postgresql vertica
Last synced: 13 Oct 2024
https://github.com/blockchain-etl/polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon
Last synced: 13 Oct 2024
https://github.com/gabfl/bigquery_fdw
BigQuery Foreign Data Wrapper for PostgreSQL
bigquery fdw postgresql postgresql-extension
Last synced: 13 Oct 2024
https://github.com/servian/bigquery-view-analyzer
A command-line tool for managing permissions and dependencies for BigQuery authorized views
bigquery google-cloud iam python
Last synced: 13 Oct 2024
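An authorized view works by listing the view in the access controls of the dataset that holds its source tables. The minimal sketch below performs that step idempotently on a dataset access list. It uses the shape of the BigQuery REST Dataset resource, where an `access` entry can contain a `view` table reference. The project, dataset and view names are placeholders, and the function is hypothetical rather than part of bigquery-view-analyzer.

```python
# Sketch: add an authorized view to a dataset access list without creating duplicates.
from typing import Any

AccessEntry = dict[str, Any]


def authorize_view(
    access: list[AccessEntry], project_id: str, dataset_id: str, view_id: str
) -> list[AccessEntry]:
    """Return a new access list that grants the view access, without duplicates."""
    entry = {"view": {"projectId": project_id, "datasetId": dataset_id, "tableId": view_id}}
    if entry in access:
        return list(access)  # already authorized; leave the list unchanged
    return [*access, entry]  # never mutate the caller's list
```

With the Python client, the same change is usually applied by editing `Dataset.access_entries` and calling `Client.update_dataset(dataset, ["access_entries"])`.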
https://github.com/ExpediaGroup/circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
big-data bigquery hive hive-metastore hive-table replicate-data replication s3
Last synced: 04 Aug 2024
https://github.com/shinichi-takii/ddlparse
DDL parser that converts DDL to BigQuery JSON schemas and DDL statements
bigquery ddl-parse ddl-parser maria mysql oracle postgresql python redshift sql
Last synced: 31 Oct 2024
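To show what "convert DDL to a BigQuery JSON schema" means in practice, here is a deliberately small sketch. It does not use ddlparse's API. It handles only a simple `CREATE TABLE` column list, maps types through an assumed `TYPE_MAP`, and falls back to `STRING` for unknown types. The output uses BigQuery's JSON schema shape: a list of `name`/`type`/`mode` objects.

```python
# Sketch: simple CREATE TABLE -> BigQuery JSON schema. TYPE_MAP is an assumed mapping.
import json
import re

TYPE_MAP = {
    "INT": "INTEGER", "INTEGER": "INTEGER", "BIGINT": "INTEGER", "SMALLINT": "INTEGER",
    "VARCHAR": "STRING", "CHAR": "STRING", "TEXT": "STRING",
    "DECIMAL": "NUMERIC", "NUMERIC": "NUMERIC",
    "FLOAT": "FLOAT", "DOUBLE": "FLOAT", "REAL": "FLOAT",
    "BOOLEAN": "BOOLEAN", "BOOL": "BOOLEAN",
    "DATE": "DATE", "DATETIME": "DATETIME", "TIMESTAMP": "TIMESTAMP",
}
# Table-level clauses that are not column definitions.
CONSTRAINT_KEYWORDS = {"PRIMARY", "FOREIGN", "UNIQUE", "CONSTRAINT", "KEY", "INDEX", "CHECK"}


def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside parentheses, e.g. DECIMAL(10, 2)."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def ddl_to_bq_schema(ddl: str) -> list[dict]:
    """Map each column definition to a BigQuery schema field (name, type, mode)."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]  # the column list
    schema = []
    for column in split_top_level(body):
        tokens = column.split()
        if tokens[0].upper() in CONSTRAINT_KEYWORDS:
            continue  # skip e.g. PRIMARY KEY (id)
        base_type = re.match(r"[A-Za-z]+", tokens[1]).group(0).upper()  # VARCHAR(100) -> VARCHAR
        mode = "REQUIRED" if "NOT NULL" in column.upper() else "NULLABLE"
        schema.append({"name": tokens[0], "type": TYPE_MAP.get(base_type, "STRING"), "mode": mode})
    return schema


if __name__ == "__main__":
    ddl = "CREATE TABLE users (id INT NOT NULL, name VARCHAR(100), price DECIMAL(10, 2))"
    print(json.dumps(ddl_to_bq_schema(ddl), indent=2))
```

A real parser also has to handle quoted identifiers, defaults, comments and dialect-specific types. That is the work a dedicated library such as ddlparse takes on.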
https://github.com/autotraderuk/dbt-dry-run
Dry run capability for dbt projects using BigQuery
Last synced: 05 Nov 2024
https://github.com/googlecloudplatform/datashare-toolkit
DIY commercial datasets on Google Cloud Platform
bigquery fsi gcp gcp-cloud-functions gcp-marketplace-listing gcp-pubsub gcp-storage google-cloud google-cloud-platform google-cloud-pubsub google-cloud-storage google-marketplace marketplace pubsub sharing sharing-data sharing-economy sharing-information sharing-platform
Last synced: 07 Oct 2024
https://github.com/googlecloudplatform/dlp-dataflow-deidentification
Multi-cloud data tokenization solution using Dataflow and Cloud DLP
beam bigquery data dataflow dlp pii tokenization
Last synced: 07 Oct 2024
https://github.com/snowplow/sql-runner
Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
bigquery postgresql redshift snowflake snowplow sql-runner
Last synced: 09 Nov 2024
https://github.com/google/fhir-py
Python utilities for working with FHIR, including libraries to build simple, flat FHIR views in BigQuery.
Last synced: 30 Oct 2024
https://github.com/rupurt/odbc-scanner-duckdb-extension
A DuckDB extension to read data directly from databases supporting the ODBC interface
analytics bigquery columnar-database cpp data-engineering db2 duckdb mariadb mssql mysql nix odbc olap oracle postgres snowflake vector-engine
Last synced: 12 Oct 2024
https://github.com/doitintl/iris3
An upgraded and improved version of the Iris automatic GCP-labeling project
bigquery cloud-cost cloud-storage cloudsql cost-control gce gce-instance gcp gcp-projects gcp-pubsub google-cloud google-cloud-platform google-cloud-pubsub google-cloud-sql google-pubsub labeling organization-administrator pubsub set-labels
Last synced: 12 Nov 2024
https://github.com/samelamin/spark-bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
bigquery data-frame schema spark
Last synced: 12 Oct 2024
https://github.com/nevillelyh/shapeless-datatype
Shapeless utilities for common data types
avro bigquery datastore google-cloud scala shapeless tensorflow
Last synced: 30 Oct 2024
https://github.com/HTTPArchive/bigquery
BigQuery import and processing pipelines
Last synced: 02 Nov 2024
https://github.com/google/grizzly
End-to-end DataOps platform deployed by Terraform.
airflow bigquery cloud-sql cloud-storage composer data-catalog data-lineage data-loss-prevention dataflow dataops dataops-platform gcp git google-cloud google-cloud-platform pubsub spanner terraform
Last synced: 11 Nov 2024
https://github.com/zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx
Last synced: 12 Oct 2024
https://github.com/tiboun/python-bigquery-test-kit
BigQuery test kit is a framework written in Python that makes you more confident in your SQL and lets you check that your queries are ready for production. It also renders SQL templates, which helps if you rely on a tool such as Airflow to orchestrate your jobs and their macros.
bigquery bq-test-kit framework integration-testing templates testing testing-tools tests
Last synced: 29 Oct 2024
https://github.com/stanford-esrg/gps
GPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.
bigquery internet-wide-scanning ipv4 network port-scan port-scanner port-scanning scanning security security-scanner security-tools zgrab zmap
Last synced: 04 Aug 2024
https://github.com/starlake-ai/starlake
Declarative, text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
bigquery data-engineering data-integration data-pipeline etl hdfs redshift snowflake spark synapse
Last synced: 29 Oct 2024
https://github.com/urish/bigtsquery
Search Engine for TypeScript Code using AST Queries
angular bigquery search tsquery typescript
Last synced: 02 Nov 2024
https://github.com/einride/protobuf-bigquery-go
Seamlessly save and load protocol buffers to and from BigQuery using Go.
bigquery go golang google-cloud protobuf protobufs protocol-buffers
Last synced: 12 Oct 2024
https://github.com/googlecloudplatform/datacatalog-tag-engine
Tag Engine automates the process of creating, updating, deleting, and populating metadata in bulk with Google Cloud's Data Catalog. Tag Engine is licensed under the Apache 2 license terms. Please make sure to read, understand and agree to the terms of the LICENSE and CONTRIBUTING files before proceeding.
bigquery cloud-run cloud-storage data-catalog firestore
Last synced: 07 Oct 2024
https://github.com/mchmarny/github-activity-counter
Cloud Run service for a GitHub event webhook that monitors repo or org activity in real time in Stackdriver and lets you analyze that activity through ad-hoc SQL queries in BigQuery
bigquery cloudrun dataflow github pubsub stackdriver webhook
Last synced: 08 Nov 2024
https://github.com/winwiz1/crisp-bigquery
Full-stack BigQuery starter project. It lets you overcome the customisation restrictions imposed by pre-built dashboards and control data usage. Deploy your own cloud website, hydrated with sample BigQuery data, in 15 minutes without installing any development software.
bigquery boilerplate containerization docker express fullstack google-bigquery nodejs react typescript
Last synced: 26 Oct 2024
https://github.com/minodisk/bigquery-runner
An extension to query BigQuery directly and view the results in VSCode.
bigquery extension query sql standard-sql vscode vscode-extension
Last synced: 19 Oct 2024
https://github.com/kbhattac/coolretailer
Microservices with Istio, gRPC, Redis, BigQuery, Spring Boot, Spring Cloud and Stackdriver
bigquery google-cloud google-kubernetes-engine grafana grpc istio kiali locust microservices redis spring-boot spring-cloud zipkin
Last synced: 30 Oct 2024
https://github.com/tharwaninitin/etlflow
EtlFlow is an ecosystem of functional Scala libraries based on ZIO for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems and more.
aws bigquery dataproc etl etl-framework etl-pipeline gcp gcs redis s3 scala spark zio
Last synced: 12 Oct 2024
https://github.com/data-integrations/google-cloud
A collection of Google Cloud Platform (GCP) plugins
bigquery cdap cdap-plugin gcs google pubsub
Last synced: 06 Nov 2024
https://github.com/cloudyr/bigQueryR
R Interface with Google BigQuery
api bigquery cloudyr google googleauthr r
Last synced: 13 Aug 2024
https://github.com/noahgift/managed_ml_systems_and_iot
Managed Machine Learning Systems and Internet of Things Live Lesson
ai automl bigquery cpu deep-learning deeplense edge-computing fpga iot machine-learning managed ml movidius python safari sagemaker tpu tutorial
Last synced: 28 Oct 2024
https://github.com/qnighy/bqpb
BigQuery UDF to parse protobuf messages
bigquery javascript protobuf protocol-buffers
Last synced: 22 Oct 2024
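A protobuf-parsing UDF has to decode the protobuf wire format: varint-encoded keys that carry a field number and a wire type, followed by values. The minimal sketch below decodes that format in Python. It is not bqpb's code. It returns raw bytes for length-delimited fields, because without a schema it cannot tell strings, bytes and nested messages apart.

```python
# Sketch: decode protobuf wire format into {field_number: [values]} without a schema.


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # the low 7 bits carry the payload
        if not byte & 0x80:               # a clear high bit marks the last byte
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict[int, list]:
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # VARINT: int32, int64, bool, enum, ...
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: fixed64, double
            value, pos = buf[pos : pos + 8], pos + 8
        elif wire_type == 2:  # LEN: string, bytes, nested message, packed repeated
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos : pos + length], pos + length
        elif wire_type == 5:  # I32: fixed32, float
            value, pos = buf[pos : pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(field_number, []).append(value)
    return fields
```

For example, the bytes `08 96 01` encode field 1 with the varint value 150.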
https://github.com/googlecloudplatform/spark-on-k8s-gcp-examples
Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
bigquery cloud-pubsub gcs gcs-connector kubernetes spark
Last synced: 28 Sep 2024
https://github.com/evidence-dev/sqltools-bigquery-driver
Query and Explore BigQuery from VSCode
bigquery sql sqltools sqltools-driver vscode-extension
Last synced: 10 Nov 2024
https://github.com/googlecloudplatform/dlp-pdf-redaction
This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud Services like Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.
bigquery cloud cloudfunctions cloudrun cloudstorage cloudworkflows datalossprevention dlp documents gcp mask ocr pdf redaction serverless terraform tesseract workflows
Last synced: 07 Oct 2024
https://github.com/getstrm/pace
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake
Last synced: 12 Oct 2024
https://github.com/googlecloudplatform/market-data-transcoder
ffmpeg for market data
automation avro bigquery binary binaryencoding devops exchanges fix fixprotocol google-cloud-platform itch json marketdata pubsub sbe schema simple-binary-encoding trading transcoding
Last synced: 07 Oct 2024
https://github.com/alexolivier/flight2bq
RTLSDR ADS-B dump1090 to Google BigQuery
ads-b aircraft bigquery google-bigquery raspberry-pi rtl-sdr rtlsdr
Last synced: 08 Nov 2024
https://github.com/rana/gcp.jl
GCP BigQuery APIs in Julia.
api-client bigquery google google-api google-cloud google-cloud-platform julia julia-language julia-package julialang
Last synced: 12 Oct 2024
https://github.com/florianwilhelm/wald-stack-demo
🌳 WALD Stack Demo 🏎️
airbyte bigquery data-analysis dbt lightdash python snowflake snowpark
Last synced: 08 Nov 2024
https://github.com/pacuna/snowplow-pipeline
End-to-end Snowplow Analytics Pipeline for real time events
analytics big-data bigquery docker docker-compose kafka kubernetes production real-time snowplow snowplowanalytics streaming
Last synced: 08 Nov 2024
https://github.com/thdk/team-timesheets
Time tracking web app built as a replacement for old school timesheets.
bigquery firebase firebase-firestore mobx mobx-react time-registration time-tracker time-tracking timesheets typescript
Last synced: 07 Nov 2024
https://github.com/google/data-quality-monitor
Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.
bigquery cloudstorage data-quality-checks gcp google-cloud-platform python terraform
Last synced: 13 Aug 2024
https://github.com/sungchun12/serverless-data-pipeline-gcp
Schedule a data pipeline in Google Cloud using Cloud Functions, BigQuery, Cloud Storage, Cloud Scheduler, Stackdriver Trace, Cloud Build, and Pub/Sub
bigquery bigquery-schema cicd-promote-to-production cloud-build cloud-functions cloud-scheduler etl-pipeline google-cloud-platform python3 sql stackdriver-trace
Last synced: 28 Oct 2024
https://github.com/johannes-berggren/firestore-to-bigquery-export
NPM package for copying and converting Cloud Firestore data to BigQuery.
bigquery bigquery-export datasets firestore firestore-collections schema
Last synced: 28 Oct 2024
https://github.com/mercari/dataflowtemplates
Convenient Dataflow pipelines for transforming data between cloud data sources
apache-beam bigquery dataflow dataflow-templates spanner
Last synced: 09 Nov 2024
https://github.com/hayatoy/dataflow-tutorial
Cloud Dataflow Tutorial for Beginners
bigquery cloud-dataflow google-cloud-platform google-cloud-storage
Last synced: 09 Nov 2024
https://github.com/wittline/pydag
Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
big-data bigquery cloud dag data-engineering data-pipeline dataengineering dataproc dataproc-cluster directed-acyclic-graph google-cloud google-cloud-platform parallel-processing task-scheduler task-scheduling workflow-engine
Last synced: 14 Oct 2024
https://github.com/googlecloudplatform/deviceconnect
https://deviceconnect.readthedocs.io/
Last synced: 07 Oct 2024
https://github.com/banditml/faucetml
High speed mini-batch data reading & preprocessing from BigQuery.
bigquery feature-engineering features machine-learning ml preprocessing pytorch
Last synced: 01 Nov 2024
https://github.com/InosRahul/f1-data-pipeline
F1 Data Pipeline
bigquery data-engineering-pipeline dbt gcs looker prefect python terraform
Last synced: 02 Aug 2024
https://github.com/captaincodeman/datastore-mapper
App Engine Datastore mapper in Go
appengine bigquery cloud-storage datastore datastore-entities datastore-mapper go map-reduce shards
Last synced: 11 Nov 2024
https://github.com/mlr-org/mlr3db
Data backends that let mlr3 work transparently with (remote) databases
bigquery data-backend database duckdb machine-learning mariadb mlr3 mysql odbc postgresql r r-package spark sqlite
Last synced: 14 Oct 2024
https://github.com/googlecloudplatform/bq-utilization-alerts
A serverless bot which periodically checks configured BigQuery capacity commitments, reservations and assignments against actual slot consumption of running jobs and reports findings to Slack/Google Chat.
bigquery bot chat-ops cloud-run cloud-scheduler google-chat google-cloud serverless slack slots
Last synced: 07 Oct 2024
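A check like the one bq-utilization-alerts performs comes down to simple arithmetic. Average slot usage over a window equals total slot-milliseconds divided by the window length in milliseconds. The sketch below compares that average with reserved capacity. The job records and reservation size are hypothetical inputs; per-job slot-milliseconds could come from the `total_slot_ms` column of `INFORMATION_SCHEMA.JOBS`. It also simplifies by assuming every job ran entirely inside the window.

```python
# Sketch: average slots = total slot-ms / window ms, compared with a reservation.
# Assumes each job's slot time falls entirely within the window (a simplification).
from dataclasses import dataclass


@dataclass
class JobUsage:
    job_id: str
    total_slot_ms: int


def average_slots(jobs: list[JobUsage], window_ms: int) -> float:
    """Average number of slots in use across the window."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return sum(job.total_slot_ms for job in jobs) / window_ms


def utilization_report(jobs: list[JobUsage], window_ms: int, reserved_slots: int) -> str:
    """Return a short finding suitable for posting to a chat channel."""
    avg = average_slots(jobs, window_ms)
    ratio = avg / reserved_slots
    status = "OVER" if ratio > 1 else "OK"
    return f"{status}: {avg:.1f} avg slots vs {reserved_slots} reserved ({ratio:.0%})"
```

For example, jobs totalling 3,600,000 slot-ms over a 60-second window average 60 slots, which exceeds a 50-slot reservation.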
https://github.com/miraisolutions/sparkbq
Sparklyr extension package to connect to Google BigQuery
Last synced: 03 Aug 2024
https://github.com/ronoaldo/aetools
Utilities to build and manage Google App Engine apps
Last synced: 29 Sep 2024
https://github.com/googlecloudplatform/bigquery-dlp-remote-function
Use Remote Functions to tokenize data with DLP in BigQuery using SQL
bigquery cloud-run data-loss-prevention dlp google-cloud
Last synced: 07 Oct 2024
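BigQuery remote functions call an HTTP endpoint with a JSON body. The `calls` array holds one argument list per row, and the endpoint must answer with a `replies` array of the same length. The minimal sketch below shows only the handler logic. `tokenize` is a hypothetical HMAC-based stand-in used for illustration; it is NOT Cloud DLP and not this repository's code. Wiring the handler to an HTTP server (for example on Cloud Run) is left to the web framework of your choice.

```python
# Sketch of a remote-function handler body: {"calls": [[arg], ...]} -> {"replies": [...]}.
# The HMAC token is a stand-in for illustration only, not Cloud DLP tokenization.
import hashlib
import hmac
from typing import Any


def tokenize(value: str, key: bytes) -> str:
    """Deterministic pseudonymous token for a value (hypothetical stand-in)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def handle_remote_call(request: dict[str, Any], key: bytes) -> dict[str, Any]:
    """Produce one reply per call, passing SQL NULLs (None) through unchanged."""
    replies = []
    for args in request["calls"]:
        value = args[0]  # this function takes a single argument per row
        replies.append(None if value is None else tokenize(str(value), key))
    return {"replies": replies}
```

Because the token is deterministic, equal inputs map to equal tokens. That keeps joins and GROUP BYs on tokenized columns meaningful.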
https://github.com/nodefluent/bigquery-kafka-connect
Node.js Kafka Connect connector for Google BigQuery
big-data bigquery connect etl google-cloud kafka kafka-connect nodejs
Last synced: 11 Nov 2024
https://github.com/Canner/vulcan-sql-examples
Curated VulcanSQL show cases
analytics api-builder bigquery data data-lake data-warehouse database duckdb examples postgresql reporting restful-api sql vulcan-sql vulcansql
Last synced: 07 Nov 2024
https://github.com/ocadaruma/scalikejdbc-bigquery
ScalikeJDBC extension for Google BigQuery
Last synced: 12 Oct 2024
https://github.com/hackersandslackers/bigquery-sqlalchemy-tutorial
ETL script to migrate data from BigQuery to SQL databases.
bigquery bigquery-sqlalchemy-tutorial databases etl mysql postgres python sql sqlalchemy tutorial
Last synced: 09 Nov 2024
https://github.com/hirosassa/bqvalid
SQL linter tool for BigQuery GoogleSQL (formerly known as StandardSQL).
Last synced: 02 Nov 2024
https://github.com/digitalghost-dev/stock-data-pipeline
Visualizing S&P 500 data on a webpage with Python.
bigquery google-cloud-platform python
Last synced: 06 Nov 2024
https://github.com/shinichi-takii/vscode-language-sql-bigquery
Syntax highlighting and code snippets for BigQuery SQL in Visual Studio Code
bigquery grammar snippets sql syntax-highlighting vscode vscode-extension
Last synced: 31 Oct 2024
https://github.com/naseemkullah/gcp-accountant
A tool to identify high cost resources in GCP at a granular level
bigquery cost cost-engineering cost-resources gcp gcp-accountant
Last synced: 09 Nov 2024
https://github.com/medjed/embulk-input-bigquery
BigQuery input plugin for Embulk that loads records from BigQuery
Last synced: 12 Oct 2024
https://github.com/yoheimuta/dbq
CLI tool to easily decorate BigQuery table names
bigquery bq cli golang table-decorator
Last synced: 13 Oct 2024
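"Decorating" a table name means appending a suffix that selects part of the table. The sketch below builds the two common forms and does not reflect how dbq is implemented. A snapshot decorator `@<milliseconds since the epoch>` is a legacy SQL feature; GoogleSQL uses `FOR SYSTEM_TIME AS OF` instead. A partition decorator `$YYYYMMDD` addresses one day of a daily-partitioned table. The table names are placeholders.

```python
# Sketch: build BigQuery table decorators (illustrative only, not dbq's implementation).
from datetime import date, datetime, timezone


def snapshot_decorator(table: str, at: datetime) -> str:
    """Return table@<milliseconds since the Unix epoch> (legacy SQL snapshot)."""
    if at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return f"{table}@{int(at.timestamp() * 1000)}"


def partition_decorator(table: str, day: date) -> str:
    """Return table$YYYYMMDD for one partition of a daily-partitioned table."""
    return f"{table}${day:%Y%m%d}"


if __name__ == "__main__":
    print(snapshot_decorator("my-project:my_dataset.events", datetime(2024, 1, 1, tzinfo=timezone.utc)))
    print(partition_decorator("my-project:my_dataset.events", date(2024, 1, 1)))
```

Here the snapshot call yields `my-project:my_dataset.events@1704067200000`, because 2024-01-01T00:00:00Z is 1,704,067,200 seconds after the epoch.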
https://github.com/mesmacosta/bq-fake-pii-table-creator
Library for creating BigQuery tables with fake PII data
bigquery fake-data faker governance-dapps metadata piidata piii
Last synced: 11 Nov 2024
https://github.com/modataconsulting/dbt_ga4_project
This project uses Google Analytics 4 BigQuery Exports as its source data, and offers useful base transformations to provide report-ready dimension & fact models that can be used for reporting purposes, blending with other data, and/or feature engineering for ML models.
bigquery bq data-build-tool dbt ga4 google-analytics-4 sql
Last synced: 12 Oct 2024
https://github.com/unytics/catalog_builder
Data Catalogs Made Easy
bigquery data-catalog data-discovery databricks dbt redshift snowflake
Last synced: 07 Nov 2024
https://github.com/googlecloudplatform/datacatalog-tag-history
Historical metadata of your data warehouse is a treasure trove to discover not just insights about changing data patterns, but also quality and user behaviour. This solution creates Data Catalog Tags history in BigQuery since Data Catalog keeps only the latest version of metadata for fast searchability.
analytics bigquery data-catalog data-governance metadata-management
Last synced: 28 Sep 2024
https://github.com/googlecloudplatform/cloud-composer-mssql-dataflow-bigquery
This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from Microsoft SQL Server to BigQuery.
airflow bigquery cloud-composer dataflow microsoft-sql-server
Last synced: 07 Oct 2024
https://github.com/livebook-dev/req_bigquery
Conveniences for querying Google BigQuery with Req
Last synced: 11 Nov 2024
https://github.com/sungchun12/schedule-python-script-using-google-cloud
Schedules a Python script to append data into BigQuery using Google Cloud's App Engine with a cron job
appengine-python bigquery chicago-traffic cron google-cloud python-script
Last synced: 28 Oct 2024
https://github.com/oliveroneill/bigqueryswift
BigQuery client for Swift
bigquery google-cloud-platform swift
Last synced: 11 Oct 2024
https://github.com/data-tools/big-data-types
A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the other types and vice versa. For example, a Spark schema can be transformed into a BigQuery table schema.
apache-spark bigquery bigquery-tables cassandra circe database-types scala schemas spark typeclass typeclass-derivation typesafe
Last synced: 12 Oct 2024
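big-data-types derives schema conversions from Scala product types. A rough Python analogue derives a BigQuery schema from dataclasses, as in the sketch below. The scalar mapping is an assumption, and the functions are hypothetical rather than part of that library. `Optional[X]` becomes `NULLABLE`, `list[X]` becomes `REPEATED`, and nested dataclasses become `RECORD` fields.

```python
# Sketch: derive a BigQuery schema from Python dataclasses (assumed type mapping).
import types
from dataclasses import dataclass, fields, is_dataclass
from datetime import date, datetime
from typing import Any, Optional, Union, get_args, get_origin, get_type_hints

SCALARS = {
    bool: "BOOLEAN", int: "INTEGER", float: "FLOAT", str: "STRING",
    bytes: "BYTES", date: "DATE", datetime: "TIMESTAMP",
}
# Optional[X] is Union[X, None]; Python 3.10+ also spells it X | None (types.UnionType).
UNION_TYPES = (Union, getattr(types, "UnionType", Union))


def to_field(name: str, tp: Any) -> dict[str, Any]:
    mode = "REQUIRED"
    if get_origin(tp) in UNION_TYPES and type(None) in get_args(tp):
        mode = "NULLABLE"
        tp = next(arg for arg in get_args(tp) if arg is not type(None))  # assumes Optional[X]
    if get_origin(tp) is list:
        mode = "REPEATED"  # BigQuery arrays are REPEATED fields
        tp = get_args(tp)[0]
    if is_dataclass(tp):
        return {"name": name, "type": "RECORD", "mode": mode, "fields": schema_for(tp)}
    return {"name": name, "type": SCALARS[tp], "mode": mode}


def schema_for(cls: type) -> list[dict[str, Any]]:
    hints = get_type_hints(cls)  # resolves annotations, including string ones
    return [to_field(f.name, hints[f.name]) for f in fields(cls)]


if __name__ == "__main__":
    @dataclass
    class Event:
        id: int
        tags: list[str]
        seen_at: Optional[datetime]

    print(schema_for(Event))
```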