Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


BigQuery

Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a "serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. BigQuery also supports federated queries, so it can read data from external sources.

📖 A highly rated, comprehensive reference is the book "Google BigQuery: The Definitive Guide". Another worthwhile read is the article by BigQuery's founding product manager, written for its 10th anniversary, which tells the inside story of the product.

https://github.com/googlecloudplatform/cloud-composer-mssql-dataflow-bigquery

This repository contains an example of how to leverage Cloud Composer and Cloud Dataflow to move data from Microsoft SQL Server to BigQuery. The repository includes diagrams of the workflow pipeline.

airflow bigquery cloud-composer dataflow microsoft-sql-server

Last synced: 07 Oct 2024

https://github.com/sungchun12/schedule-python-script-using-google-cloud

Schedules a Python script that appends data to BigQuery, using a cron job on Google Cloud's App Engine

appengine-python bigquery chicago-traffic cron google-cloud python-script

Last synced: 28 Oct 2024

https://github.com/oliveroneill/bigqueryswift

BigQuery client for Swift

bigquery google-cloud-platform swift

Last synced: 11 Oct 2024

https://github.com/data-tools/big-data-types

A library to transform Scala product types and schemas from different systems into other schemas. Any implemented type automatically gets methods to convert it into the other types and vice versa. For example, a Spark schema can be transformed into a BigQuery table.

apache-spark bigquery bigquery-tables cassandra circe database-types scala schemas spark typeclass typeclass-derivation typesafe

Last synced: 12 Oct 2024

https://github.com/openbridge/ob_datastash

Stream your CSV files to an HTTP API

aws bigquery csv csv-files logstash parquet redshift

Last synced: 14 Nov 2024

https://github.com/badal-io/gcp-airflow-foundations

Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery data warehouse

airflow apache-airflow bigquery dags data-engineering data-pipeline etl-pipeline

Last synced: 29 Oct 2024

https://github.com/vickyjkwan/sqlanalyzer

A SQL parser and analyzer for SQL dialects including MySQL, PostgreSQL, BigQuery Standard SQL, Presto SQL and Hive SQL.

athena bigquery hiveql metastore presto sqlparser standardsql

Last synced: 12 Oct 2024

https://github.com/googlecloudplatform/google-cloud-abap

ABAP SDK for Google Cloud and BigQuery Connector for SAP let customers consume Google products and services natively from their SAP landscape.

abap abap-development abapsdk abapsdkforgcp bigquery google-cloud-platform google-generative-ai google-maps-api vertex-ai

Last synced: 07 Oct 2024

https://github.com/orisano/bqspec

SQL testing tool for Google BigQuery.

bigquery cli python test yaml

Last synced: 09 Nov 2024

https://github.com/kesin11/ts-junit2json

Convert JUnit XML format to JSON with TypeScript

bigquery junit-xml

Last synced: 10 Nov 2024

https://github.com/hackersandslackers/bigquery-python-tutorial

Create tables in Google BigQuery, auto-generate their schemas, and retrieve those schemas.

bigquery data-warehouse gcs google-bigquery google-cloud google-cloud-sdk google-cloud-storage python tutorial

Last synced: 09 Nov 2024

https://github.com/pcorbel/metaquery

An API to analyze BigQuery metadata

bigquery golang gorm vue-router vuejs vuetifyjs vuex

Last synced: 10 Nov 2024

https://github.com/jashparekh/bigquery-action

This GitHub Action can be used to deploy table and view schemas to BigQuery.

actions bigquery gbq github-actions google google-bigquery google-cloud-platform hacktoberfest

Last synced: 23 Oct 2024

https://github.com/minodisk/zoq

Convert Zod to BigQuery Schema

bigquery bigquery-schema bigquery-schema-converter zod

Last synced: 19 Oct 2024

https://github.com/urish/nn-function-generator

Experimenting with automatic generation of TS function bodies using ANN models

bigquery tensorflow tsquery typescript

Last synced: 12 Nov 2024

https://github.com/tufin/espresso

A framework for writing testable BigQuery queries

bigquery sql testing

Last synced: 29 Sep 2024

https://github.com/armanbilge/gcp4s

Cross-platform JVM/JS Google Cloud Platform integrations for fs2 and friends

bigquery google-cloud scalajs

Last synced: 12 Oct 2024

https://github.com/edgarrmondragon/meltano-dogfood

Personal dogfood Meltano project

bigquery dbt dogfood elt evidence-dev meltano

Last synced: 15 Oct 2024

https://github.com/mchmarny/pubsub-to-bigquery-pump

Simple utility combining Cloud Run and Stackdriver metrics to drain JSON messages from a Pub/Sub topic into a BigQuery table

bigquery cloudrun events golang metrics pubsub stackdriver

Last synced: 18 Oct 2024

https://github.com/wintermi/imdb-dataform

An example Dataform project to load and transform the publicly available dataset from IMDB.

bigquery dataform google-cloud google-cloud-platform

Last synced: 09 Nov 2024

https://github.com/memsjava/bigquery-helper

A helper package for Google BigQuery operations

bigquery google pandas-dataframe

Last synced: 14 Oct 2024

https://github.com/nodefluent/purpur

Kafka connectors as a service | ETL

bigquery connectors etl gcloud kafka kafka-connect mysql nodejs redis saas

Last synced: 29 Sep 2024

https://github.com/tomayac/http-archive-progressive-web-apps

Different approaches to estimate the number of Progressive Web Apps in the HTTP Archive

bigquery httparchive

Last synced: 16 Oct 2024

https://github.com/sigpwned/litecene

A simple cross-data store full-text search language for Java 8+

bigquery full-text-search java query-language search

Last synced: 14 Oct 2024

https://github.com/doitintl/terraform-bq-scheduled-queries

A demo project that uses Terraform to manage BigQuery scheduled queries, with CI/CD through Cloud Build

bigquery cicd cloudbuild terraform

Last synced: 12 Nov 2024

https://github.com/corneliusweig/krew-index-tracker

Saves download statistics of `krew.dev` plugins to BigQuery

bigquery history krew krew-index statistics

Last synced: 18 Oct 2024

https://github.com/gr8distance/blanton

BigQuery API wrapped by Elixir

bigquery bigquery-schema elixir

Last synced: 29 Oct 2024

https://github.com/kitagry/bqls

WIP: BigQuery language server

bigquery language-server

Last synced: 02 Nov 2024

https://github.com/tobked/fetch-apache-ga-stats

Takes "snapshots" of the GitHub Actions queue for later analysis

bigquery gcp github github-actions

Last synced: 15 Oct 2024

https://github.com/kellyjadams/run-sql-in-python

Scripts to connect Python to BigQuery or a PostgreSQL database.

bigquery postgresql python

Last synced: 13 Oct 2024

https://github.com/k1low/tbls-meta

tbls-meta is an external subcommand of tbls for applying metadata managed by tbls to the datasource.

bigquery data-catalog-management

Last synced: 12 Oct 2024

https://github.com/wayfair-incubator/gbq

Python wrapper for interacting with Google BigQuery.

bigquery gbq google google-bigquery google-cloud-platform hacktoberfest python

Last synced: 12 Oct 2024

https://github.com/wintermi/movielens-dataform

An example Dataform project which will use the publicly available Movielens dataset to demonstrate how to upload your product catalog and user events into either the Google Cloud Retail API or Google Cloud Discovery Engine and train a personalised product recommendation model.

bigquery dataform google-cloud google-cloud-platform vertex-ai

Last synced: 09 Nov 2024

https://github.com/shnewto/bqjson

bqjson - Serialize/Deserialize BigQuery TableResults to/from JSON

bigquery java json maven serde serde-json serialization serializer tableresult testing tests

Last synced: 27 Oct 2024

https://github.com/pierrec1024/airflow-provider-bigquery-reservation

Airflow provider for BigQuery reservation operators.

airflow bigquery reservation

Last synced: 12 Oct 2024

https://github.com/trocco-io/embulk-output-bigquery_java

A faster, Java-based Embulk output plugin for loading/inserting data into Google BigQuery

bigquery embulk etl java

Last synced: 12 Nov 2024

https://github.com/cata-network/cadence-docs

Cadence documentation, Chinese version

bigquery

Last synced: 07 Nov 2024

https://github.com/badal-io/dataflow-timeseries-iot-gas-demo

Dataflow code for integration with GCP Core IoT and FogLamp

bigquery dataflow foglamp

Last synced: 11 Nov 2024

https://github.com/wintermi/bqe-dataform

A Dataform project which aggregates BigQuery system metadata for the purpose of analysing the slot usage and storage within an organization by project.

bigquery dataform google-cloud google-cloud-platform

Last synced: 09 Nov 2024

https://github.com/rittmananalytics/ra_dbt_to_dataform

An open-source tool that partially automates the migration of dbt packages to Dataform

bigquery dataform dbt dbt-core migration-tool

Last synced: 13 Oct 2024

https://github.com/sukanyabag/gcp-ai-notebooks

This repository contains all practice notebooks with which I performed hands-on labs in Google Cloud Training Program's "Cloud ML-AI Track"

bigquery cloudml-samples data-science dataprep tensorflow-tutorials

Last synced: 03 Nov 2024

https://github.com/adam-cowley/neo4j-bigquery

Yo dawg, I heard you like queries so we put some BigQuery in your query so you can query BigQuery from your query

bigquery cypher neo4j neo4j-procedures

Last synced: 30 Oct 2024

https://github.com/lin-jun-xiang/pyga4

📊 Python Google Analytics 4 (GA4) Data Extraction and Analysis Toolkit

bigquery free ga ga4 google-analytics google-analytics-python-api python

Last synced: 27 Oct 2024

https://github.com/bzzt/alchemy_table

Opinionated framework for working with Bigtable and BigQuery

bigquery bigtable database elixir gcp googlecloud googlecloudplatform

Last synced: 19 Oct 2024

https://github.com/takegue/bqmake

BigQuery Powered Data Build Suite.

bigquery sql

Last synced: 12 Oct 2024

https://github.com/ymyzk/prom2bq

Copy data from Prometheus to BigQuery

bigquery go prometheus

Last synced: 27 Oct 2024

https://github.com/wintermi/bqwrite-test

A command line application designed to provide a method to test the BigQuery Streaming API or BigQuery Storage Write API, allowing you to get a view of the potential throughput available via a given host.

bigquery google-cloud google-cloud-platform

Last synced: 09 Nov 2024

https://github.com/wintermi/fashion-dataform

An example Dataform project to load and transform the publicly available dataset from H&M Group into a format which could be imported into Discovery AI for Retail or Vertex AI Search and Conversation, allowing you to train a retail recommendations model.

bigquery dataform google-cloud google-cloud-platform vertex-ai

Last synced: 09 Nov 2024

https://github.com/vigneshss-07/google-cloud-professional-data-engineer-acompleteguide

This repo contains all study, lab and supporting materials for the Udemy course "Google Cloud Professional Data Engineer - A Complete Guide".

big-data bigquery cloud-computing dataengineering elt-pipeline etl-framework gcp-services gcp-storage google-cloud machine-learning

Last synced: 12 Oct 2024

https://github.com/gjbae1212/go-bqworker

go-bqworker is an async worker that bulk inserts and updates data in BigQuery.

async bigquery bigquery-bulk gcp go golang parallel worker

Last synced: 06 Nov 2024

https://github.com/wintermi/bq2csv

A command line application designed to provide a simple method to execute a BigQuery SQL script from "stdin", outputting all results to "stdout" in CSV format. A detailed log is output to the console "stderr" providing you with the available execution statistics.

bigquery google-cloud google-cloud-platform

Last synced: 12 Oct 2024
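The stdin-to-CSV workflow that bq2csv describes can be sketched in Python with the `google-cloud-bigquery` client library. This is a hypothetical illustration, not bq2csv's own code: it assumes the library is installed and Application Default Credentials are configured, and the helper name `write_csv` is made up for the example. Only a one-line summary of the job is written to stderr, so stdout stays clean CSV.

```python
import csv
import sys


def write_csv(field_names, rows, out):
    """Write a header row, then one CSV line per result row; return the row count."""
    writer = csv.writer(out)
    writer.writerow(field_names)
    count = 0
    for row in rows:
        writer.writerow(list(row))
        count += 1
    return count


def main():
    # Imported lazily so write_csv can be reused without the client library.
    from google.cloud import bigquery

    sql = sys.stdin.read()
    client = bigquery.Client()  # picks up Application Default Credentials
    job = client.query(sql)
    result = job.result()  # blocks until the job finishes; returns a RowIterator
    names = [field.name for field in result.schema]
    count = write_csv(names, (row.values() for row in result), sys.stdout)
    # Execution statistics go to stderr, keeping stdout pure CSV.
    print(f"job {job.job_id}: {count} rows, "
          f"{job.total_bytes_processed} bytes processed", file=sys.stderr)


if __name__ == "__main__":
    main()
```

A run might look like `echo "SELECT 1 AS x" | python bq_to_csv.py > out.csv`, where `bq_to_csv.py` is a hypothetical file name for the sketch.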

https://github.com/stkchan/googleanalytics4-publicdataset-ecommerce-dashboard-powerbi

This dashboard uses Power BI Desktop as a visualization tool by extracting data from Google BigQuery.

analytics bigquery dashboard portfolio portfolio-project powerbi sql

Last synced: 13 Oct 2024

https://github.com/42digital/bqtools

Python Tools for BigQuery

bigquery bigquery-schema migrations python

Last synced: 12 Oct 2024

https://github.com/jolares/example-gcp-dataform

Example end-to-end ELT data pipeline using GCP Dataform.

bigquery dataform etl-pipeline

Last synced: 14 Oct 2024

https://github.com/wintermi/bqrunner

A command line application designed to provide a simple method to execute one or more SQL queries against a given dataset in BigQuery. A detailed log is output to the console providing you with the available execution statistics.

bigquery google-cloud google-cloud-platform

Last synced: 24 Oct 2024

https://github.com/greenpeace/gpes-bigquery-recipes

Google BigQuery recipes to analyse our data.

bigquery database-management sql

Last synced: 03 Aug 2024

https://github.com/yashika-malhotra/strategic-analysis-of-retail-brand-in-south-america-using-sql

Uses BigQuery and MySQL to analyze 100K records for a retail brand in South America, covering sales optimization, trend identification, and customer satisfaction, and provides insights and recommendations to grow its user base and improve its services

bigquery data-analysis data-science database database-schema google-bigquery mysql-server sql

Last synced: 14 Nov 2024

https://github.com/dmytrovoytko/data-engineering-amazon-reviews

Data Engineering project for ZoomCamp '24: JSONL -> PostgreSQL/BigQuery + Metabase + Mage.AI

bash-script bigquery codespaces data-analysis data-visualization etl metabase pipeline python-script

Last synced: 14 Nov 2024

https://github.com/deepraj1729/gcp-cloud-billing-api

Cloud Billing - Cost Monitoring and Alerting API for Google Cloud (Billing Exports)

bigquery fastapi gcp python redis

Last synced: 12 Nov 2024

https://github.com/evry-ace/statsbot

Slack Bot to forward message statistics to BigQuery

bigquery slack slack-bot slackbot

Last synced: 12 Oct 2024

https://github.com/brews/bucket2bq

Create an inventory of objects in a GCS bucket, with their metadata, and upload it to BigQuery

bigquery gcp golang google-cloud-storage

Last synced: 20 Oct 2024

https://github.com/olajideolagunju/gcp_mage_data_pipeline

An end-to-end data pipeline solution to process and analyze Maintenance Work Orders using Mage, Google BigQuery, Cloud SQL, and Looker Studio. Features a seamless integration of cloud tools for scalable data storage, transformation, and visualization.

automation bigquery cloud cloud-sql compute-engine data data-engineering database database-schema docker-compose excel gcp mage-ai maintenance orchestration python sql virtual-machine visualization-dashboard work-orders

Last synced: 21 Oct 2024

https://github.com/triglav-dataflow/triglav-agent-bigquery

BigQuery agent for Triglav, data-driven workflow tool

bigquery ruby triglav-agent

Last synced: 12 Oct 2024

https://github.com/airscholar/dbt-bigquery-crash-course

A deep dive into the combination of dbt and BigQuery in modern data engineering.

bigquery data-engineering dbt google-cloud

Last synced: 14 Nov 2024

https://github.com/takegue/bigquery-porter

BigQuery Deployment and Metadata Management tool

bigquery

Last synced: 12 Oct 2024

https://github.com/wintermi/tmdb-dataform

An example Dataform project to load and transform the publicly available dataset from The Movie Database into a format which could be imported into Vertex AI Search for Media, allowing you to build a search engine for movies.

bigquery dataform google-cloud google-cloud-platform

Last synced: 12 Oct 2024

https://github.com/rezuankassim/bqanalytic

Laravel package for using analytics data imported into BigQuery from Firebase Analytics

bigquery firebase-analytics laravel

Last synced: 12 Oct 2024

https://github.com/leereilly/wee-queries

Query sets for Google Cloud Platform's BigQuery :mag:

bigquery

Last synced: 13 Oct 2024

https://github.com/dav009/dbtmock

End-to-end unit tests for dbt (data build tool) pipelines

bigquery data-build-tool dbt mock pipelines test testing unittest unittesting

Last synced: 24 Oct 2024

https://github.com/leandronasx/agro-data

Final project of the SoulCode Academy data analyst and dashboard training program.

bigquery data-analysis gcp looker pandas powerbi python

Last synced: 12 Oct 2024

https://github.com/jehiah/socrata_to_bigquery

A tool to copy public data to BigQuery

bigquery opendata socrata

Last synced: 23 Oct 2024

https://github.com/phstudy/postgresql-zetasketch

ZetaSketch HLL++ functions for PostgreSQL

bigquery hll java postgresql postgresql-extension zetasketch

Last synced: 12 Oct 2024

https://github.com/kellyjadams/bigquery-python-weekly-report

A Python script that automates a weekly report built on BigQuery queries.

bigquery python

Last synced: 13 Oct 2024

https://github.com/fpopic/bigquery-schema-select

Script that generates a SQL query selecting all fields (recursively, for nested fields) from a provided BigQuery schema file.

bigquery bigquery-schema scala sql

Last synced: 12 Oct 2024
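The bigquery-schema-select entry describes generating a query that selects every field of a schema, recursing into nested fields. A minimal Python sketch of that idea follows; it is a hypothetical illustration, not the repository's Scala code. It assumes a JSON schema file (an array of field objects with `name`, `type`, `mode`, and nested `fields` keys), expands non-repeated RECORD fields into dotted paths, and keeps REPEATED records whole because their subfields would need `UNNEST`. Identifiers are not quoted, so field names that collide with reserved words would need backticks.

```python
import json


def leaf_paths(fields, prefix=""):
    """Yield a dotted column path for every leaf field in a BigQuery JSON schema."""
    for field in fields:
        path = prefix + field["name"]
        is_record = field.get("type", "").upper() in ("RECORD", "STRUCT")
        is_repeated = field.get("mode", "NULLABLE").upper() == "REPEATED"
        if is_record and not is_repeated and field.get("fields"):
            # Non-repeated structs can be addressed with dot notation, so recurse.
            yield from leaf_paths(field["fields"], prefix=path + ".")
        else:
            # Scalars, and REPEATED records (which would need UNNEST), stay whole.
            yield path


def build_select(schema, table):
    """Build a SELECT over every leaf path, aliasing nested paths."""
    columns = []
    for path in leaf_paths(schema):
        if "." in path:
            # Alias a.b.c as a_b_c to avoid duplicate output names like two "name"s.
            columns.append(f"{path} AS {path.replace('.', '_')}")
        else:
            columns.append(path)
    return "SELECT\n  " + ",\n  ".join(columns) + f"\nFROM `{table}`"


if __name__ == "__main__":
    # Hypothetical schema, shaped like a BigQuery JSON schema file.
    schema = json.loads("""[
      {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
      {"name": "customer", "type": "RECORD", "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "address", "type": "RECORD", "fields": [
          {"name": "city", "type": "STRING"}]}]},
      {"name": "tags", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "key", "type": "STRING"}]}
    ]""")
    print(build_select(schema, "my-project.my_dataset.my_table"))
```

Keeping REPEATED records as whole columns keeps the output one row per table row; flattening them would require joins against `UNNEST` and change the row count.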

https://github.com/pedrocarmona/big_query_adapter

An ActiveRecord Google BigQuery adapter

activerecord bigquery gem ruby-on-rails

Last synced: 13 Oct 2024

https://github.com/mchmarny/sbomer

Generates daily SBOM and vulnerability reports for container images and saves resulting files into GCS bucket and data into BigQuery tables.

bigquery gcp gcs grype report sbom syft vex vulnerability

Last synced: 08 Nov 2024

https://github.com/mchmarny/xstreams

Stream processing using Cloud PubSub and Dataflow SQL in BigQuery

bigquery dataflow gce gcp golang pubsub stream

Last synced: 08 Nov 2024

https://github.com/mchmarny/automodel

Automatic BigQuery model rebuild based on R² score deviation

bigquery gcp iot ml model

Last synced: 08 Nov 2024

https://github.com/analyticace/data-engineering-projects

Collection of Open Source Data Engineering Projects

aws big-data bigquery data docker engineering etl oracle-database pipeline sql

Last synced: 05 Nov 2024

https://github.com/ostrokach/uniparc_xml_parser

UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).

bigquery bioinformatics csv-files parquet-files protein-domains protein-sequences

Last synced: 12 Oct 2024
