Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


BigQuery

Google BigQuery lets companies analyze large amounts of data without managing infrastructure. Google’s documentation describes it as a « serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.

📖 A highly rated book on BigQuery is « Google BigQuery: The Definitive Guide », a comprehensive reference. Another worthwhile read is the inside story told by BigQuery's founding product manager in an article marking the product's 10th anniversary.
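As a rough illustration of querying BigQuery with SQL from a client library, here is a minimal Python sketch. The helper names (`build_top_n_query`, `run_query`, `query_public_dataset`) are hypothetical and not part of any library. The sketch assumes the `google-cloud-bigquery` package, whose `Client.query(sql).result()` call yields mapping-like rows. The public table and column used in the last function are also assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def build_top_n_query(table: str, column: str, limit: int = 10) -> str:
    """Build a standard-SQL query counting the most frequent values of a column.

    Hypothetical helper: table and column are interpolated as-is, so only
    pass trusted identifiers.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM `{table}` "
        f"GROUP BY {column} "
        f"ORDER BY n DESC "
        f"LIMIT {limit}"
    )


def run_query(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run sql with a BigQuery-style client and return the rows as dicts.

    `client` only needs a `query(sql)` method returning a job whose
    `result()` is iterable over mapping-like rows, which is the shape
    google.cloud.bigquery.Client provides.
    """
    rows = client.query(sql).result()
    return [dict(row) for row in rows]


def query_public_dataset() -> List[Dict[str, Any]]:
    """Example call against a real project (not executed on import).

    Assumes `pip install google-cloud-bigquery`, application default
    credentials, and that the public table below exists with a `name` column.
    """
    from google.cloud import bigquery  # imported lazily; optional dependency

    sql = build_top_n_query(
        "bigquery-public-data.usa_names.usa_1910_2013", "name", limit=5
    )
    return run_query(bigquery.Client(), sql)
```

Because `run_query` depends only on the `query(...).result()` shape, it can be exercised with a stub client in tests, with no credentials or network access.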

https://github.com/hasura/graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine-grained access control; also trigger webhooks on database events.

access-control api automatic-api bigquery graphql graphql-api graphql-server haskell hasura mongodb postgres rest-api sql-server subgraph supergraph

Last synced: 28 Oct 2024

https://github.com/getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization

Last synced: 28 Oct 2024

https://github.com/cube-js/cube

📊 Cube — The Semantic Layer for Building Data Applications

analytics bigquery cube headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless sql

Last synced: 01 Nov 2024

https://github.com/cube-js/cube.js

📊 Cube — The Semantic Layer for Building Data Applications

analytics bigquery cube headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless sql

Last synced: 05 Aug 2024

https://github.com/beekeeper-studio/beekeeper-studio

Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.

bigquery cassandra cockroachdb database electron firebird linux-app mac-app mariadb mssql mysql postgresql sql sql-server sqlite windows-app

Last synced: 28 Oct 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 28 Oct 2024

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 28 Oct 2024

https://github.com/apache/incubator-doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 04 Aug 2024

https://github.com/HVF/franchise

🍟 a notebook sql client. what you get when you have a lot of sequels.

bigquery database mysql postgresql sql

Last synced: 29 Oct 2024

https://github.com/hvf/franchise

🍟 a notebook sql client. what you get when you have a lot of sequels.

bigquery database mysql postgresql sql

Last synced: 13 Oct 2024

https://github.com/jitsucom/jitsu

Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake

Last synced: 11 Oct 2024

https://github.com/briefercloud/briefer

Dashboards and notebooks in a single place. Create powerful and flexible dashboards using code, or build beautiful Notion-like notebooks and share them with your team.

analytics bi bigquery briefer business-intelligence businessintelligence dashboard data-analysis data-visualization jupyter notebook postgres postgresql reporting visualization

Last synced: 29 Oct 2024

https://github.com/blockchain-etl/ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

aws bigquery blockchain-analytics csv erc20 erc20-tokens erc721 ethereum etl export gcp google-cloud sql transaction

Last synced: 29 Oct 2024

https://github.com/googlecloudplatform/professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.

bigquery examples gke google-cloud-compute google-cloud-dataflow google-cloud-ml google-cloud-platform solutions tools

Last synced: 07 Oct 2024

https://github.com/GoogleCloudPlatform/professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.

bigquery examples gke google-cloud-compute google-cloud-dataflow google-cloud-ml google-cloud-platform solutions tools

Last synced: 25 Oct 2024

https://github.com/spotify/scio

A Scala API for Apache Beam and Google Cloud Dataflow.

batch beam bigquery data dataflow google-cloud ml scala scio streaming

Last synced: 29 Oct 2024

https://github.com/bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake

Last synced: 12 Oct 2024

https://github.com/PeerDB-io/peerdb

A fast, simple, and cost-effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage

bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing

Last synced: 31 Oct 2024

https://github.com/peerdb-io/peerdb

A fast, simple, and cost-effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage

bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing

Last synced: 09 Oct 2024

https://github.com/elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake

Last synced: 15 Oct 2024

https://github.com/EvgSkv/logica

Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.

bigquery datalog language logic-programming logica postgresql presto prolog prolog-implementation sql sqlite trino

Last synced: 27 Oct 2024

https://github.com/evgskv/logica

Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.

bigquery datalog language logic-programming logica postgresql presto prolog prolog-implementation sql sqlite trino

Last synced: 15 Oct 2024

https://github.com/swirlai/swirl-search

SWIRL AI Connect: AI infrastructure software that powers your Search & Retrieval Augmented Generation (RAG) applications. Simplify and enhance your AI pipelines with seamless integration of large language models (LLMs) and data sources.

ai-search bigquery django federated-query federated-search gpt large-language-models metasearch python rag relevancy search search-engine unified-search

Last synced: 13 Oct 2024

https://github.com/Canner/WrenAI

Wren AI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.

agent ai bigquery duckdb fastapi gpt llm nextjs nlp openai postgresql python rag sql text-to-sql typescript

Last synced: 06 Aug 2024

https://github.com/canner/wrenai

Wren AI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.

agent ai bigquery duckdb fastapi gpt llm nextjs nlp openai postgresql python rag sql text-to-sql typescript

Last synced: 11 Oct 2024

https://github.com/GoogleCloudPlatform/bigquery-utils

Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.

bigquery data-warehouse google-cloud-platform sql utilities

Last synced: 13 Nov 2024

https://github.com/googlecloudplatform/bigquery-utils

Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.

bigquery data-warehouse google-cloud-platform sql utilities

Last synced: 07 Oct 2024

https://github.com/goccy/bigquery-emulator

BigQuery emulator server implemented in Go

bigquery emulator gcp go golang google-cloud google-cloud-platform

Last synced: 13 Oct 2024

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 13 Oct 2024

https://github.com/multiwoven/multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.

bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript

Last synced: 12 Oct 2024

https://github.com/httparchive/almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community

bigquery http-archive web-almanac

Last synced: 29 Oct 2024

https://github.com/HTTPArchive/almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community

bigquery http-archive web-almanac

Last synced: 03 Aug 2024

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 12 Oct 2024

https://github.com/dbt-checkpoint/dbt-checkpoint

🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

bigquery business-intelligence dbt pre-commit pre-commit-hook quality-assurance snowflake sql

Last synced: 12 Nov 2024

https://github.com/r-dbi/bigrquery

An interface to Google's BigQuery from R.

bigquery database r

Last synced: 12 Oct 2024

https://github.com/googleapis/nodejs-bigquery

Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.

bigquery database nodejs sql

Last synced: 12 Oct 2024

https://github.com/tylertreat/bigquery-python

Simple Python client for interacting with Google BigQuery.

bigquery google-bigquery python

Last synced: 12 Oct 2024

https://github.com/googleapis/python-bigquery-pandas

Google BigQuery connector for pandas

bigquery data pandas

Last synced: 13 Oct 2024

https://github.com/ofek/pypinfo

Easily view PyPI download statistics via Google's BigQuery.

bigquery pypi python statistics

Last synced: 13 Oct 2024

https://github.com/basedosdados/sdk

⚙️ Data lake maintenance code (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 13 Nov 2024

https://github.com/HariSekhon/SQL-scripts

100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS Athena. DBA, Analytics, DevOps, performance engineering. Google BigQuery ML machine learning classification.

athena aws aws-athena bigquery bigquery-ml dba devops gcp google-bigquery google-cloud-sql google-cloudsql-mysql machine-learning mariadb mysql oracle performance postgres postgresql rds sql

Last synced: 07 Nov 2024

https://github.com/basedosdados/mais

⚙️ Data lake maintenance code (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 13 Oct 2024

https://github.com/harisekhon/sql-scripts

100+ SQL Scripts - PostgreSQL, MySQL, Google BigQuery, MariaDB, AWS Athena. DBA, Analytics, DevOps, performance engineering. Google BigQuery ML machine learning classification.

athena aws aws-athena bigquery bigquery-ml dba devops gcp google-bigquery google-cloud-sql google-cloudsql-mysql hacktoberfest machine-learning mariadb mysql performance postgres postgresql rds sql

Last synced: 11 Oct 2024

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 29 Oct 2024

https://github.com/googleclouddataproc/spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

bigquery bigquery-storage-api google-bigquery google-cloud google-cloud-dataproc spark

Last synced: 11 Oct 2024

https://github.com/GoogleCloudDataproc/spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

bigquery bigquery-storage-api google-bigquery google-cloud google-cloud-dataproc spark

Last synced: 30 Sep 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Oct 2024

https://github.com/spotify/ratatool

A tool for data sampling, data generation, and data diffing

avro bigquery parquet protobuf scala scalacheck

Last synced: 13 Oct 2024

https://github.com/machine-learning-apps/issue-label-bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 29 Sep 2024

https://github.com/machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 25 Oct 2024

https://github.com/mprove-io/mprove

Open Source Self-service Business Intelligence with Version Control 🎉

analytics bigquery business-intelligence clickhouse dashboard data-visualization looker metrics postgresql snowflake

Last synced: 13 Oct 2024

https://github.com/raystack/firehose

Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.

apache-kafka bigquery dataops firehose influxdb kafka postgresql prometheus sink streaming

Last synced: 13 Oct 2024

https://github.com/scale8/scale8-tag-manager-and-analytics

Website analytics, JavaScript error tracking + analytics, tag manager, data ingest endpoint creation (tracking pixels). GDPR + CCPA compliant.

advertising analytics app bigquery charts clickhouse cloud cmp gdpr google-analytics google-tag-manager marketing metrics privacy scale8 statistics tag-manager typescript website

Last synced: 29 Sep 2024

https://github.com/googleclouddataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs

Last synced: 12 Oct 2024

https://github.com/GoogleCloudDataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs

Last synced: 25 Oct 2024

https://wix.github.io/quix

Quix Notebook Manager

athena bigquery notebook-manager presto trino

Last synced: 01 Nov 2024

https://github.com/wix-incubator/quix

Quix Notebook Manager

athena bigquery notebook-manager presto trino

Last synced: 29 Oct 2024

https://github.com/bxparks/bigquery-schema-generator

Generates the BigQuery schema from newline-delimited JSON or CSV data records.

bigquery bigquery-schema google-bigquery python3

Last synced: 29 Oct 2024

https://github.com/cuebook/cueobserve

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

Last synced: 12 Oct 2024

https://github.com/thinkingmachines/geomancer

Automated feature engineering for geospatial data

bigquery feature-engineering geospatial machine-learning openstreetmap

Last synced: 29 Sep 2024

https://github.com/cuebook/CueObserve

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

Last synced: 03 Aug 2024

https://github.com/datacoves/dbt-coves

CLI tool for dbt users to simplify creating staging model files (yml and sql)

analytics bigquery datacoves dbt elt etl jinja python redshift snowflake sql

Last synced: 30 Oct 2024

https://github.com/yoshidan/google-cloud-rust

Google Cloud Client Libraries for Rust.

bigquery gcp gcs google-cloud-platform pubsub rust spanner

Last synced: 12 Oct 2024

https://github.com/googlecloudplatform/data-analytics-golden-demo

An end-to-end demo of Google Cloud's data and analytics stack.

bigdata bigquery composer dataflow dataproc gcp

Last synced: 07 Oct 2024

https://github.com/digitalghost-dev/premier-league

A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.

bigquery cloud-run data-engineer data-pipeline data-visualization docker firestore go google-cloud prefect python streamlit

Last synced: 26 Sep 2024

https://github.com/lots-of-things/gpt2-bert-reddit-bot

a bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models

bert bigquery colab-notebook gpt-2 praw

Last synced: 12 Oct 2024

https://github.com/cartodb/analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql

Last synced: 13 Nov 2024

https://github.com/omnata-labs/dbt-ml-preprocessing

A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.

bigquery dbt redshift scikit-learn snowflake

Last synced: 12 Oct 2024

https://github.com/googlecloudplatform/fraudfinder

Fraudfinder: A comprehensive lab series on how to build a real-time fraud detection system on Google Cloud

bigquery bigquery-ml dataflow google-cloud-platform machine-learning mlops mlpipelines vertex-ai

Last synced: 07 Oct 2024

https://github.com/mara/mara-example-project-2

An example mini data warehouse for python project stats, template for new projects

bigquery data-integration etl pypi sql

Last synced: 12 Oct 2024

https://github.com/xnuinside/simple-ddl-parser

Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.

bigquery columns ddl ddl-parser ddls hive hql mssql mysql oracle-database oracle-db parser postgresql redshift schemas snowflake sql sql-parser tsql types

Last synced: 12 Oct 2024

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 29 Sep 2024

https://github.com/google/vscode-bigquery

A Visual Studio Code plugin for running BigQuery queries.

bigquery extension sql vscode

Last synced: 29 Sep 2024
