Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/raystack/firehose

Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.

apache-kafka bigquery dataops firehose influxdb kafka postgresql prometheus sink streaming

Last synced: 01 Jul 2024

https://github.com/scale8/scale8-tag-manager-and-analytics

Website analytics, JavaScript error tracking + analytics, tag manager, data ingest endpoint creation (tracking pixels). GDPR + CCPA compliant.

advertising analytics app bigquery charts clickhouse cloud cmp gdpr google-analytics google-tag-manager marketing metrics privacy scale8 statistics tag-manager typescript website

Last synced: 27 Jun 2024

https://github.com/winwiz1/crisp-bigquery

Starter project with full stack BigQuery. Lets you overcome customisation restrictions imposed by pre-built dashboards and control data usage. Deploy your own cloud website hydrated by sample BigQuery data in 15 min without installing any development software.

bigquery boilerplate containerization docker express fullstack google-bigquery nodejs react typescript

Last synced: 24 Jun 2024

https://github.com/HTTPArchive/almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community

bigquery http-archive web-almanac

Last synced: 24 Jun 2024

https://github.com/GoogleCloudPlatform/bigquery-utils

Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.

bigquery data-warehouse google-cloud-platform sql utilities

Last synced: 22 Jun 2024

https://github.com/r-dbi/bigrquery

An interface to Google's BigQuery from R.

bigquery database r

Last synced: 22 Jun 2024

https://github.com/Canner/WrenAI

WrenAI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.

agent ai bigquery duckdb fastapi llm nextjs nlidb openai postgresql python rag sql text-to-sql typescript

Last synced: 21 Jun 2024

https://github.com/tellery/tellery

Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql

Last synced: 21 Jun 2024

https://github.com/CartoDB/analytics-toolbox-core

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql

Last synced: 21 Jun 2024

https://github.com/goccy/bigquery-emulator

BigQuery emulator server implemented in Go

bigquery emulator gcp go golang google-cloud google-cloud-platform

Last synced: 20 Jun 2024

https://github.com/google/data-quality-monitor

Data Quality Monitor (DQM) - Continuously validate your data with easy, customizable rules.

bigquery cloudstorage data-quality-checks gcp google-cloud-platform python terraform

Last synced: 16 Jun 2024

https://github.com/bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake

Last synced: 15 Jun 2024

https://github.com/mchmarny/github-activity-counter

Cloud Run service for GitHub event Webhook to monitor repo or org activity in real-time in Stackdriver and analyze activity through ad-hoc SQL queries in BigQuery

bigquery cloudrun dataflow github pubsub stackdriver webhook

Last synced: 14 Jun 2024

https://github.com/starlake-ai/jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

bigquery databricks duckdb java query redshift rewrite snowflake transpiler

Last synced: 13 Jun 2024

https://github.com/blockchain-etl/polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

airflow bigquery cryptocurrency data-engineering etl gcp matic-network maticnetwork polygon

Last synced: 11 Jun 2024

https://github.com/HariSekhon/SQL-scripts

100+ SQL Scripts - PostgreSQL, MySQL, Google BigQuery, MariaDB, AWS Athena. DevOps / DBA / Analytics / performance engineering. Google BigQuery ML machine learning classification.

athena aws aws-athena bigquery bigquery-ml dba devops gcp google-bigquery google-cloud-sql google-cloudsql-mysql hacktoberfest machine-learning mariadb mysql performance postgres postgresql rds sql

Last synced: 11 Jun 2024

https://github.com/openbridge/ob_datastash

Stream your CSV files to an HTTP API

aws bigquery csv csv-files logstash parquet redshift

Last synced: 10 Jun 2024

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 07 Jun 2024

https://github.com/PeerDB-io/peerdb

A fast, simple, and cost-effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage

bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing

Last synced: 07 Jun 2024

https://github.com/Multiwoven/multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.

bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript

Last synced: 05 Jun 2024

https://github.com/cloudyr/bigQueryR

R Interface with Google BigQuery

api bigquery cloudyr google googleauthr r

Last synced: 04 Jun 2024

https://github.com/hackersandslackers/bigquery-sqlalchemy-tutorial

📊 ➡️ 💾 ETL script to migrate data from BigQuery to SQL.

bigquery bigquery-sqlalchemy-tutorial databases etl mysql postgres python sql sqlalchemy tutorial

Last synced: 03 Jun 2024

https://github.com/urish/bigtsquery

Search Engine for TypeScript Code using AST Queries

angular bigquery search tsquery typescript

Last synced: 03 Jun 2024

https://github.com/HTTPArchive/bigquery

BigQuery import and processing pipelines

bigquery

Last synced: 03 Jun 2024

https://github.com/HVF/franchise

🍟 a notebook sql client. what you get when you have a lot of sequels.

bigquery database mysql postgresql sql

Last synced: 03 Jun 2024

https://github.com/naseemkullah/gcp-accountant

A tool to identify high cost resources in GCP at a granular level

bigquery cost cost-engineering cost-resources gcp gcp-accountant

Last synced: 02 Jun 2024

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 01 Jun 2024

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 31 May 2024

https://github.com/swirlai/swirl-search

Swirl is open-source software that uses AI to simultaneously search multiple content and data sources, finds the best results using a reader LLM, then prompts Generative AI, enabling you to get answers from your own data.

ai-search bigquery django federated-query federated-search gpt large-language-models metasearch python rag relevancy search search-engine unified-search

Last synced: 31 May 2024

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 27 May 2024

https://github.com/gabfl/bigquery_fdw

BigQuery Foreign Data Wrapper for PostgreSQL

bigquery fdw postgresql postgresql-extension

Last synced: 27 May 2024

https://github.com/ExpediaGroup/circus-train

Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.

big-data bigquery hive hive-metastore hive-table replicate-data replication s3

Last synced: 26 May 2024

https://github.com/cube-js/cube

📊 Cube — The Semantic Layer for Building Data Applications

analytics bigquery cube headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless sql

Last synced: 16 May 2024

https://github.com/getredash/redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization

Last synced: 16 May 2024

https://github.com/omnata-labs/dbt-ml-preprocessing

A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.

bigquery dbt redshift scikit-learn snowflake

Last synced: 13 May 2024

https://github.com/ScalefreeCOM/datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.

azure-synapse bigquery datavault dbt dbt-packages exasol google-bigquery hubs links pits postgresql redshift satellites scalefree snapshots snowflake sourcemarts stagingarea

Last synced: 13 May 2024

https://github.com/dbt-checkpoint/dbt-checkpoint

🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

bigquery business-intelligence dbt pre-commit pre-commit-hook quality-assurance snowflake sql

Last synced: 13 May 2024

https://github.com/elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake

Last synced: 13 May 2024

https://github.com/EvgSkv/logica

Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.

bigquery datalog language logic-programming logica postgresql presto prolog prolog-implementation sql sqlite trino

Last synced: 10 May 2024

https://github.com/hasura/graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.

access-control api automatic-api bigquery graphql graphql-api graphql-server haskell hasura mongodb postgres rest-api sql-server subgraph supergraph

Last synced: 10 May 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 May 2024

https://github.com/cuebook/CueObserve

Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases

anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

Last synced: 09 May 2024

https://github.com/mara/mara-example-project-2

An example mini data warehouse for python project stats, template for new projects

bigquery data-integration etl pypi sql

Last synced: 04 May 2024

https://github.com/spotify/scio

A Scala API for Apache Beam and Google Cloud Dataflow.

batch beam bigquery data dataflow google-cloud ml scala scio streaming

Last synced: 30 Apr 2024

https://github.com/googleapis/python-bigquery-pandas

Google BigQuery connector for pandas

bigquery data pandas

Last synced: 28 Apr 2024

https://github.com/kikinteractive/go-bqstreamer

Stream data into Google BigQuery concurrently using InsertAll()

bigquery go golang

Last synced: 26 Apr 2024

https://github.com/googleapis/nodejs-bigquery

Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.

bigquery database nodejs sql

Last synced: 25 Apr 2024

https://github.com/miraisolutions/sparkbq

Sparklyr extension package to connect to Google BigQuery

bigquery r spark sparklyr

Last synced: 25 Apr 2024

https://github.com/blockchain-etl/ethereum-etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

aws bigquery blockchain-analytics csv erc20 erc20-tokens erc721 ethereum etl export gcp google-cloud sql transaction

Last synced: 25 Apr 2024

https://github.com/beekeeper-studio/beekeeper-studio

Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.

bigquery cassandra cockroachdb database electron firebird linux-app mac-app mariadb mssql mysql postgresql sql sql-server sqlite windows-app

Last synced: 25 Apr 2024

https://github.com/greenpeace/gpes-bigquery-recipes

Google BigQuery recipes to analyse our data.

bigquery database-management sql

Last synced: 25 Apr 2024

https://github.com/greenpeace/gpes-old-en-petitions-api-emulator

Emulates the deprecated EN petition's API. Useful if you have legacy microsites with petitions.

bigquery mysql petitions sqlite3

Last synced: 25 Apr 2024

https://github.com/rupurt/odbc-scanner-duckdb-extension

A DuckDB extension to read data directly from databases supporting the ODBC interface

analytics bigquery columnar-database cpp data-engineering db2 duckdb mariadb mssql mysql nix odbc olap oracle postgres snowflake vector-engine

Last synced: 20 Apr 2024

https://github.com/jitsucom/jitsu

Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days

bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake

Last synced: 18 Apr 2024

https://github.com/GoogleCloudPlatform/security-analytics

Community Security Analytics provides a set of community-driven audit & threat queries for Google Cloud

bigquery chronicle cloud-security-command-center gcp google-cloud security

Last synced: 17 Apr 2024

https://github.com/getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery, with definitions imported from Collibra, Datahub, ODD and the like.

bigquery data-catalog data-contracts data-governance data-processing databricks policy-enforcement snowflake

Last synced: 11 Apr 2024

https://github.com/stanford-esrg/gps

GPS is a scanning platform that learns and predicts the location of IPv4 services across all 65K ports.

bigquery internet-wide-scanning ipv4 network port-scan port-scanner port-scanning scanning security security-scanner security-tools zgrab zmap

Last synced: 07 Apr 2024

https://github.com/google/starthinker

Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."

airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows

Last synced: 03 Apr 2024

https://github.com/ofek/pypinfo

Easily view PyPI download statistics via Google's BigQuery.

bigquery pypi python statistics

Last synced: 03 Apr 2024

https://github.com/spotify/ratatool

A tool for data sampling, data generation, and data diffing

avro bigquery parquet protobuf scala scalacheck

Last synced: 31 Mar 2024

https://github.com/GoogleCloudPlatform/professional-services

Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.

bigquery examples gke google-cloud-compute google-cloud-dataflow google-cloud-ml google-cloud-platform solutions tools

Last synced: 27 Mar 2024

https://github.com/nodefluent/bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

big-data bigquery connect etl google-cloud kafka kafka-connect nodejs

Last synced: 26 Mar 2024

https://github.com/gojekfarm/beast

[Deprecated] Load data from Kafka to any data warehouse. The BQ sink is now supported in Firehose. https://github.com/odpf/firehose

beast bigquery dataops kafka warehouse

Last synced: 26 Mar 2024

https://wix.github.io/quix

Quix Notebook Manager

athena bigquery notebook-manager presto trino

Last synced: 25 Mar 2024

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 24 Mar 2024

https://github.com/evidence-dev/sqltools-bigquery-driver

BigQuery Driver for SQLTools

bigquery sql sqltools sqltools-driver

Last synced: 21 Mar 2024

https://github.com/machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 17 Mar 2024

https://github.com/data-integrations/google-cloud

A collection of Google Cloud Platform (GCP) plugins

bigquery cdap cdap-plugin gcs google pubsub

Last synced: 17 Mar 2024

https://github.com/kbhattac/coolretailer

Microservices with Istio, gRPC, Redis, BigQuery, Spring Boot, Spring Cloud and Stackdriver

bigquery google-cloud google-kubernetes-engine grafana grpc istio kiali locust microservices redis spring-boot spring-cloud zipkin

Last synced: 13 Mar 2024