Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
BigQuery
Google BigQuery lets companies analyze large amounts of data without having to manage infrastructure. Google’s documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so data can be read directly from external sources.
📖 A highly rated book on the subject is "Google BigQuery: The Definitive Guide", a comprehensive reference. Another worthwhile read is the inside story told by BigQuery’s founding product manager in an article celebrating its 10th anniversary.
- GitHub: https://github.com/topics/bigquery
- Wikipedia: https://en.wikipedia.org/wiki/BigQuery/
- Repo: https://github.com/GoogleCloudPlatform/bigquery-utils/
- Released: May 19, 2010
- Related Topics: cloud-computing
- Aliases: bq
- Last updated: 2024-11-08 00:03:11 UTC
https://github.com/hasura/graphql-engine
Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
access-control api automatic-api bigquery graphql graphql-api graphql-server haskell hasura mongodb postgres rest-api sql-server subgraph supergraph
Last synced: 28 Oct 2024
https://github.com/getredash/redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
analytics athena bi bigquery business-intelligence dashboard databricks hacktoberfest javascript mysql postgresql python redash redshift spark spark-sql visualization
Last synced: 28 Oct 2024
https://github.com/cube-js/cube
📊 Cube — The Semantic Layer for Building Data Applications
analytics bigquery cube headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless sql
Last synced: 01 Nov 2024
https://github.com/cube-js/cube.js
📊 Cube — The Semantic Layer for Building Data Applications
analytics bigquery cube headless-bi hive microservice mysql postgresql presto rust semantic-layer serverless sql
Last synced: 05 Aug 2024
https://github.com/beekeeper-studio/beekeeper-studio
Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.
bigquery cassandra cockroachdb database electron firebird linux-app mac-app mariadb mssql mysql postgresql sql sql-server sqlite windows-app
Last synced: 28 Oct 2024
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 28 Oct 2024
https://github.com/growthbook/growthbook
Open Source Feature Flagging and A/B Testing Platform
ab-testing abtest abtesting analytics bigquery clickhouse continuous-delivery data-analysis data-engineering data-science experimentation feature-flagging feature-flags mixpanel redshift remote-config snowflake split-testing statistics
Last synced: 28 Oct 2024
https://github.com/cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 11 Oct 2024
https://github.com/ibis-project/ibis
the portable Python dataframe library
bigquery clickhouse dask database datafusion duckdb impala mssql mysql pandas polars postgresql pyarrow pyspark python snowflake sql sqlalchemy sqlite trino
Last synced: 28 Oct 2024
https://github.com/rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
bigquery cdp customer-data customer-data-lake customer-data-pipeline customer-data-platform data-engineering data-integration data-pipeline data-synchronization data-warehouse elt etl event-streaming privacy redshift segment-alternative snowflake warehouse-management warehouse-native
Last synced: 29 Oct 2024
https://github.com/HVF/franchise
🍟 a notebook sql client. what you get when you have a lot of sequels.
bigquery database mysql postgresql sql
Last synced: 29 Oct 2024
https://github.com/hvf/franchise
🍟 a notebook sql client. what you get when you have a lot of sequels.
bigquery database mysql postgresql sql
Last synced: 13 Oct 2024
https://github.com/jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake
Last synced: 11 Oct 2024
https://github.com/k1low/tbls
tbls is a CI-friendly tool for documenting a database, written in Go.
bigquery continuous-integration database-document database-schema documentation-tool dynamodb er-diagram excel hacktoberfest mariadb markdown mermaid mysql plantuml postgresql redshift snowflake spanner sqlite sqlserver
Last synced: 11 Oct 2024
https://github.com/briefercloud/briefer
Dashboards and notebooks in a single place. Create powerful and flexible dashboards using code, or build beautiful Notion-like notebooks and share them with your team.
analytics bi bigquery briefer business-intelligence businessintelligence dashboard data-analysis data-visualization jupyter notebook postgres postgresql reporting visualization
Last synced: 29 Oct 2024
https://github.com/k1LoW/tbls
tbls is a CI-friendly tool for documenting a database, written in Go.
bigquery continuous-integration database-document database-schema documentation-tool dynamodb er-diagram excel hacktoberfest mariadb markdown mermaid mysql plantuml postgresql redshift snowflake spanner sqlite sqlserver
Last synced: 29 Oct 2024
https://github.com/blockchain-etl/ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
aws bigquery blockchain-analytics csv erc20 erc20-tokens erc721 ethereum etl export gcp google-cloud sql transaction
Last synced: 29 Oct 2024
https://github.com/GoogleCloudPlatform/professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
bigquery examples gke google-cloud-compute google-cloud-dataflow google-cloud-ml google-cloud-platform solutions tools
Last synced: 25 Oct 2024
https://github.com/googlecloudplatform/professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
bigquery examples gke google-cloud-compute google-cloud-dataflow google-cloud-ml google-cloud-platform solutions tools
Last synced: 07 Oct 2024
https://github.com/bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
bigquery copy-database data-ingestion data-integration data-pipeline duckdb ingestion-pipeline mssql postgresql snowflake
Last synced: 12 Oct 2024
https://github.com/PeerDB-io/peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage
bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing
Last synced: 31 Oct 2024
https://github.com/peerdb-io/peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage
bigquery cdc clickhouse cloud-native distributed-systems etl eventhubs kafka postgres postgresql realtime rust s3 snowflake sql stream-processing
Last synced: 09 Oct 2024
https://github.com/elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake
Last synced: 15 Oct 2024
https://github.com/evgskv/logica
Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.
bigquery datalog language logic-programming logica postgresql presto prolog prolog-implementation sql sqlite trino
Last synced: 15 Oct 2024
https://github.com/EvgSkv/logica
Logica is a logic programming language that compiles to SQL. It runs on Google BigQuery, PostgreSQL and SQLite.
bigquery datalog language logic-programming logica postgresql presto prolog prolog-implementation sql sqlite trino
Last synced: 27 Oct 2024
https://github.com/swirlai/swirl-search
SWIRL AI Connect: AI infrastructure software that powers your Search & Retrieval Augmented Generation (RAG) applications. Simplify and enhance your AI pipelines with seamless integration of large language models (LLMs) and data sources.
ai-search bigquery django federated-query federated-search gpt large-language-models metasearch python rag relevancy search search-engine unified-search
Last synced: 13 Oct 2024
https://github.com/Multiwoven/multiwoven
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript
Last synced: 02 Nov 2024
https://github.com/canner/wrenai
Wren AI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.
agent ai bigquery duckdb fastapi gpt llm nextjs nlp openai postgresql python rag sql text-to-sql typescript
Last synced: 11 Oct 2024
https://github.com/Canner/WrenAI
Wren AI makes your database RAG-ready. Implement Text-to-SQL more accurately and securely.
agent ai bigquery duckdb fastapi gpt llm nextjs nlp openai postgresql python rag sql text-to-sql typescript
Last synced: 06 Aug 2024
https://github.com/GoogleCloudPlatform/DataflowTemplates
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
apache-beam bigquery bigtable dataflow-templates google-cloud-dataflow google-cloud-spanner google-cloud-storage
Last synced: 05 Nov 2024
https://github.com/googlecloudplatform/dataflowtemplates
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
apache-beam bigquery bigtable dataflow-templates google-cloud-dataflow google-cloud-spanner google-cloud-storage
Last synced: 07 Oct 2024
https://github.com/scratchdata/scratchdata
Scratch is a swiss army knife for big data.
bigquery clickhouse data-warehouse duckdb hacktoberfest motherduck olap redshift snowflake
Last synced: 09 Oct 2024
https://github.com/GoogleCloudPlatform/bigquery-utils
Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.
bigquery data-warehouse google-cloud-platform sql utilities
Last synced: 02 Aug 2024
https://github.com/googlecloudplatform/bigquery-utils
Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.
bigquery data-warehouse google-cloud-platform sql utilities
Last synced: 07 Oct 2024
https://github.com/madnight/githut
GitHub Language Statistics
bigquery dataset functional-reactive-programming github-language-statistics github-pages-website jamstack languages programming-languages react react-hooks serverless sql-query statistics
Last synced: 29 Oct 2024
https://github.com/goccy/bigquery-emulator
BigQuery emulator server implemented in Go
bigquery emulator gcp go golang google-cloud google-cloud-platform
Last synced: 13 Oct 2024
https://github.com/raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows
Last synced: 13 Oct 2024
https://github.com/canner/vulcan-sql
Data API Framework for AI Agents and Data Apps
ai ai-agent analytics api-builder bigquery clickhouse data-lake data-warehouse database duckdb ksqldb postgresql reporting restful-api snowflake spreadsheet sql typescript vulcan-sql vulcansql
Last synced: 01 Nov 2024
https://github.com/Canner/vulcan-sql
Data API Framework for AI Agents and Data Apps
ai ai-agent analytics api-builder bigquery clickhouse data-lake data-warehouse database duckdb ksqldb postgresql reporting restful-api snowflake spreadsheet sql typescript vulcan-sql vulcansql
Last synced: 07 Nov 2024
https://github.com/multiwoven/multiwoven
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.
bigquery cdp customer-data-platform data-activation data-engineering data-pipeline data-warehouse databricks dbt etl hacktoberfest open-source postresql react redshift reverse-etl ruby self-hosted snowflake typescript
Last synced: 12 Oct 2024
https://github.com/HTTPArchive/almanac.httparchive.org
HTTP Archive's annual "State of the Web" report made by the web community
bigquery http-archive web-almanac
Last synced: 03 Aug 2024
https://github.com/httparchive/almanac.httparchive.org
HTTP Archive's annual "State of the Web" report made by the web community
bigquery http-archive web-almanac
Last synced: 29 Oct 2024
https://github.com/artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake
Last synced: 12 Oct 2024
https://github.com/ploomber/jupysql
Better SQL in Jupyter. 📊
bigquery clickhouse data-engineering data-science duckdb hive jupyter mysql polars postgres presto python redshift snowflake spark-sql sql sqlite trino tsql
Last synced: 29 Sep 2024
https://github.com/unytics/bigfunctions
Supercharge BigQuery with BigFunctions
bigquery data data-analytics data-engineering data-visualization data-warehouse
Last synced: 12 Oct 2024
https://github.com/dbt-checkpoint/dbt-checkpoint
🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.
bigquery business-intelligence dbt pre-commit pre-commit-hook quality-assurance snowflake sql
Last synced: 30 Oct 2024
https://github.com/r-dbi/bigrquery
An interface to Google's BigQuery from R.
Last synced: 12 Oct 2024
https://github.com/synmetrix/synmetrix
Synmetrix – production-ready open source semantic layer on Cube
big-data bigquery business-intelligence clickhouse cube cubejs data-engineering databricks dremio druid firebolt llm prestodb redshift semantic-layer snowflake vertica
Last synced: 12 Oct 2024
https://github.com/googleapis/nodejs-bigquery
Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.
Last synced: 12 Oct 2024
https://github.com/tylertreat/bigquery-python
Simple Python client for interacting with Google BigQuery.
bigquery google-bigquery python
Last synced: 12 Oct 2024
https://github.com/googleapis/python-bigquery-pandas
Google BigQuery connector for pandas
Last synced: 13 Oct 2024
https://github.com/ofek/pypinfo
Easily view PyPI download statistics via Google's BigQuery.
bigquery pypi python statistics
Last synced: 13 Oct 2024
https://github.com/basedosdados/mais
⚙️ Maintenance code for the data lake (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/mais/
bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia
Last synced: 13 Oct 2024
https://github.com/HariSekhon/SQL-scripts
100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS Athena. DBA, Analytics, DevOps, performance engineering. Google BigQuery ML machine learning classification.
athena aws aws-athena bigquery bigquery-ml dba devops gcp google-bigquery google-cloud-sql google-cloudsql-mysql machine-learning mariadb mysql oracle performance postgres postgresql rds sql
Last synced: 07 Nov 2024
https://github.com/harisekhon/sql-scripts
100+ SQL Scripts - PostgreSQL, MySQL, Google BigQuery, MariaDB, AWS Athena. DBA, Analytics, DevOps, performance engineering. Google BigQuery ML machine learning classification.
athena aws aws-athena bigquery bigquery-ml dba devops gcp google-bigquery google-cloud-sql google-cloudsql-mysql hacktoberfest machine-learning mariadb mysql performance postgres postgresql rds sql
Last synced: 11 Oct 2024
https://github.com/tellery/tellery
Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.
analytics bigquery business-intelligence collaboration dashboard data-analytics data-modeling data-science data-visualization database dbt notebook self-hosted sql
Last synced: 29 Oct 2024
https://github.com/GoogleCloudDataproc/spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
bigquery bigquery-storage-api google-bigquery google-cloud google-cloud-dataproc spark
Last synced: 30 Sep 2024
https://github.com/googleclouddataproc/spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
bigquery bigquery-storage-api google-bigquery google-cloud google-cloud-dataproc spark
Last synced: 11 Oct 2024
https://github.com/astronomer/astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows
Last synced: 13 Oct 2024
https://github.com/spotify/ratatool
A tool for data sampling, data generation, and data diffing
avro bigquery parquet protobuf scala scalacheck
Last synced: 13 Oct 2024
https://github.com/machine-learning-apps/issue-label-bot
Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"
bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow
Last synced: 29 Sep 2024
https://github.com/machine-learning-apps/Issue-Label-Bot
Code For The Issue Label Bot, an App that automatically labels issues using machine learning, available on the GitHub Marketplace. This is also code for the blog article: "How to automate tasks on GitHub with machine learning for fun and profit"
bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow
Last synced: 25 Oct 2024
https://github.com/mprove-io/mprove
Open Source Self-service Business Intelligence with Version Control 🎉
analytics bigquery business-intelligence clickhouse dashboard data-visualization looker metrics postgresql snowflake
Last synced: 13 Oct 2024
https://github.com/raystack/firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
apache-kafka bigquery dataops firehose influxdb kafka postgresql prometheus sink streaming
Last synced: 13 Oct 2024
https://github.com/scale8/scale8-tag-manager-and-analytics
Website analytics, JavaScript error tracking + analytics, tag manager, data ingest endpoint creation (tracking pixels). GDPR + CCPA compliant.
advertising analytics app bigquery charts clickhouse cloud cmp gdpr google-analytics google-tag-manager marketing metrics privacy scale8 statistics tag-manager typescript website
Last synced: 29 Sep 2024
https://github.com/data-drift/data-drift
Metrics Observability & Troubleshooting
analytics bigquery context data-diffing data-governance data-lineage data-monitoring data-observability data-quality data-reliability data-version-control dbt dbt-metrics dbt-packages drill-down metrics reconciliation redshift semantic-layer snowflake
Last synced: 30 Oct 2024
https://github.com/GoogleCloudPlatform/security-analytics
Community Security Analytics provides a set of community-driven audit & threat queries for Google Cloud
audit-logs bigquery chronicle cloud-security-command-center gcp google-cloud log-analytics logging network-analysis network-logs security security-operations threat-detection
Last synced: 02 Nov 2024
https://github.com/googlecloudplatform/security-analytics
Community Security Analytics provides a set of community-driven audit & threat queries for Google Cloud
audit-logs bigquery chronicle cloud-security-command-center gcp google-cloud log-analytics logging network-analysis network-logs security security-operations threat-detection
Last synced: 07 Oct 2024
https://github.com/GoogleCloudDataproc/hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs
Last synced: 25 Oct 2024
https://github.com/googleclouddataproc/hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
bigquery google-cloud-dataproc hadoop hadoop-filesystem hadoop-hcfs
Last synced: 12 Oct 2024
https://wix.github.io/quix
Quix Notebook Manager
athena bigquery notebook-manager presto trino
Last synced: 01 Nov 2024
https://github.com/wix-incubator/quix
Quix Notebook Manager
athena bigquery notebook-manager presto trino
Last synced: 29 Oct 2024
https://github.com/doitintl/bigquery-grafana
Google BigQuery Datasource Plugin for Grafana. (NO LONGER MAINTAINED)
bigquery bigquery-datasource google-bigquery google-cloud-platform grafana grafana-bigquery grafana-bigquery-datasource grafana-datasource metrics monitoring typescript
Last synced: 29 Sep 2024
https://github.com/lynnlangit/gcp-essentials
Sample code and notes for my GCP courses on LinkedIn Learning
bigquery gce gcloud gcp gcs gemini gke google-cloud google-cloud-functions google-cloud-platform google-cloud-run google-cloud-storage tensorflow vertex-ai
Last synced: 30 Oct 2024
https://github.com/bxparks/bigquery-schema-generator
Generates the BigQuery schema from newline-delimited JSON or CSV data records.
bigquery bigquery-schema google-bigquery python3
Last synced: 29 Oct 2024
https://github.com/cuebook/cueobserve
Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases
anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting
Last synced: 12 Oct 2024
https://github.com/thinkingmachines/geomancer
Automated feature engineering for geospatial data
bigquery feature-engineering geospatial machine-learning openstreetmap
Last synced: 29 Sep 2024
https://github.com/cuebook/CueObserve
Timeseries Anomaly detection and Root Cause Analysis on data in SQL data warehouses and databases
anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting
Last synced: 03 Aug 2024
https://github.com/yoshidan/google-cloud-rust
Google Cloud Client Libraries for Rust.
bigquery gcp gcs google-cloud-platform pubsub rust spanner
Last synced: 12 Oct 2024
https://github.com/hongbo-miao/hongbomiao.com
A personal research and development (R&D) lab that facilitates the sharing of knowledge.
aerospace bigquery cloud-native continuous-machine-learning distributed-tracing embedded freertos graphql infrastructure-as-code kubeflow kubernetes matlab mlops national-instruments neural-network pytorch quantum-computing robot-operating-system service-mesh veristand
Last synced: 30 Oct 2024
https://github.com/digitalghost-dev/premier-league
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
bigquery cloud-run data-engineer data-pipeline data-visualization docker firestore go google-cloud prefect python streamlit
Last synced: 26 Sep 2024
https://github.com/lots-of-things/gpt2-bert-reddit-bot
a bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models
bert bigquery colab-notebook gpt-2 praw
Last synced: 12 Oct 2024
https://github.com/cartodb/analytics-toolbox-core
A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
analytics-toolbox bigquery carto databricks geospatial gis postgres redshift snowflake sql
Last synced: 30 Oct 2024
https://github.com/tuva-health/tuva
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
analytics-engineering bigquery data-analytics data-governance data-lineage data-pipelines data-warehouse dbt dbt-packages healthcare healthcare-analysis healthcare-data open-source redshift snowflake sql terminology
Last synced: 30 Oct 2024
https://github.com/omnata-labs/dbt-ml-preprocessing
A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
bigquery dbt redshift scikit-learn snowflake
Last synced: 12 Oct 2024
https://github.com/googlecloudplatform/fraudfinder
Fraudfinder: A comprehensive lab series on how to build a real-time fraud detection system on Google Cloud
bigquery bigquery-ml dataflow google-cloud-platform machine-learning mlops mlpipelines vertex-ai
Last synced: 07 Oct 2024
https://github.com/mara/mara-example-project-2
An example mini data warehouse for python project stats, template for new projects
bigquery data-integration etl pypi sql
Last synced: 12 Oct 2024
https://github.com/xnuinside/simple-ddl-parser
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.
bigquery columns ddl ddl-parser ddls hive hql mssql mysql oracle-database oracle-db parser postgresql redshift schemas snowflake sql sql-parser tsql types
Last synced: 12 Oct 2024
https://github.com/google/starthinker
Reference framework for building data workflows provided by Google. Accelerates authentication, logging, scheduling, and deployment of solutions using GCP. To borrow a tagline: "The framework for professionals with deadlines."
airflow app-engine automation bigquery cloud-functions cm360 colab-notebook data-science django dv360 google-ads google-analytics logger python scheduler ui workflows
Last synced: 29 Sep 2024
https://github.com/spotify/magnolify
A collection of Magnolia add-on modules
avro bigquery bigtable cats datastore guava magnolia neo4j parquet protobuf scala scalacheck tensorflow
Last synced: 10 Oct 2024
https://github.com/googleapis/python-bigquery-dataframes
BigQuery DataFrames
bigquery data-science machine-learning python
Last synced: 12 Oct 2024
https://github.com/google/vscode-bigquery
A Visual Studio Code plugin for running BigQuery queries.
Last synced: 29 Sep 2024
https://github.com/googlecloudplatform/cortex-data-foundation
Data Foundation - Google Cloud Cortex Framework
airflow bigquery cloud google googlecloud salesforce sap
Last synced: 07 Oct 2024