Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
BigQuery
Google BigQuery lets companies analyze large amounts of data without having to manage infrastructure. Google’s documentation describes it as a “serverless architecture [that] lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so data held in external sources can be read directly.
📖 A highly rated, comprehensive reference on it is the book “Google BigQuery: The Definitive Guide”. Another worthwhile read is the inside story written by BigQuery’s founding product manager for its 10th anniversary.
- GitHub: https://github.com/topics/bigquery
- Wikipedia: https://en.wikipedia.org/wiki/BigQuery/
- Repo: https://github.com/GoogleCloudPlatform/bigquery-utils/
- Released: May 19, 2010
- Related Topics: cloud-computing
- Aliases: bq
- Last updated: 2024-11-15 00:03:21 UTC
https://github.com/romange/puma
BigQuery-like engine for processing structured, JSON-like records
Last synced: 13 Oct 2024
https://github.com/pmhalvor/whale-speech
A pipeline to map whale sightings to hydrophone audio
beam bigquery gcs mle model-as-a-service python tensorflow2
Last synced: 21 Oct 2024
https://github.com/anilkhichar/bq-table-copy-automation
Copy a table from one dataset to another in Google BigQuery using a Bash script
automation bash bash-script big-query bigquery bigquery-cp gcp google
Last synced: 07 Nov 2024
https://github.com/chukwuemekaaham/uber-gcp-etl-project
Data Engineering Zoomcamp Final Project
bigquery cloud-storage csv docker-compose gcp jupyter-notebook looker-studio mageai python spark spreadsheets terraform
Last synced: 11 Nov 2024
https://github.com/elithrar/finding-bugs-with-bigquery
A talk on using BigQuery, the GitHub Public Data & some elbow grease to find bugs in OSS projects.
big-data bigquery bugs github golang open-source
Last synced: 13 Oct 2024
https://github.com/chukwuemekaaham/data-engineering-zoomcamp
Datatalks Club Free Data Engineering Zoomcamp Project
bigquery dbt docker-compose duckdb gcp gcp-cloud-storage github-actions jupyter-notebook kafka linux looker-studio mageai pandas postgresql prefect python redpanda risingwave spark terraform
Last synced: 11 Oct 2024
https://github.com/shinichi-takii/atom-language-sql-bigquery
BigQuery SQL language support in Atom
atom atom-package bigquery grammar snippets sql syntax-highlighting
Last synced: 31 Oct 2024
https://github.com/teraearlywine/sample_sql
The following repo contains samples of SQL code that can be referenced by future clients or employers.
Last synced: 12 Oct 2024
https://github.com/morphl-ai/morphl-model-publishers-churning-users-bigquery
BigQuery connector, pre-processor and model for predicting churning users for digital publishers using Google Analytics 360
bigquery google-analytics machine-learning morphl-platform pipeline preprocessor pyspark
Last synced: 12 Nov 2024
https://github.com/fpopic/bigquery-schema-select
(Script) Generates SQL query that selects all fields (recursively for nested fields) from the provided BigQuery schema file.
bigquery bigquery-schema scala sql
Last synced: 12 Oct 2024
https://github.com/dataform-co/bigquery-ml-pipeline
An example of a machine learning pipeline on BigQuery ML using Dataform
bigquery bigquery-ml dataform machine-learning-pip sql
Last synced: 13 Nov 2024
https://github.com/chandanpasunoori/event-sync
Event Sync syncs events from multiple sources to multiple destinations. It is targeted at ad hoc events where the sources support acknowledgement functionality.
bigquery golang-tools google-cloud-platform pubsub
Last synced: 15 Oct 2024
https://github.com/nguyendangxuanlinh/newyorkbike-rental-trip-time-prediction-model-googlebigquery
This ML project uses linear regression to predict the trip time of a bike rental, for a new prediction system in a new mobile application. The ML datasets were collected and stored in a BigQuery public dataset.
bigquery linear-regression machine-learning
Last synced: 12 Oct 2024
https://github.com/stkchan/web-scraping-with-selenium
bigquery pandas python selenium-webdriver webscraping
Last synced: 12 Oct 2024
https://github.com/jaehyeon-kim/dbt-cicd-demo
DBT CI/CD Demo
bigquery cicd dataengineering dbt gcp github-actions
Last synced: 13 Oct 2024
https://github.com/moh-ayman/stripeapi-to-bq---cfunc-etl
Google Cloud Function built to perform an ETL Job to Collect StripeAPI Data and Transform it to be able to Import it to Bigquery.
bigquery dataengineering etl-pipeline gcp gcp-cloud-functions pandas-dataframe python stripe-api
Last synced: 15 Nov 2024
https://github.com/squidmin/java17-spring-gradle-bigquery-reference
Java v17 ⋅ Spring v3 ⋅ Gradle ⋅ BigQuery
bigquery gradle java java-17-gradle java17 java17-spring-boot spring-boot-3
Last synced: 27 Oct 2024
https://github.com/ostrokach/uniparc_xml_parser
UniParc dataset describing ~300 million protein sequences converted into relational tables accessible through Google BigQuery (and as Parquet files).
bigquery bioinformatics csv-files parquet-files protein-domains protein-sequences
Last synced: 12 Oct 2024
https://github.com/essien1990/etl_pipeline_airflow
Creating pipelines using Python 3 and Apache Airflow to load tables into the Google BigQuery data warehouse
airflow airflow-dags airflow-operators bash bigquery bq datawarehouse etl-pipeline python3
Last synced: 12 Oct 2024
https://github.com/icarusso/bigqueryexporter
Export query data from Google BigQuery to a local machine
Last synced: 12 Oct 2024
https://github.com/alterra-greeve/de-capstone
Capstone Project SIB Batch 6 x Alterra Academy - Data Engineer
bigquery cloud-function data-engineering docker googlefirebase looker-studio python
Last synced: 12 Oct 2024
https://github.com/mattwelke/packt-book-bot
Bot that tweets and logs the Packt free eBook of the day in BigQuery daily.
bigquery bot ebooks ibm-cloud-functions java openwhisk
Last synced: 13 Oct 2024
https://github.com/poogles/pytest-bq
pytest fixtures for a local BigQuery instance, suitable for local development.
bigquery bigquery-emulator pytest
Last synced: 12 Oct 2024
https://github.com/tuancamtbtx/gcp-udfs-example
Google BigQuery Javascript UDF Function Examples
bigquery gcp javascript nodejs npm udf
Last synced: 09 Nov 2024
https://github.com/mlabarrere/pygquery
🐷 Multithread your data with Google BigQuery
bigquery dataframe google-bigquery multithreading pandas python
Last synced: 12 Oct 2024
https://github.com/analyticace/data-engineering-projects
Collection of Open Source Data Engineering Projects
aws big-data bigquery data docker engineering etl oracle-database pipeline sql
Last synced: 05 Nov 2024
https://github.com/yu-iskw/terraform-google-copy-bq-datasets
A terraform module to copy BigQuery datasets across regions
bigquery data-engineering google-cloud terraform
Last synced: 27 Oct 2024
https://github.com/miguelapp10/api_simpliroute_urbano
Extract data from the SimpliRoute and Urbano API over a specific date range and process it for analysis and storage in Google BigQuery
api-client bigquery pandas python
Last synced: 12 Oct 2024
https://github.com/misszeferino/sql-projects
bigquery data-analysis mysql queries sql sqlite3
Last synced: 12 Oct 2024
https://github.com/benitomartin/benitomartin
Personal profile 😎
anaconda artificial-intelligence aws bash-script bigquery data-science gcp lambda-functions large-language-models linux machine-learning python pytorch retrieval-augmented-generation sagemaker scikit-learn tensorflow terraform
Last synced: 08 Nov 2024
https://github.com/mchmarny/sbomer
Generates daily SBOM and vulnerability reports for container images and saves resulting files into GCS bucket and data into BigQuery tables.
bigquery gcp gcs grype report sbom syft vex vulnerability
Last synced: 08 Nov 2024
https://github.com/pedrocarmona/big_query_adapter
An ActiveRecord Google BigQuery adapter
activerecord bigquery gem ruby-on-rails
Last synced: 13 Oct 2024
https://github.com/sigpwned/jdbq
JDBI-inspired Database Access Framework for Java + BigQuery
bigquery data-access-framework data-access-layer data-access-library data-lake java persistence persistence-framework persistence-layer
Last synced: 12 Oct 2024
https://github.com/justinjsd/analytics-engineering
📊 A repository focusing on analytics engineering, particularly using dbt on the Northwind Sample dataset
analytics bigquery dbt engineering sql
Last synced: 13 Nov 2024
https://github.com/rsachdeva/illuminatingdeposits-gcp-trigger
Terraform usage in the context of Google Cloud Platform (GCP) based triggering of resources, applied to Cloud Functions. Both resource creation and destruction are done through Terraform.
bigquery bigquery-table cloud-events functions-framework gcp go golang golangci-lint google-cloud google-cloud-function-pubsub-trigger google-cloud-functions google-cloud-pubsub google-cloud-sdk google-cloud-storage google-cloud-terraform sendgrid terraform
Last synced: 12 Oct 2024
https://github.com/gdbecker/dbtlabslearning
Learn the foundational steps of transforming data in dbt Cloud. Start by connecting dbt Cloud to a data warehouse and Git repository, then explore key concepts like modeling, sources, testing, documentation, and deployment. Get hands-on by building a model and running tests in dbt Cloud.
analytics-engineering bigquery dbt dbt-cloud jinja macros models packages sql testing
Last synced: 13 Oct 2024
https://github.com/xlfe/pyjdbq
The easiest way to ship journald logs to Google BigQuery
bigquery journald journald-logs logging security
Last synced: 12 Oct 2024
https://github.com/cch0/data-engineering-zoomcamp-2024-project
2024 project
bigquery cicd cloud-storage-application cloudstorage gcp mage pipelines terraform
Last synced: 01 Nov 2024
https://github.com/esanchezros/bigquery-maven-plugin
Maven plugin for managing BigQuery datasets, tables and views
bigquery java maven maven-plugin
Last synced: 28 Sep 2024
https://github.com/ackeecz/terraform-gcp-dataflow_pubsub_to_bq
Dataflow job that subscribes to a Pub/Sub subscription. It takes each message from the subscription and pushes it into a BigQuery table.
bigquery dataflow pubsub terraform-module
Last synced: 10 Nov 2024
https://github.com/kartikeya443/automated-data-pipeline-gcp
This project showcases the integration of various Google Cloud Platform services to build an efficient and automated data pipeline for sales data.
bigquery cloud data-engineering flask gcp google-cloud-platform looker-studio pipeline python sql
Last synced: 12 Oct 2024
https://github.com/mchmarny/stocker
Using Twitter sentiment and stock market price signal correlation to predict the next day's closing price
bigquery ml prediction regression-models
Last synced: 08 Nov 2024
https://github.com/kevin-rsj/real-estate-investments
Scoring system that ranks French cities for second-home investment according to risk profile (high, moderate, and low). It evaluates key ratios in areas such as demand, availability, infrastructure, demographics, and prices.
bigquery data-analytics looker-studio numpy pandas python sklearn-library sql visualization
Last synced: 29 Oct 2024
https://github.com/yeha98555/google-maps-analysis-pipeline
Taiwan Travel Attractions Analysis Data Pipeline
airflow bigquery cloudfunctions docker gcp gcs googlemaps googlesheets python terraform
Last synced: 29 Sep 2024
https://github.com/night-fury-me/real-time-vehicle-data-processing
A repository containing an implementation of a real-time vehicle data processing pipeline that efficiently manages and analyzes vehicle data through a cohesive system.
bigquery cpp data-engineering data-streaming flink grpc kafka python real-time-data-processing
Last synced: 13 Oct 2024
https://github.com/adadalshabab/data-engineering-gcp-project
An end-to-end modern data engineering project, including deployment of ETL pipeline on Google Cloud Platform, using BigQuery for data analysis and leveraging Looker to generate an insight dashboard.
bigquery data data-science data-visualization databases dataengineering-a engineering etl-pipeline looker-studio powerbi
Last synced: 31 Oct 2024
https://github.com/knands42/data-ingestion
Data Ingestion project to evaluate my Kotlin skill using concurrency
bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow
Last synced: 31 Oct 2024
https://github.com/squidmin/java11-spring-gradle-bigquery-reference
Java v11 ⋅ Spring v2 ⋅ Gradle ⋅ BigQuery
bigquery gradle gradle-java java java-gradle java11 java11-spring-boot spring spring-boot-2 spring-mvc spring-rest
Last synced: 13 Oct 2024
https://github.com/victorcezeh/end-to-end-elt-pipeline
An end-to-end ELT project using the Brazilian E-Commerce dataset from Kaggle. This project demonstrates the use of Python, PostgreSQL, Docker, Docker Compose, Airflow, dbt, and BigQuery to ingest, transform, and analyze data, providing insights into sales, delivery times, and order distributions.
airflow bigquery dbt-core docker docker-compose postgresql python
Last synced: 13 Oct 2024
https://github.com/yu-iskw/bigquery-lineage
Visualize BigQuery data lineage graph
bigquery data-governance data-management visualization
Last synced: 30 Oct 2024
https://github.com/galois1915/google-ml-engineer
This program provides the skills you need to advance your career and provides training to support your preparation for the industry-recognized Google Cloud Professional Machine Learning Engineer certification.
api automl bigquery keras mlops-workflow tensorflow2 vertex-ai
Last synced: 13 Oct 2024
https://github.com/panagiotischaviaropoulos/google-data-analytics-case-study
bigquery data-visualization sql
Last synced: 13 Oct 2024
https://github.com/juldrixx/bigquery-avro-schema-converter
Website to convert a schema from one format to another between BigQuery and Avro
avro avro-schema bigquery bigquery-schema converter schema
Last synced: 13 Oct 2024
https://github.com/mehmoodulhaq570/bigquery_machine_learning_project
This project develops a machine learning model to predict incident groups based on data from the London Fire Brigade service calls. Using Python and the Google Colab environment, the model utilizes a Gradient Boosting Classifier to categorize incidents, improving resource allocation and incident response within the London Fire Brigade.
bigquery bigquery-dataset cloud colabs database database-project google-colab ipnyb jupyter-notebook machine-learning prediction-algorithm prediction-model python
Last synced: 05 Nov 2024
https://github.com/anyesh/gbq-helpers
GBQ related helper functions and snippets.
Last synced: 12 Nov 2024
https://github.com/brpy/nyc-trips
Data engineering | Zoomcamp journey on nyc trip data with gcp stack
Last synced: 05 Nov 2024
https://github.com/themihirmathur/uber-data-analytics
The goal of this project is to perform comprehensive data analytics on Uber trip data using a modern data engineering stack on Google Cloud Platform (GCP).
bigquery data-analysis data-engineering etl-pipeline google-cloud-platform looker python
Last synced: 12 Oct 2024
https://github.com/simhayn/genomics-cannabis-bigquery
BigQuery's Cannabis_Genomics Dataset Exploration using SQL in a Python Environment
big-data bigquery bioinformatics exploratory-data-analysis genomics python sql
Last synced: 13 Oct 2024
https://github.com/alexgenovese/machine-learning-bigquery-gcp
These SQL queries are based on an available e-commerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery.
bigquery google google-cloud-platform purchase sql visitors
Last synced: 07 Nov 2024
https://github.com/scraly/flume-bigquery-sink
An Apache Flume Sink implementation to publish data to Google BigQuery
Last synced: 06 Nov 2024
https://github.com/scraly/bigquery
Google BigQuery AaaS tools, tips and fun
Last synced: 06 Nov 2024
https://github.com/azapeti/bigquery-python-bash-automation
Since you're using the free version, you can only get data from your website through the Google Analytics API for the last 60 days. I would like to demonstrate in this repository how to run BigQuery queries in Python and automate it using bash and crontab for collecting historical data.
analytics automation bash bigquery cronjob crontab ga4 python python3
Last synced: 13 Oct 2024
https://github.com/arhea/go-mock-bigquery
Creates a mock BigQuery client based on the bigquery-emulator for testing in Golang projects.
bigquery golang golang-module google-bigquery google-cloud-platform testcontainers-go testing
Last synced: 12 Oct 2024
https://github.com/andre-gitdev/stocks-functions
This project is for EDA related to stock trading.
alpaca alpaca-trading-api bigquery google-cloud portfolio-optimization robinhood-api robinhood-portfolio stock-analysis stock-data stock-price-prediction stocks-api stocks-trading
Last synced: 13 Oct 2024
https://github.com/karencofre/riesgorelativo-lookerstudio
Data analysis and predictive analysis project in Looker Studio and Google Colab
bigquery data-analysis data-science machine-learning matplotlib python sklearn sql
Last synced: 13 Oct 2024
https://github.com/prashhhant213/strategic-analysis-of-retail-brand-in-south-america-using-sql
Leveraged BigQuery and MySQL to analyze 100K records for a retail brand in South America, covering sales optimization, trend identification, and customer satisfaction, and to provide insights and recommendations for growing its user base and improving its services.
bigquery database mysql-server sql
Last synced: 07 Nov 2024
https://github.com/ivanildobarauna/ivanildobarauna
Special Repository to Make README
ai airflow big-data bigquery data-engineering gcp python
Last synced: 13 Oct 2024
https://github.com/yandex-cloud-examples/yc-bigquery-to-object-storage
Export data from Google BigQuery through Google Storage to Yandex Cloud Object Storage.
bigquery object-storage python3 yandex-cloud yandexcloud
Last synced: 07 Nov 2024
https://github.com/marielachirinosr/nyc-taxi-trip-exploration-2019-2020
Explores passenger behavior & impact of COVID-19 on NYC taxi industry (Q1 2019-2020).
bigquery data data-analysis data-visualization python sql tableau
Last synced: 07 Nov 2024
https://github.com/djdhairya/uber-data-analytics
Mage Vm
aiml api bigdata bigquery deep-learning docker google-maps-api ml python3 sql ssh vmware
Last synced: 10 Nov 2024
https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer
Mackerel plugin to post bigquery's query result
Last synced: 12 Oct 2024
https://github.com/davidkhala/gcp-collections
Notebooks for GCP services
bigquery bq databricks datastore firestore google-cloud-platform
Last synced: 12 Oct 2024
https://github.com/nlgtuankiet/bq-noti
BigQuery notification
bigquery bq notification notifier
Last synced: 12 Oct 2024
https://github.com/edumoraes1/spam_count_sfmc
SQL query counting email sends and spam over the last 365 days
bigquery marketing-cloud salesforce sql
Last synced: 08 Nov 2024
https://github.com/manesioz/airflow-without-code
Dynamically generate DAGs to ingest SQL files into BigQuery with one line of "code"
airflow airflow-plugin bigquery python sql
Last synced: 09 Nov 2024
https://github.com/lambdamusic/dimschema
CLI to retrieve SQL schema information about the Dimensions on Google BigQuery dataset.
bigquery dimensions python scholarly-metadata
Last synced: 13 Nov 2024
https://github.com/greatwoman23/car_insurance_analysis
The Car Insurance Analysis project aims to provide a comprehensive examination of a car insurance portfolio using advanced data analytics tools. The analysis offers valuable insights into policy demographics, claims patterns, and financial metrics, helping stakeholders make informed decisions.
bigquery data data-science dataanalytics insurance-claims looker-studio tableau
Last synced: 12 Oct 2024
https://github.com/sintef/bigquery-postgresql-wire-proxy
A PostgreSQL wire protocol proxy server for BigQuery.
Last synced: 13 Nov 2024
https://github.com/ayresgneto/use-case-gcp-etl
ELT pipeline on GCP. Technologies used: PostgreSQL, GCP Storage, Airflow (local), PySpark (local), BigQuery
airflow big-data bigquery data data-engineering etl gcp pipeline postgresql programming-oriented-object pyspark python spark
Last synced: 12 Oct 2024
https://github.com/davidkhala/dwh-migration-tools
dwh-migration-tools: contribution fork
Last synced: 29 Sep 2024
https://github.com/vedantwalia/google-data-analytics-capstone-case-study
This is a repository of my work on data analysis as a part of the Google Data Analytics Capstone
bigquery data data-viz datavisualization-project divvy-bikes google googledataanalytics sql tableau tableau-public
Last synced: 12 Oct 2024
https://github.com/ngangawairimu/clv-rfm-and-customer-segmentation-analysis
This project performs cohort analysis to estimate Customer Lifetime Value (CLV) by analyzing weekly revenue and user registrations over 12 weeks, forecasting future revenue, and providing actionable insights for marketing and business strategy.
bigquery clv-analysis cohort-analysis customer-segmentation excel rfm-analysis
Last synced: 09 Nov 2024
https://github.com/rubnsbarbosa/nasa-asteroids-extractor
ETL asteroids data extractor using some Google Cloud services
bigquery bucket cloud-storage google-cloud nasa-api-neows
Last synced: 12 Oct 2024
https://github.com/francois-lenne/play-bq-gcp
Data pipeline that retrieves data from the PlayStation API and loads it into BigQuery
bigquery cicd data-engineering google-cloud python
Last synced: 14 Nov 2024
https://github.com/ackeecz/terraform-gcp-cloud-run_pubsub_to_bq
Cloud Run service that subscribes to a given topic and inserts each message into a BigQuery table.
Last synced: 10 Nov 2024
https://github.com/ackeecz/terraform-gcp-cloud-function_pubsub_to_bq
Cloud Function that subscribes to a given topic and inserts each message into a BigQuery table.
bigquery cloud-functions pubsub terraform-module
Last synced: 10 Nov 2024
https://github.com/george-nyamao/gcp_etl_project
An ETL pipeline to move an uploaded flat file from GCS, mask PII, store it in BigQuery, and create a report in Looker.
airflow bigquery cloudcomposer data-fusion gcs-bucket looker python3 wrangler
Last synced: 12 Oct 2024
https://github.com/hrialan/dataform-prune
An open-source tool for automating the cleanup of outdated objects in Dataform configurations, optimizing data workflows with seamless CI/CD integration.
automation bigquery data-analytics dataform
Last synced: 12 Oct 2024
https://github.com/branb97/jobstreet-data-eng-project
Building a data pipeline to deliver job listing data from Jobstreet for analysis.
airflow bigquery data-engineering etl-pipeline google-cloud looker-studio python sql
Last synced: 13 Oct 2024