Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
BigQuery
Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a « serverless architecture (that) lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Its client libraries allow the use of widely known languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, making it flexible to read data from external sources.
📖 A highly rated canonical book on it is « Google BigQuery: The Definitive Guide », a comprehensive reference. Another enriching read on the subject is the inside story told in the article by the founding product manager of BigQuery celebrating its 10th anniversary.
- GitHub: https://github.com/topics/bigquery
- Wikipedia: https://en.wikipedia.org/wiki/BigQuery/
- Repo: https://github.com/GoogleCloudPlatform/bigquery-utils/
- Released: May 19, 2010
- Related Topics: cloud-computing,
- Aliases: bq,
- Last updated: 2025-02-04 00:03:36 UTC
- JSON Representation
https://github.com/niteshchawla/nc-sql-business-case
A Leading Retail chain brand and a prominent retailer in the United States. It makes itself a preferred shopping destination by offering outstanding value, inspiration, innovation and an exceptional guest experience that no other retailer can deliver.
bigquery retail sql supermarket
Last synced: 21 Jan 2025
https://github.com/erik-ingwersen-ey/iowa_sales_forecast
Iowa Liquor Sales Forecast Model
arima bigquery bigquery-ml google-cloud sales-forecast
Last synced: 21 Nov 2024
https://github.com/hrialan/dataform-prune
An open-source tool for automating the cleanup of outdated objects in Dataform configurations, optimizing data workflows with seamless CI/CD integration.
automation bigquery data-analytics dataform
Last synced: 21 Nov 2024
https://github.com/sahilmb/employee-churn-da
A data analysis project on employee churn rate using Google Bigquery, Looker, Pycaret and Colab
bigquery looker-studio pycaret
Last synced: 21 Nov 2024
https://github.com/tomgorb/some-data-monitoring
fully functional DAG using Airflow 2 and minikube (locally) to help monitor GCP billing
airflow2 bigquery gcp minikube
Last synced: 21 Jan 2025
https://github.com/stoqey/rasputia
Rasputia Latimore - The Big Data Bitch 💋
Last synced: 19 Jan 2025
https://github.com/ket0825/v1-gcp-preview
Preview 서비스를 위한 GCP 레포 / Manage GCP src for preview services
bigquery cloud-functions cloud-run cloudbuild gcp logging pubsub
Last synced: 21 Nov 2024
https://github.com/ka-zo/booking-data-analysis
Booking data analysis
airline-booking apache-beam bigquery google-cloud looker-studio python3
Last synced: 21 Nov 2024
https://github.com/marielachirinosr/analysis-urgencias-hospital-pitalito
This project involves analyzing emergency room admission data from the E.S.E Hospital Departamental de Pitalito using a star schema model.
bigquery data data-analysis etl-pipeline tableau
Last synced: 21 Nov 2024
https://github.com/themihirmathur/uber-data-analytics
The goal of this project is to perform comprehensive data analytics on Uber trip data using a modern data engineering stack on Google Cloud Platform (GCP).
bigquery data-analysis data-engineering etl-pipeline google-cloud-platform looker python
Last synced: 21 Nov 2024
https://github.com/thanhloc81/sql-project-bicycles-practise
✨ Utilizing SQL to extract data following a simulated task involving the Sales and Product modules
adventureworks bicycle bigquery google-cloud sql
Last synced: 21 Jan 2025
https://github.com/sintef/bigquery-postgresql-wire-proxy
A PostgreSQL wire protocol proxy server for BigQuery.
Last synced: 12 Jan 2025
https://github.com/davidkhala/gcp-collections
Notebooks for GCP services
bigquery bq databricks datastore firestore google-cloud-platform
Last synced: 21 Nov 2024
https://github.com/kartikeya443/automated-data-pipeline-gcp
This project showcases the integration of various Google Cloud Platform services to build an efficient and automated data pipeline for sales data.
bigquery cloud data-engineering flask gcp google-cloud-platform looker-studio pipeline python sql
Last synced: 21 Jan 2025
https://github.com/abdelnaem2002/ecommerce-analysis-dbt
Ecommerce Analysis Using Dbt
bigquery dbt dbt-cloud github looker-studio sql
Last synced: 29 Jan 2025
https://github.com/vidyadnina/other-sql-projects-and-queries
Other SQL projects and queries.
Last synced: 21 Jan 2025
https://github.com/vigneshss-07/mastering-sql-and-bigquery-on-google-cloud-platform
Take your Data Analytics skills to the next level with this comprehensive playlist. Learn SQL from the basics to advanced techniques while mastering BigQuery on Google Cloud.
Last synced: 05 Jan 2025
https://github.com/khanovico/energy-data-analysis
This is the cloud model analyzing real world dataset with BigQuery and other big-data analyzing tools. I implemented docker image for running this app on cross-platform environments.
big-data-processing bigquery docker google-app-engine jupyter-notebook mlflow python scikit-learn seaborn xgboost
Last synced: 10 Oct 2024
https://github.com/vikasgupta1812/google_cloudml_scripts
https://goo.gl/dFjFQf
bigquery google-cloud-ml python-language tensorflow-tutorial
Last synced: 20 Jan 2025
https://github.com/ivanildobarauna/ivanildobarauna
Special Repository to Make README
ai airflow big-data bigquery data-engineering gcp python
Last synced: 22 Jan 2025
https://github.com/simhayn/genomics-cannabis-bigquery
BigQuery's Cannabis_Genomics Dataset Exploration using SQL in a Python Environment
big-data bigquery bioinformatics exploratory-data-analysis genomics python sql
Last synced: 22 Jan 2025
https://github.com/juldrixx/bigquery-avro-schema-converter
Website to convert a schema from one format to another between BigQuery and Avro
avro avro-schema bigquery bigquery-schema converter schema
Last synced: 22 Jan 2025
https://github.com/branb97/jobstreet-data-eng-project
Building a data pipeline to deliver job listing data from Jobstreet for analysis.
airflow bigquery data-engineering etl-pipeline google-cloud looker-studio python sql
Last synced: 22 Jan 2025
https://github.com/karencofre/marketing-segmentacion-en-powerbi
Proyecto prueba de hipótesis en powerbi y python
bigquery google-colab powerbi python sql statsmodels
Last synced: 21 Jan 2025
https://github.com/lisabensoussan/bigdataminig_finalassignment
This repository contains solutions for the final assignment of the Big Data Mining course (52002/52019), focusing on querying large datasets with BigQuery, network analysis with Python, and distributed data processing with Apache Spark.
bigquery community-detection data-cleaning dataframe exploratory-data-analysis pagerank rdd sql text-analysis visualization
Last synced: 21 Jan 2025
https://github.com/edwinrlambert/cyclistic-bike-share-analysis
This repository is part of the Google Data Analytics Capstone Project, focusing on analyzing Cyclistic's bike-sharing data to identify trends and strategies for converting casual riders to annual members. It aims to provide actionable insights for enhancing marketing efforts.
act analyze ask bigquery prepare process share sql
Last synced: 21 Jan 2025
https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance
Project assets for my first exploratory data analysis: Baseball Performance vs. Attendance.
bigquery data-analysis data-cleaning data-visualization excel rstudio sql tableau tidyverse
Last synced: 21 Jan 2025
https://github.com/moeabbas6/dbt_analytics_engine
An end-to-end project using dbt to demonstrate data transformations, testing, and visualization with Google BigQuery, and Looker Studio. It showcases a complete data pipeline from extraction/generation to deployment.
analytics-engineering bigquery data data-pipeline data-transformation data-visualization dbt testing
Last synced: 21 Jan 2025
https://github.com/aazuspan/landsat-bigquery
Summarizing 51 years of Landsat data using Earth Engine and BigQuery
bigquery google-earth-engine landsat
Last synced: 21 Jan 2025
https://github.com/vedantwalia/google-data-analytics-capstone-case-study
This is a repository of my work on data analysis as a part of the Google Data Analytics Capstone
bigquery data data-viz datavisualization-project divvy-bikes google googledataanalytics sql tableau tableau-public
Last synced: 21 Jan 2025
https://github.com/aisurjyasamantaray/-optimizing-target-s-brazilian-operations-insights-from-order-processing-pricing-and-payment-trends-
This project offers an in-depth analysis of consumer behavior, logistical performance, and payment preferences within the e-commerce sector. By examining order costs, delivery times, and payment methods, businesses can uncover valuable insights into operational efficiency and customer preferences.
bigquery consumer-insights data-analysis database sql target
Last synced: 21 Jan 2025
https://github.com/moeabbas6/bq_data_loader
A Python script for executing and logging batch SQL commands in Google BigQuery. Includes tracking of execution times, unique job and statement IDs, and automated logging to a specified BigQuery table.
Last synced: 29 Jan 2025
https://github.com/phukon/package-insights
PyPI package reports and insights. The data was ingested from publicly available source using BigQuery and then transformed.
Last synced: 27 Jan 2025
https://github.com/flowerinthenight/bqstream
A simple library to help facilitate streaming to BigQuery.
Last synced: 08 Jan 2025
https://github.com/syou6162/mackerel-plugin-bigquery-query-result-importer
Mackerel plugin to post bigquery's query result
Last synced: 12 Oct 2024
https://github.com/kellyjadams/ap-exam-scores
Analyzing AP exam scores for a school.
Last synced: 08 Jan 2025
https://github.com/victorelexpe/bq-schema-sync
bigquery gcp google-cloud python schema sync
Last synced: 12 Oct 2024
https://github.com/hckhanh/pg2bigquery
A CLI tool to convert query from PostgreSQL to BigQuery
big bigquery google pg pgsql postgres postgres-tool postgresql postgresql-database postgressql query query-parser querybuilder sql sql-toolkit sql-tools tool toolbox toolkit utility
Last synced: 29 Jan 2025
https://github.com/marceloneppel/gcs-to-bigquery
WIP: Moving data from GCS to BigQuery.
Last synced: 30 Jan 2025
https://github.com/smohanta23/uber_data-engineering_etl-project
This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.
big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api
Last synced: 21 Jan 2025
https://github.com/knands42/data-ingestion
Data Ingestion project to evaluate my Kotlin skill using concurrency
bigquery golang google-cloud-platform google-storage gradle-kotlin-dsl kotlin kotlin-flow
Last synced: 25 Jan 2025
https://github.com/rafal-kowalski-dev/selling-cars-analize
Hobby project for learning PySpark, AirFlow and BigQuery
airflow bigquery gcp pyspark python sqlalchemy
Last synced: 30 Jan 2025
https://github.com/seahrh/nyc-taxi-trips
REST API for the New York City Taxi Trips public dataset, implemented in Scala and Play Framework 2.7
bigquery nyc-taxi-dataset play-framework rest-api scala
Last synced: 03 Feb 2025
https://github.com/quipper/send-ci-result-to-bigquery-action
Send test results to BigQuery in GitHub Actions
bigquery github-actions google-bigquery junit-xml
Last synced: 09 Jan 2025
https://github.com/yeha98555/google-maps-analysis-pipeline
Taiwan Travel Attractions Analysis Data Pipeline
airflow bigquery cloudfunctions docker gcp gcs googlemaps googlesheets python terraform
Last synced: 23 Jan 2025
https://github.com/sergeimakarovv/wine-recommendation-analytics
Wine recommendation system
airflow bigquery pandas postgresql tableau
Last synced: 08 Jan 2025
https://github.com/spacepatcher/google-workspace-gmail-collector
👁 App for collecting Gmail logs from your Google Workspace account and sending them to Kafka
bigquery gmail google-workspace security soc
Last synced: 23 Oct 2024
https://github.com/goatcheesesaladwithpeanutoildressing/scio-demo
Playing w/ Scio
Last synced: 08 Jan 2025
https://github.com/humairarizwan/uber-ride-dataengineering-analysis
This project creates a pipeline to process data and performs data analytics on Uber data.
bigquery dataanalysis dataengineering gcp-project googlestorage looker-studio
Last synced: 21 Jan 2025
https://github.com/oguzgn/data-science-for-business-imp
a case study for business improvment
ab-testing bigquery data-science data-visualization debugging looker marketing-analytics sheets
Last synced: 21 Jan 2025
https://github.com/eddieatgoogle/sql-based-genai-data-pipeline
GenAI data pipeline that performs data preparation, management and performance evaluation tasks for RAG systems using SQL as the primary development language. Please feel free to use this as a starting point for your own projects.
bigquery bqml dataform embeddings gemini google-cloud-platform sql vector-search vertex-ai
Last synced: 08 Jan 2025
https://github.com/riju18/airflow-data-engineering-with-bigquery-and-dbt
Fetch Data from a simple csv file, send the data in GCP BigQuery table, run dbt to automate the DWH and run SODA to check Data Quality.
apache-airflow bigquery csv dbt python3 soda
Last synced: 28 Jan 2025
https://github.com/sayed-ashfaq/target-sql
In this project, I analyzed Target company's data using SQL in BigQuery, focusing on data extraction, manipulation, and performing various analytical queries to derive insights.
aggregation bigquery cte joins sql
Last synced: 23 Dec 2024
https://github.com/ackeecz/terraform-gcp-cloud-function_pubsub_to_bq
Cloud function subscribes itself to given topic and inserts each message to BigQuery table.
bigquery cloud-functions pubsub terraform-module
Last synced: 07 Jan 2025
https://github.com/ackeecz/terraform-gcp-cloud-run_pubsub_to_bq
Cloud Run subscribes itself to given topic and inserts each message to BigQuery table.
Last synced: 07 Jan 2025
https://github.com/allanreda/share-of-search-retrieval-and-visualization
Share of search analysis including data retrieval from Google Ads API, storing data in BigQuery and visualizing it in Looker Studio
bigquery google-ads-api looker-studio python share-of-search
Last synced: 28 Dec 2024
https://github.com/patriciavalentine/loan-data-queries
In this project, I analyzed a vehicle loan dataset using BigQuery to identify demographic, financial, and loan patterns. Through SQL queries, I extracted insights such as the credit scores, and loan distribution by region, and explored high-risk profiles. The findings are visualized in Looker Studio, thus helping to inform strategic decisions.
asset-finance bigquery loan-data looker-studio
Last synced: 09 Dec 2024
https://github.com/janmin123/cyclistic
Capstone project for Google/Coursera Data Analytics Course
analysis bigquery sql tableau visualization
Last synced: 09 Dec 2024
https://github.com/akansharajput280799/strategic-analysis-of-retail-brand-in-south-america-using-sql
Leveraged Big Query and MySQL to analyze 100K records for sales optimization, trend identification, and enhancing customer satisfaction for a retail brand in South America and to provide insights and recommendations to improve their userbase and improve their services
bigquery data-analysis data-science database database-schema google-bigquery mysql-server sql
Last synced: 09 Dec 2024
https://github.com/janaom/gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml
Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.
airflow bigquery bigqueryml cloud-functions cloud-run-functions composer data-engineering-project google-cloud-platform
Last synced: 26 Jan 2025
https://github.com/shegzimus/de_nasa_neow_pipeline
Airflow powered ETL pipeline for moving Near-Earth-Object data from NASA to Google Cloud
airflow-dag airflow-operator airflow-providers bigquery celery-redis docker docker-compose docker-container google-cloud-platform googlecloudstorage nasa-api
Last synced: 27 Jan 2025
https://github.com/machinelearningzuu/data-engineering-projects
This repository is a curated collection of projects and tools that exemplify best practices in data engineering. It serves as a resource for data professionals seeking to enhance their data infrastructure, optimize data pipelines, and implement cutting-edge data processing techniques.
airflow bigquery data-engineering data-science data-visualization data-warehouse
Last synced: 10 Dec 2024
https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform
This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.
bigquery data data-analysis data-modeling live-streaming sql
Last synced: 27 Jan 2025
https://github.com/oliveroneill/wilt-cloud-functions
Wilt Google Cloud Functions
bigquery google-cloud-functions
Last synced: 07 Jan 2025
https://github.com/ankita-selokar/fitbit-for-her-crafting-fitbit-s-strategy-for-women
This project analyzes smart device usage data to uncover trends and insights, guiding Fitbit by Google’s product and marketing strategies for their new women-focused product launch. It combines competitive market analysis with customer behavior insights to inform key decisions.
bigquery excel powerbi spreadsheet sql
Last synced: 10 Dec 2024
https://github.com/ivanildobarauna-dev/pypi-package-stats
Project for ingest pypi packages data from BigQuery and send to DataDog for analysis and insights with dashboards, monitors and more
bigquery cloud data-engineering data-warehouse gcp software-engineering
Last synced: 11 Dec 2024
https://github.com/pawel045/big-tech-stocks
ETL project
big-data bigquery dataengineering etl
Last synced: 11 Dec 2024
https://github.com/samanthalang/samanthalang_portfolio
Une data analyste avec la vision d'une consommatrice et la stratégie d'une marketeuse.
bigquery excel figma mysql notebook numpy pandas postgresql powerbi powerquery python sql sqlite wordpress
Last synced: 11 Dec 2024
https://github.com/jasontanx/ridership-headline-project
This end to end data engineering / data analytics project will be about the Malaysian public transport ridership data.
bigquery data-engineering minio-server public-transport-ridership terraform
Last synced: 01 Feb 2025
https://github.com/sangnandar/insert-unique-record
This is Cloud Functions script to insert only unique records into BigQuery.
bigquery digital-marketing-analytics google-cloud-functions
Last synced: 29 Dec 2024
https://github.com/mchmarny/stocker
Using tweeter sentiment and stock market price signal correlation to predict next day closing price
bigquery ml prediction regression-models
Last synced: 31 Dec 2024
https://github.com/ansh-info/stockpulse
Real-time stock market analytics pipeline with live visualization dashboard. Built with Python and GCP, featuring automated data processing and interactive Streamlit analytics.
api big-data bigquery cloud cloud-computing cloud-native data-engineering data-pipeline docker docker-compose gcp gcp-automation-gitops gcp-cloud-run gcp-pubsub google-cloud-platform real-time realtime stock-market stocks streamlit
Last synced: 27 Dec 2024
https://github.com/davelester/gharchive-bigquery-examples
Examples Using BigQuery to Analyze GH Archive Data
Last synced: 01 Feb 2025
https://github.com/ivanildobarauna/pypi-package-stats
Project for ingest pypi packages data from BigQuery and send to DataDog for analysis and insights with dashboards, monitors and more
bigquery cloud data-engineering data-warehouse gcp software-engineering
Last synced: 29 Dec 2024
https://github.com/iht/bigquery-dataflow-cdc-example
A Dataflow streaming pipeline written in Java, reading data from Pubsub and recovering the sessions from potentially unordered data, and upserting the session data into BigQuery with no duplicates
apache-beam bigquery cdc dataflow google-cloud pubsub
Last synced: 29 Dec 2024
https://github.com/marielachirinosr/nyc-taxi-trip-exploration-2019-2020
Explores passenger behavior & impact of COVID-19 on NYC taxi industry (Q1 2019-2020).
bigquery data data-analysis data-visualization python sql tableau
Last synced: 29 Dec 2024
https://github.com/yandex-cloud-examples/yc-bigquery-to-object-storage
Экспорт данных из Google Big Query через Google Storage в Object Storage Yandex Cloud.
bigquery object-storage python3 yandex-cloud yandexcloud
Last synced: 29 Dec 2024
https://github.com/davidkhala/dwh-migration-tools
dwh-migration-tools: contribution fork
Last synced: 23 Jan 2025
https://github.com/djdhairya/uber-data-analytics
Mage Vm
aiml api bigdata bigquery deep-learning docker google-maps-api ml python3 sql ssh vmware
Last synced: 07 Jan 2025
https://github.com/isaacmg/mimic_iv_bq_queries
Queries needed to recreate time series features for model training
Last synced: 21 Jan 2025
https://github.com/rifa8/extract-load-demo
Learning Google Cloud Platform (GCP)
Last synced: 27 Jan 2025
https://github.com/rifa8/data-warehouse-submission
Learning about Data Warehouse
bigquery citus columnar data-warehouse datalake gcs-bucket
Last synced: 27 Jan 2025
https://github.com/owox/sgtm-owox-ga4-bigquery
OWOX BI Streaming is an advanced tracking to get the most from existing Google Analytics 4 installed on your website
Last synced: 20 Dec 2024
https://github.com/valenthr/purchase_funnel
Google merch store sales analysis
Last synced: 27 Jan 2025
https://github.com/chiamakaukwuoma/portfolio
This repository contains various projects I've been privileged to work on outside of work.
aws-rds azure-fabric bigquery data-analysis docker-container elasticsearch excel grafana hadoop looker-studio mssql mysql postgresql powerbi python sql tableau
Last synced: 03 Feb 2025
https://github.com/giorgishengelia/bike-share-analysis-report
Help developing marketing strategy using data analytics to help convert casual riders into members
Last synced: 12 Dec 2024
https://github.com/xxmadkillerx10/data-engineering-zoomcamp
The Data Engineering Zoomcamp covers essential skills in containerization, workflow orchestration, data warehousing, analytics engineering, batch, and streaming processing. It includes tools like Docker, Terraform, BigQuery, dbt, Spark, Kafka, Kestra, Postgres, Google Data Studio, and Metabase.
airflow bigquery data-visualization dbt dbt-clickhouse docker-compose etl gcs google-cloud kafka postgresql spark sql streaming
Last synced: 03 Feb 2025
https://github.com/francois-lenne/play-bq-gcp
Data pipeline in order to retrieve data from the playstation API to BigQuery
bigquery cicd data-engineering google-cloud python
Last synced: 13 Jan 2025
https://github.com/kevin-rsj/real-estate-investments
Sistema de scoring que clasifica ciudades francesas para inversión en segundas viviendas según perfil de riesgo(alto, moderado y bajo). Evalúa ratios clave en áreas como demanda, disponibilidad, infraestructura, demografía y precios.
bigquery data-analytics looker-studio numpy pandas python sklearn-library sql visualization
Last synced: 17 Dec 2024
https://github.com/tupizz/fiap_pnad-covid-19
Este projeto realiza a análise e transformação de dados da PNAD COVID-19 de maio a julho de 2020, utilizando PySpark para processamento de dados em larga escala e BigQuery como destino para armazenamento e análise posterior. O objetivo é consolidar os dados mensais em um único conjunto de dados transformado.
analysis bigquery pyspark python
Last synced: 17 Dec 2024
https://github.com/hariprasath-v/mh_google_cloud_bigquery_ltv_prediction_challenge
Build a model that can predict customers' Long Term Value (LTV).
bigquery colab-notebook klib machine-learning matplotlib numpy pandas python python3 seaborn
Last synced: 13 Jan 2025
https://github.com/antbit96/dataform_poc
Template for basic data preparation
bigquery bigquery-dataform data-preparation
Last synced: 14 Dec 2024
https://github.com/neo4j-field/dataflow-flex-pyarrow-to-gds
Google Dataflow Flex Templates (in Python) for large scale Graph Loading with GDS and Apache Arrow
apache-arrow apache-beam bigquery dataflow neo4j python
Last synced: 23 Dec 2024
https://github.com/lupusruber/music_analytics
This project processes real-time music event data using Kafka, Apache Spark on Google Cloud Dataproc, and stores the transformed data in BigQuery for analytics, all orchestrated by Airflow and managed with Terraform.
bigquery data-proc dimensional-modeling gcp-project kafka spark-structured-streaming
Last synced: 02 Feb 2025