# Tracking & Reporting
This README provides an overview of the tasks involved in building a dataflow for tracking and reporting on historical weather data. The objective is to demonstrate the technical ability to set up such a dataflow using various tools and technologies.
## Project Summary
The task is to build a representative dataflow that demonstrates technical proficiency in data processing. A fictional, deliberately nonsensical weather report serves as the dataset; the point is the construction of the dataflow itself rather than the data.
## Task Steps
* Export historical weather data (the last 7 days) for the cities of Hamburg, Berlin, and Munich from a weather API using a Google Cloud Function. Save this data in Google BigQuery.
* Use dbt (https://www.getdbt.com/) to create a reporting view containing the following data:
  * Cumulative average temperature for the cities of Berlin, Hamburg, and Munich (see the definition sketched after this list).
  * Cumulative average precipitation amount for the cities of Berlin, Hamburg, and Munich.
  * A fantasy score for the weather in each of the three cities (non-cumulative), devised and calculated as part of the view.
* Develop a dashboard in Google Looker Studio that presents the values over time for the weather fantasy score, temperature, and precipitation amount.

Through these steps, the task demonstrates the construction of a dataflow, integrating data collection, transformation, and visualization using tools like Google Cloud, dbt, and Looker Studio.
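"Cumulative average" is read here as a running mean per city up to each reporting date; this is an interpretation of the task, not something the spec pins down. For a city $c$, a date $d$, and daily temperatures $T_c(d')$:

```math
\overline{T}_c(d) = \frac{1}{\lvert \{\, d' \le d \,\} \rvert} \sum_{d' \le d} T_c(d')
```

and analogously for precipitation. This is exactly what a SQL window average over `PARTITION BY city ORDER BY date` computes; see the dbt model sketch in Task 3 below.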
## Task Workflow
**Task 1: Setting Up the BigQuery Dataset and Table**

Before exporting historical weather data from the weather API, you need to set up the BigQuery dataset and table that will store the data (BigQuery organizes tables into datasets rather than databases).

Steps:
1. *Create a BigQuery Project*:
   * If you don't already have one, create a project in the Google Cloud Console and enable the BigQuery API.
2. *Create a Dataset*:
   * Inside your project, create a new dataset to organize your weather data.
3. *Design the Table Schema*:
   * Define the schema of the table that will hold the historical weather data. This schema should match the structure of the data you'll retrieve from the weather API. For example:
```
city (STRING)
date (DATE)
temperature (FLOAT)
precipitation (FLOAT)
... (other relevant fields)
```
4. *Create the Table*:
   * Using the defined schema, create a new table within your dataset to store the historical weather data (a DDL sketch of this step follows below).

![GCP Table Schema](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/gcp_schema.png)
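Step 4 can also be done in SQL. A minimal DDL sketch, assuming the dataset is named `weather` and the table `weather_data` (both names, and `your-project`, are placeholders):

```
-- BigQuery DDL sketch; project, dataset, and table names are placeholders.
-- FLOAT64 is BigQuery's DDL name for the FLOAT type shown in the console UI.
CREATE TABLE IF NOT EXISTS `your-project.weather.weather_data` (
  city          STRING NOT NULL,
  date          DATE   NOT NULL,
  temperature   FLOAT64,   -- e.g. daily mean temperature in °C
  precipitation FLOAT64    -- e.g. daily precipitation sum in mm
)
PARTITION BY date;  -- date partitioning keeps time-series queries cheap
```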
**Task 2: Exporting Historical Weather Data**

With the BigQuery dataset and table set up, you can now export historical weather data from the weather API and store it in BigQuery using a Google Cloud Function.

Steps:
1. *Set up a Google Cloud Function*:
   * Create a new Google Cloud Function project.
   * Configure the necessary permissions and authentication for accessing the weather API.
2. *Fetch and Transform Data*:
   * Write a function to fetch historical weather data for the specified cities from the weather API.
   * Transform and format the data to match the schema of the BigQuery table you've created.
3. *Insert Data into BigQuery*:
   * Use the BigQuery API to insert the transformed weather data into the appropriate table within your dataset (a Python sketch of such a function follows below).

![GCP Function](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/gcp_function.png)
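A minimal sketch of such a function in Python. The weather API endpoint, its parameters, and the response field names are invented placeholders, as is the table ID; authentication and error handling are reduced to the bare minimum:

```
import datetime

import functions_framework
import requests
from google.cloud import bigquery

# Placeholders/assumptions, not values from this repo.
TABLE_ID = "your-project.weather.weather_data"
CITIES = {"Berlin": (52.52, 13.41), "Hamburg": (53.55, 9.99), "Munich": (48.14, 11.58)}

@functions_framework.http
def load_weather(request):
    """Fetch the last 7 days of weather per city and stream it into BigQuery."""
    client = bigquery.Client()
    end = datetime.date.today()
    start = end - datetime.timedelta(days=7)
    rows = []
    for city, (lat, lon) in CITIES.items():
        # Hypothetical history endpoint and field names; swap in your provider's.
        resp = requests.get(
            "https://api.example-weather.com/v1/history",
            params={"lat": lat, "lon": lon,
                    "start": start.isoformat(), "end": end.isoformat()},
            timeout=30,
        )
        resp.raise_for_status()
        for day in resp.json()["days"]:
            rows.append({
                "city": city,
                "date": day["date"],             # "YYYY-MM-DD"
                "temperature": day["temp_avg"],
                "precipitation": day["precip_sum"],
            })
    errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert
    return (f"insert errors: {errors}", 500) if errors else f"inserted {len(rows)} rows"
```

Deployed with an HTTP trigger and a service account that has BigQuery write access, the function can then be invoked on a schedule, e.g. by Cloud Scheduler hitting the trigger URL.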
![GCP BigQuery](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/gcp_table_preview.png)

**Task 3: Create a Reporting View with dbt**

In this task, you will use *dbt* (data build tool) to create a reporting view containing aggregated data.
Steps:
1. *Set Up the dbt Cloud IDE*:
   * Access the dbt Cloud IDE and create a new project.
2. *Define Models and Transformations*:
   * Create dbt models that aggregate the data in your BigQuery table and calculate the required metrics (a model sketch follows below the screenshots).
3. *Execute dbt Run*:
   * Run the dbt transformation process to build the reporting view.

![dbt yml](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/dbt_yml.png)
![dbt sql](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/dbt_sql.png)
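The screenshots show the project's actual configuration and SQL. As a self-contained reference, here is a minimal sketch of what such a model could look like, assuming a `weather_data` source declared in the project's sources `.yml`; the names and the score formula are illustrative, not the repo's actual code:

```
-- models/weather_report.sql: illustrative dbt model, not the repo's actual code.
-- Computes running (cumulative) averages per city plus the made-up fantasy score.
{{ config(materialized='view') }}

select
    city,
    date,
    avg(temperature)   over (partition by city order by date) as cum_avg_temperature,
    avg(precipitation) over (partition by city order by date) as cum_avg_precipitation,
    temperature - 2 * precipitation as fantasy_score  -- arbitrary: warm and dry scores higher
from {{ source('weather', 'weather_data') }}  -- source name is a placeholder
```

Running `dbt run` materializes this as a view in BigQuery, which Task 4 then reads.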
![dbt GCP export](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/gcp_dbt_aggtable.png)

**Task 4: Create a Dashboard in Google Looker Studio**

In this task, you will create a dashboard using Google Looker Studio to visualize the weather data metrics over time.
Steps:
1. *Set Up Looker Studio*:
   * Access Google Looker Studio and create a new dashboard project.
2. *Configure Data Sources*:
   * Connect Looker Studio to Google BigQuery and ensure access to the dbt-created reporting view (a custom-query sketch follows below).
3. *Create Visualizations*:
   * Design visualizations to display the metrics calculated by dbt over time.
4. *Build the Dashboard*:
   * Assemble the visualizations into a coherent dashboard layout and configure any necessary filters and date ranges for interactive exploration.

![Google Looker Vis](https://github.com/d-kleine/gcp_dbt_challenge/blob/main/looker_vis.png)
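When connecting the BigQuery data source, you can either pick the reporting view directly or use a custom query. A minimal sketch of the latter, assuming the dbt model is materialized as `weather_report` in the `weather` dataset (names are placeholders):

```
-- Custom query for the Looker Studio BigQuery connector; names are placeholders.
SELECT
  city,
  date,
  cum_avg_temperature,
  cum_avg_precipitation,
  fantasy_score
FROM `your-project.weather.weather_report`
ORDER BY city, date
```

Setting `date` as the data source's date-range dimension lets the time-series charts and date filters work out of the box.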