Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Fetch Data from a simple csv file, send the data in GCP BigQuery table, run dbt to automate the DWH and run SODA to check Data Quality.
https://github.com/riju18/airflow-data-engineering-with-bigquery-and-dbt
apache-airflow bigquery csv dbt python3 soda
- Host: GitHub
- URL: https://github.com/riju18/airflow-data-engineering-with-bigquery-and-dbt
- Owner: riju18
- Created: 2024-01-15T14:59:51.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-01-18T15:27:35.000Z (about 1 year ago)
- Last Synced: 2024-11-30T00:10:42.745Z (2 months ago)
- Topics: apache-airflow, bigquery, csv, dbt, python3, soda
- Language: Python
- Homepage:
- Size: 708 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Airflow-data-engineering-with-BigQuery-and-dbt
Fetch data from a simple CSV file, send the data to a GCP BigQuery table, run dbt to automate the DWH, and run Soda to check data quality.

![alt retail_data_dag](dags/retail_data/screenshots/retail_data_projct.png)
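The DAG itself lives in ```dags/data_retail_project.py```. Below is a minimal sketch of what that flow can look like, assuming the CSV is staged through a GCS bucket before being loaded into BigQuery; all DAG/task ids, bucket and dataset names, and paths are illustrative placeholders, not the repository's actual values.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG("retail_data_example", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):
    # 1. upload the source csv to a GCS bucket (bucket/object names are placeholders)
    upload_csv = LocalFilesystemToGCSOperator(
        task_id="upload_csv",
        src="/path/to/online_retail.csv",
        dst="raw/online_retail.csv",
        bucket="my-retail-bucket",
        gcp_conn_id="my_gcp_conn",
    )

    # 2. load the csv from GCS into a BigQuery table
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_to_bq",
        bucket="my-retail-bucket",
        source_objects=["raw/online_retail.csv"],
        destination_project_dataset_table="my-project.retail.raw_invoices",
        write_disposition="WRITE_TRUNCATE",
        gcp_conn_id="my_gcp_conn",
    )

    # 3. build the DWH models with dbt, then 4. validate them with a soda scan
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run --project-dir /path/to/dbt_project")
    soda_scan = BashOperator(task_id="soda_scan", bash_command="soda scan -d retail -c configuration.yml checks.yml")

    upload_csv >> load_to_bq >> dbt_run >> soda_scan
```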
# Get Started
+ create a python venv
+ activate it
+ install dependencies from the ```requirements.txt``` file
+ configure airflow in ```airflow.cfg```
+ create a GCP service account and add a key; download the key in JSON format
# Airflow Webserver
+ In the ```Variables``` section, add the following three variables (they are consumed by the DAG as sketched after this list):
- gcp_project
- gcp_bigquery_retail_dataset
- gcp_account: path to the downloaded service account JSON file
+ In the ```Connections``` section, add a new ```GCP``` connection
- connection id: my_gcp_conn
- value: the content of the downloaded ```service account json file```
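As a rough illustration (not the repository's actual code), a task in the DAG could consume these variables and the ```my_gcp_conn``` connection like this; the DAG id, task id, and operator choice here are assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyDatasetOperator

with DAG("retail_setup_example", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):
    # gcp_project and gcp_bigquery_retail_dataset are the Variables added above;
    # gcp_account (the json key path) is what the GCP connection itself points at.
    create_dataset = BigQueryCreateEmptyDatasetOperator(
        task_id="create_retail_dataset",
        project_id=Variable.get("gcp_project"),
        dataset_id=Variable.get("gcp_bigquery_retail_dataset"),
        gcp_conn_id="my_gcp_conn",  # the connection added in the Connections section
    )
```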
# Configure dbt and Soda in Airflow
+ At the bottom of ```dags/data_retail_project.py```, modify the ```bash command``` with the dbt and Soda project directories and the virtual environments from which ```dbt``` and ```soda``` will run (a hypothetical example follows).
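A hypothetical shape of that ```bash command```, shown here as plain ```BashOperator``` tasks; every path, virtualenv name, data source name, and task id is a placeholder to replace with your own setup.

```python
from airflow.operators.bash import BashOperator

# run dbt from its own virtualenv and project directory (placeholder paths)
dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command=(
        "source /path/to/dbt_venv/bin/activate && "
        "dbt run --project-dir /path/to/dbt_project --profiles-dir /path/to/dbt_project"
    ),
)

# run a soda scan against the warehouse from its own virtualenv (placeholder paths)
soda_scan = BashOperator(
    task_id="soda_scan",
    bash_command=(
        "source /path/to/soda_venv/bin/activate && "
        "soda scan -d retail -c /path/to/soda/configuration.yml /path/to/soda/checks"
    ),
)
```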
# Run
+ ```airflow webserver```
+ ```airflow scheduler```