{"id":22116900,"url":"https://github.com/lixx21/airflow-dbt-gcp","last_synced_at":"2025-04-10T04:34:18.938Z","repository":{"id":265819005,"uuid":"896503526","full_name":"lixx21/airflow-dbt-gcp","owner":"lixx21","description":"A comprehensive data pipeline leveraging Airflow, DBT, Google Cloud Platform (GCP), and Docker to extract, transform, and load data seamlessly from a staging layer to a data warehouse and data mart.","archived":false,"fork":false,"pushed_at":"2024-12-01T03:40:27.000Z","size":263,"stargazers_count":6,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-29T11:33:38.704Z","etag":null,"topics":["airflow","bigquery","data-engineer","dbt","gcp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lixx21.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-30T14:38:22.000Z","updated_at":"2024-12-29T16:09:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"cc40b726-1633-49f7-946c-e4105114ac7b","html_url":"https://github.com/lixx21/airflow-dbt-gcp","commit_stats":null,"previous_names":["lixx21/airflow-dbt-gcp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fairflow-dbt-gcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fairflow-dbt-gcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fairflow-dbt-gcp/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lixx21%2Fairflow-dbt-gcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lixx21","download_url":"https://codeload.github.com/lixx21/airflow-dbt-gcp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245217789,"owners_count":20579297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","bigquery","data-engineer","dbt","gcp"],"created_at":"2024-12-01T13:19:15.725Z","updated_at":"2025-03-24T05:44:38.053Z","avatar_url":"https://github.com/lixx21.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Warehouse Project: Airflow + DBT + GCP\n\nA comprehensive data pipeline leveraging Airflow, DBT, Google Cloud Platform (GCP), and Docker to extract, transform, and load data seamlessly from a staging layer to a data warehouse and data mart.\n\n## Project Structure\n\n![airflow-dbt](./images/airflow-dbt.png)\n\n![airflow-graph](./images/airflow-graph.png)\n\n## 🛠️ Project Overview\nThis project demonstrates an end-to-end implementation of a modern data stack:\n\n1. Airflow: Orchestrates the data pipeline with DAGs.\n2. DBT (Data Build Tool): Handles the transformation of data from the staging layer to the data warehouse and data mart.\n3. GCP: Serves as the cloud platform for data storage and warehouse management.\n4. Docker: Ensures all tools and dependencies are containerized for consistent development and deployment.\n\n## ✨ Features\n\n1. 
Automated Orchestration: Airflow DAGs schedule and automate tasks for data extraction, transformation, and loading.\n\n2. Data Transformation with DBT:\n\nStaging Layer: Raw data is cleaned and standardized.\n\nData Warehouse: Normalized data structure.\n\nData Mart: Denormalized data for easy reporting and analytics.\n\n3. Cloud Integration: Leverages GCP's scalable infrastructure for efficient storage and querying.\n\n4. Dockerized Environment: Simplifies setup and deployment across any environment.\n\n## Setup\n\n1. Clone the repo with `git clone https://github.com/lixx21/airflow-dbt-gcp.git`\n2. Set up Google Cloud Platform\n3. Create a project in GCP and a bucket in GCS, and make sure the bucket location is **US**\n4. Get your service account credential key from GCP IAM (I suggest storing it inside the [dags](./dags/) folder)\n5. Fill in the [.env](./dags/.env) file with your environment values:\n\n```\n#.env\n\nBUCKET_NAME = \nCREDENTIAL_KEY = \nGCP_CONN_ID= \nPROJECT_ID= \n```\n6. Fill in [profiles.yml](./dags/dbt_transform/profiles.yml) with the local path to your credential keyfile and your Project ID\n```\ndbt_transform:\n  outputs:\n    dev:\n      dataset: shopping_data\n      job_execution_timeout_seconds: 300\n      job_retries: 1\n      keyfile: {your keyfile location}\n      location: US\n      method: service-account\n      priority: interactive\n      project: {GCP project id}\n      threads: 1\n      type: bigquery\n  target: dev\n```\n7. Create a dataset named `shopping_data` in BigQuery and make sure it is in the **US** location, matching `location: US` in `profiles.yml`\n8. 
Run the project using Docker:\n\n```\ndocker-compose up --build -d\n```\n\n## Reference\n\nSet up DBT for BigQuery\n\nhttps://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup\n\nhttps://medium.com/@perkasaid.rio/easiest-way-installing-dbt-for-bigquery-54d1c05f6dfe","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flixx21%2Fairflow-dbt-gcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flixx21%2Fairflow-dbt-gcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flixx21%2Fairflow-dbt-gcp/lists"}