https://github.com/shirlyngit/elt-pipeline-with-gcp-airflow-looker-studio
Scalable ELT pipeline on GCP using Airflow and BigQuery to ingest, validate, and transform 1M+ anonymized medical records and visualize them in Looker Studio.
airflow-dags bigquery elt-pipeline gcp looker-studio python
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/shirlyngit/elt-pipeline-with-gcp-airflow-looker-studio
- Owner: Shirlyngit
- Created: 2025-08-12T02:01:27.000Z (about 2 months ago)
- Default Branch: master
- Last Pushed: 2025-08-12T03:08:02.000Z (about 2 months ago)
- Last Synced: 2025-08-12T05:27:51.350Z (about 2 months ago)
- Topics: airflow-dags, bigquery, elt-pipeline, gcp, looker-studio, python
- Homepage:
- Size: 42.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ELT Data Pipeline with GCP and Airflow
This project demonstrates how to build an **ELT (Extract, Load, Transform)** data pipeline that processes **1 million medical records** using **Google Cloud Platform (GCP)** and **Apache Airflow**. The pipeline extracts data from Google Cloud Storage (GCS), loads it into BigQuery, and transforms it to create country-specific tables and views for analysis.
I used Looker Studio to display this analysis.

---
## Features
- Extract data from GCS in CSV format.
- Load raw data into a staging table in BigQuery.
- Transform data into country-specific tables and reporting views.
- Use Apache Airflow to orchestrate the pipeline.
- Generate clean and structured datasets for analysis.
- Display the analysis in Looker Studio.
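
As a rough illustration of the load feature, the sketch below builds a BigQuery load-job configuration for a CSV file stored in GCS. The bucket, object, project, dataset, and table names are hypothetical placeholders, not values taken from this repository. In Airflow, a dictionary of this shape could be passed to `BigQueryInsertJobOperator(configuration=...)` from the Google provider package; this is one possible approach, not necessarily the one used in this project's DAG.

```python
def build_load_config(bucket, object_name, project, dataset, table):
    """Return a BigQuery job configuration that loads one CSV file from GCS.

    The keys follow the BigQuery REST API's load-job configuration.
    """
    return {
        "load": {
            "sourceUris": [f"gs://{bucket}/{object_name}"],
            "destinationTable": {
                "projectId": project,
                "datasetId": dataset,
                "tableId": table,
            },
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,  # assumes the CSV has a single header row
            "autodetect": True,  # let BigQuery infer the schema
            "writeDisposition": "WRITE_TRUNCATE",  # replace the staging table on each run
        }
    }


# All names below are hypothetical and used only for illustration.
load_config = build_load_config(
    bucket="example-bucket",
    object_name="medical_records.csv",
    project="example-project",
    dataset="staging_dataset",
    table="raw_records",
)
print(load_config["load"]["sourceUris"][0])
```

`WRITE_TRUNCATE` keeps the staging table a plain copy of the latest file, which suits a staging layer that is rebuilt on every run; `WRITE_APPEND` would be the alternative if files arrive incrementally.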

---

## Architecture

### Workflow
1. **Extract**: Check for file existence in GCS.
2. **Load**: Load raw CSV data into a BigQuery staging table.
3. **Transform**:
   - Create country-specific tables in the transform layer.
   - Generate reporting views for each country with filtered insights.

### Data Layers
1. **Staging Layer**: Raw data from the CSV file.
2. **Transform Layer**: Cleaned and transformed tables.
3. **Reporting Layer**: Views optimized for analysis and reporting.

---
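
The transform step and the three data layers described above can be sketched as generated SQL: one table per country in the transform layer, and one view per country in the reporting layer. Everything named here is an assumption made for illustration: the dataset names, the `country` column, the `_table` and `_view` suffixes, and the example filter are hypothetical and may differ from the repository's actual schema.

```python
import re


def slugify(country):
    """Turn a country name into a lowercase identifier, e.g. 'United States' -> 'united_states'."""
    return re.sub(r"[^a-z0-9]+", "_", country.lower()).strip("_")


def quote_literal(value):
    """Quote a Python string as a BigQuery string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def country_table_sql(project, country):
    """Transform layer: copy one country's rows out of the staging table."""
    table = f"{project}.transform_dataset.{slugify(country)}_table"
    staging = f"{project}.staging_dataset.raw_records"
    return (
        f"CREATE OR REPLACE TABLE `{table}` AS\n"
        f"SELECT * FROM `{staging}`\n"
        f"WHERE country = {quote_literal(country)}"
    )


def reporting_view_sql(project, country, filter_sql):
    """Reporting layer: a filtered view over the country's transform table."""
    view = f"{project}.reporting_dataset.{slugify(country)}_view"
    table = f"{project}.transform_dataset.{slugify(country)}_table"
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"SELECT * FROM `{table}`\n"
        f"WHERE {filter_sql}"
    )


def query_job(sql):
    """Wrap a SQL string in a BigQuery query-job configuration (standard SQL)."""
    return {"query": {"query": sql, "useLegacySql": False}}


# Hypothetical inputs: two countries and a made-up filter on a made-up column.
jobs = []
for country in ["Kenya", "United States"]:
    jobs.append(query_job(country_table_sql("example-project", country)))
    jobs.append(query_job(reporting_view_sql("example-project", country, "diagnosis IS NOT NULL")))

print(jobs[0]["query"]["query"])
```

Generating the statements from a list of countries keeps the per-country tables and views consistent; each resulting configuration could be run as its own Airflow task so that a failure in one country does not block the others.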
## Requirements
### Tools and Services
- **Google Cloud Platform (GCP)**:
  - Google Compute Engine (for Airflow)
  - BigQuery
  - Cloud Storage
- **Apache Airflow**:
  - Airflow with Google Cloud providers
- **Looker Studio**:
  - For visualizing the analysis of the processed medical data.

---
## Setup Instructions
### Prerequisites
1. A Google Cloud project with:
   - BigQuery and Cloud Storage enabled.
   - A service account with the required permissions.
2. Apache Airflow installed.

## End Result
### Airflow Pipeline

### Looker Studio Report
