https://github.com/george-nyamao/gcp_etl_project

An ETL pipeline that moves an uploaded flat file from GCS, masks PII, stores the result in BigQuery, and creates a report in Looker.

airflow bigquery cloudcomposer data-fusion gcs-bucket looker python3 wrangler

# GCP_ETL_Project

First, we create a fake employee dataset in Python using the Faker library.
We then upload the dataset to a Google Cloud Storage bucket using the same Python program.
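
As a rough illustration of these two steps, the sketch below builds fake employee rows, writes them to a CSV file, and uploads that file to a bucket. It is not the project's actual script: the column names, the salary range, and the function names are assumptions made for the example. `make_employees` expects a `faker.Faker()` instance, and the upload uses the `google-cloud-storage` client with Application Default Credentials.

```python
import csv

# Column names are illustrative; the schema used by the original script is not shown.
FIELDS = ["first_name", "last_name", "job_title", "email",
          "phone_number", "ssn", "salary"]


def make_employees(fake, count):
    """Build `count` fake employee rows from a faker.Faker-style object."""
    rows = []
    for _ in range(count):
        rows.append({
            "first_name": fake.first_name(),
            "last_name": fake.last_name(),
            "job_title": fake.job(),
            "email": fake.email(),
            "phone_number": fake.phone_number(),
            "ssn": fake.ssn(),
            # Salary range is an arbitrary assumption for the example.
            "salary": fake.random_int(min=40000, max=150000),
        })
    return rows


def write_csv(rows, path):
    """Write the rows to a flat file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def upload_to_gcs(path, bucket_name, blob_name):
    """Upload a local file to a Cloud Storage bucket."""
    # Imported lazily so the generation code runs without the GCS client installed.
    from google.cloud import storage

    client = storage.Client()  # relies on Application Default Credentials
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(path)


# Example usage (bucket and file names are placeholders):
#   from faker import Faker
#   write_csv(make_employees(Faker(), 100), "employee_data.csv")
#   upload_to_gcs("employee_data.csv", "my-bucket", "employee_data.csv")
```
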
We use Wrangler in Data Fusion to concatenate columns and mask Personally Identifiable Information (PII).
We then send the resulting table to BigQuery and create a report in Looker.
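
To make the concatenate-and-mask step concrete, here is a plain-Python sketch of the kind of transformation Wrangler applies. It is not Wrangler directive syntax, and the choices in it are assumptions: it joins `first_name` and `last_name` into `full_name` and masks everything but the last four characters of `ssn`. Which columns the project actually combines or masks is not stated here.

```python
def full_name(first, last):
    """Concatenate two name columns into one, separated by a space."""
    return f"{first} {last}".strip()


def mask_keep_last(value, visible=4, mask_char="x"):
    """Mask letters and digits except the last `visible` ones; keep separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def transform(row):
    """Apply the concatenation and masking to one record (a dict)."""
    out = dict(row)  # copy so the input row is left untouched
    out["full_name"] = full_name(out.pop("first_name"), out.pop("last_name"))
    out["ssn"] = mask_keep_last(out["ssn"])
    return out


record = {"first_name": "Ada", "last_name": "Lovelace", "ssn": "123-45-6789"}
print(transform(record))
```

Keeping the separators and the last four digits leaves the value recognizable as an SSN in the report without exposing the full number.
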

![Screenshot of the pipeline](./Screenshot.png)

Finally, we automate the workflow using Apache Airflow in Cloud Composer.
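
One possible shape for such a DAG is sketched below, assuming Airflow 2.4 or later with the Google provider package installed, as in Cloud Composer. The DAG id, schedule, region, and the instance and pipeline names are all placeholders, and the two-task layout (generate and upload the file, then start the Data Fusion pipeline) is an assumption rather than the project's actual DAG.

```python
from datetime import datetime, timedelta

# Placeholder settings; replace with the real region, instance, and pipeline names.
SETTINGS = {
    "location": "us-central1",
    "instance_name": "my-datafusion-instance",
    "pipeline_name": "my-etl-pipeline",
}


def default_task_args(retries=1, retry_minutes=5):
    """Default arguments shared by every task in the DAG."""
    return {
        "retries": retries,
        "retry_delay": timedelta(minutes=retry_minutes),
    }


def build_dag(extract_callable):
    """Build a DAG that runs the extract step, then starts the Data Fusion pipeline."""
    # Imported lazily so this module can be read without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.operators.datafusion import (
        CloudDataFusionStartPipelineOperator,
    )

    with DAG(
        dag_id="employee_etl",            # hypothetical DAG id
        start_date=datetime(2024, 1, 1),  # arbitrary start date
        schedule="@daily",                # assumed schedule
        catchup=False,
        default_args=default_task_args(),
    ) as dag:
        extract = PythonOperator(
            task_id="extract_and_upload",
            python_callable=extract_callable,
        )
        start_pipeline = CloudDataFusionStartPipelineOperator(
            task_id="start_datafusion_pipeline",
            location=SETTINGS["location"],
            instance_name=SETTINGS["instance_name"],
            pipeline_name=SETTINGS["pipeline_name"],
        )
        extract >> start_pipeline  # upload first, then transform and load
    return dag
```

Airflow only picks up DAG objects defined at module level, so the DAG file would end with something like `dag = build_dag(my_extract_function)`, where `my_extract_function` is the callable that generates and uploads the dataset.
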