Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/george-nyamao/gcp_etl_project
An ETL pipeline that moves an uploaded flat file from GCS, masks PII, stores the result in BigQuery, and creates a report in Looker.
airflow bigquery cloudcomposer data-fusion gcs-bucket looker python3 wrangler
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/george-nyamao/gcp_etl_project
- Owner: George-Nyamao
- Created: 2024-07-01T01:55:28.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-07-01T02:28:04.000Z (5 months ago)
- Last Synced: 2024-09-24T23:01:23.616Z (about 2 months ago)
- Topics: airflow, bigquery, cloudcomposer, data-fusion, gcs-bucket, looker, python3, wrangler
- Language: Python
- Homepage:
- Size: 66.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# GCP_ETL_Project
First, we create a fake employee dataset in Python using the Faker library.
We then upload the dataset to a Google Cloud Storage bucket using the same Python program.
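
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the column names, the salary range, and the helper names (`make_employee_rows`, `rows_to_csv`, `upload_csv`) are assumptions made for the example. It relies on the third-party `Faker` and `google-cloud-storage` packages, which are imported lazily so that the CSV logic can be exercised on its own.

```python
import csv
import io

# Hypothetical column layout; the real dataset may use different fields.
FIELDS = ["first_name", "last_name", "job_title", "ssn", "salary"]


def make_employee_rows(n, fake=None):
    """Return n fake employee records as dicts.

    `fake` defaults to a faker.Faker instance (pip install Faker).
    """
    if fake is None:
        from faker import Faker  # third-party dependency
        fake = Faker()
    return [
        {
            "first_name": fake.first_name(),
            "last_name": fake.last_name(),
            "job_title": fake.job(),
            "ssn": fake.ssn(),
            "salary": fake.random_int(min=30000, max=150000),  # assumed range
        }
        for _ in range(n)
    ]


def rows_to_csv(rows):
    """Serialize the records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def upload_csv(csv_text, bucket_name, blob_name):
    """Upload CSV text to a GCS bucket (pip install google-cloud-storage).

    Uses Application Default Credentials; bucket and blob names are
    supplied by the caller.
    """
    from google.cloud import storage  # third-party dependency
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(csv_text, content_type="text/csv")
```

A call such as `upload_csv(rows_to_csv(make_employee_rows(100)), "my-bucket", "employees.csv")` would generate and upload the file in one step, where the bucket and object names are placeholders.
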
We use Wrangler in Data Fusion to concatenate columns and mask Personally Identifiable Information (PII).
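
Wrangler applies these transformations inside Data Fusion, so no Python is needed for this step. Purely to illustrate what concatenating and masking do to a record, here is a plain-Python sketch. It is not Wrangler directive syntax, and the column names, the choice of which column counts as PII, and the "keep the last four characters" rule are all assumptions made for the example.

```python
def mask_keep_last(value, visible=4, mask_char="x"):
    """Mask every letter or digit except the last `visible` ones.

    Separators such as '-' are left in place so the format stays readable.
    """
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def transform(row):
    """Concatenate the name columns and mask the assumed PII column."""
    return {
        "full_name": f"{row['first_name']} {row['last_name']}",
        "job_title": row["job_title"],
        "ssn": mask_keep_last(row["ssn"]),  # hypothetical PII column
        "salary": row["salary"],
    }
```

For instance, `mask_keep_last("123-45-6789")` returns `"xxx-xx-6789"`, and `transform` turns separate `first_name` and `last_name` values into a single `full_name` field.
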
We then send the resulting table to BigQuery and create a report in Looker.
![Screenshot of the pipeline](./Screenshot.png)
Finally, we automate the workflow using Apache Airflow in Cloud Composer.