https://github.com/sylvioneto/gcp-cloud-composer
This project shows how to use Cloud Composer to orchestrate data dependencies and pipelines
- Host: GitHub
- URL: https://github.com/sylvioneto/gcp-cloud-composer
- Owner: sylvioneto
- Created: 2022-08-18T21:32:45.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-11-25T15:40:36.000Z (over 3 years ago)
- Last Synced: 2025-02-09T07:15:58.554Z (over 1 year ago)
- Topics: cloud, gcp, google, terraform
- Language: PLpgSQL
- Homepage:
- Size: 619 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Analytics
## Description
This example demonstrates how to use Cloud Composer DAGs to:
- [Restore a Postgres backup](./dags/postgres_restore.py)
- [Extract data from Postgres to Cloud Storage (Data Lake)](./dags/postgres_to_datalake.py)
- [Load data from Cloud Storage (Data Lake) to BigQuery (Data Warehouse)](./dags/datalake_to_dw.py)
- [Transform data on BigQuery](./dags/bigquery_transform.py)
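The four DAGs above read as a sequence: restore, extract, load, transform. Airflow starts a piece of work only after its upstream dependencies have finished, and that amounts to a topological ordering of a dependency graph. The plain-Python sketch below illustrates the idea with the standard library's `graphlib`, not with Airflow itself. The `PIPELINE` map is hypothetical: it assumes each stage depends on the previous one and that the DAG ids match the file names. The README does not say how, or whether, the DAGs are chained. In Airflow, cross-DAG dependencies like these are usually declared with operators such as `TriggerDagRunOperator` or `ExternalTaskSensor`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: DAG id -> upstream DAG ids it must wait for.
# The ids are assumed to match the file names under ./dags; the README does not
# state that the DAGs are actually chained this way.
PIPELINE: Dict[str, Set[str]] = {
    "postgres_restore": set(),
    "postgres_to_datalake": {"postgres_restore"},
    "datalake_to_dw": {"postgres_to_datalake"},
    "bigquery_transform": {"datalake_to_dw"},
}


def run_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return an execution order that respects every dependency.

    graphlib raises graphlib.CycleError if the graph contains a cycle, the same
    kind of loop Airflow refuses to schedule.
    """
    return list(TopologicalSorter(graph).static_order())
```

For a strict chain there is exactly one valid order. If two stages depended only on the same upstream stage, either could come first, and a scheduler is free to run them side by side.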
Resources created:
- VPC with firewall rules
- Cloud Composer v2
- Cloud SQL for Postgres
- Cloud Storage Buckets
- BigQuery datasets and tables
See the [Airflow Google Operators doc](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/index.html) for the other operators available.
## Deploy
1. Create a new project and select it
2. Open Cloud Shell and make sure the environment variable below is set; if it is empty, set it with the `gcloud config set project` command
```
echo $GOOGLE_CLOUD_PROJECT
```
3. Create a bucket to store your project's Terraform state
```
gsutil mb gs://$GOOGLE_CLOUD_PROJECT-tf-state
```
4. Enable the necessary APIs
```
gcloud services enable compute.googleapis.com \
  container.googleapis.com \
  containerregistry.googleapis.com \
  composer.googleapis.com \
  bigquery.googleapis.com \
  storage.googleapis.com \
  cloudfunctions.googleapis.com \
  pubsub.googleapis.com \
  sqladmin.googleapis.com \
  cloudbuild.googleapis.com
```
5. Grant the Cloud Build service account the permissions it needs to create the resources
```
PROJECT_NUMBER=$(gcloud projects describe $GOOGLE_CLOUD_PROJECT --format='value(projectNumber)')
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:$PROJECT_NUMBER@cloudbuild.gserviceaccount.com --role=roles/editor
gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:$PROJECT_NUMBER@cloudbuild.gserviceaccount.com --role=roles/iam.securityAdmin
```
6. Clone this repo
```
git clone https://github.com/sylvioneto/gcp-cloud-composer.git
cd gcp-cloud-composer
```
7. Execute Terraform using Cloud Build
```
gcloud builds submit . --config cloudbuild.yaml
```
8. Go to [Cloud Composer](https://console.cloud.google.com/composer) and check out the DAGs
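Steps 2 to 5 can be sanity-checked before submitting the build in step 7. The sketch below is a hypothetical Python preflight helper, not part of this repository. It reads `GOOGLE_CLOUD_PROJECT`, checks the `<project>-tf-state` name against a subset of the Cloud Storage bucket naming rules, and compares the output of `gcloud services list --enabled` and `gcloud projects get-iam-policy` with the APIs and roles used above. It assumes the gcloud CLI is on `PATH` (as it is in Cloud Shell). It also assumes that `cloudbuild.googleapis.com` must be enabled, since the Cloud Build service account from step 5 is created when that API is turned on.

```python
import json
import re
import subprocess
from typing import Iterable, List, Mapping

# APIs from step 4, plus Cloud Build (assumption: needed for steps 5 and 7).
REQUIRED_SERVICES = [
    "compute.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "composer.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
    "cloudfunctions.googleapis.com",
    "pubsub.googleapis.com",
    "sqladmin.googleapis.com",
    "cloudbuild.googleapis.com",
]
# Roles granted to the Cloud Build service account in step 5.
REQUIRED_ROLES = ["roles/editor", "roles/iam.securityAdmin"]

# Subset of the Cloud Storage naming rules: 3-63 chars of [a-z0-9_-],
# starting and ending with a letter or digit (dotted names are not handled).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def project_id_from_env(environ: Mapping[str, str]) -> str:
    """Step 2: read GOOGLE_CLOUD_PROJECT or fail with the fix from the README."""
    project_id = environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not project_id:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is empty; run `gcloud config set project <PROJECT_ID>`")
    return project_id


def state_bucket_problem(project_id: str) -> str:
    """Step 3: return '' if <project>-tf-state looks like a valid bucket name, else a reason."""
    name = f"{project_id}-tf-state"
    if not _BUCKET_RE.match(name):
        return f"{name!r}: use 3-63 lowercase letters, digits, '-' or '_'"
    if name.startswith("goog") or "google" in name:
        return f"{name!r}: bucket names may not start with 'goog' or contain 'google'"
    return ""


def missing_services(enabled_output: str, required: Iterable[str] = REQUIRED_SERVICES) -> List[str]:
    """Step 4: compare with `gcloud services list --enabled --format='value(config.name)'`."""
    enabled = {line.strip() for line in enabled_output.splitlines() if line.strip()}
    return [s for s in required if s not in enabled]


def missing_roles(policy_json: str, project_number: str, required: Iterable[str] = REQUIRED_ROLES) -> List[str]:
    """Step 5: check `gcloud projects get-iam-policy <PROJECT_ID> --format=json` output."""
    member = f"serviceAccount:{project_number}@cloudbuild.gserviceaccount.com"
    bindings = json.loads(policy_json).get("bindings", [])
    granted = {b["role"] for b in bindings if member in b.get("members", [])}
    return [r for r in required if r not in granted]


def _gcloud(*args: str) -> str:
    """Run a gcloud command and return its stdout (requires the gcloud CLI)."""
    return subprocess.run(["gcloud", *args], capture_output=True, text=True, check=True).stdout


def preflight(environ: Mapping[str, str]) -> List[str]:
    """Run every check against the live project and return the problems found."""
    project = project_id_from_env(environ)
    number = _gcloud("projects", "describe", project, "--format=value(projectNumber)").strip()
    problems = [state_bucket_problem(project)]
    enabled = _gcloud("services", "list", "--enabled", f"--project={project}", "--format=value(config.name)")
    problems += [f"API not enabled: {s}" for s in missing_services(enabled)]
    policy = _gcloud("projects", "get-iam-policy", project, "--format=json")
    problems += [f"Cloud Build service account lacks {r}" for r in missing_roles(policy, number)]
    return [p for p in problems if p]
```

In Cloud Shell, `preflight(os.environ)` should return an empty list once steps 2 to 5 are done. The individual checks take command output as plain strings, so they can be exercised without calling Google Cloud.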
## Destroy
1. Destroy the resources by executing Terraform with Cloud Build
```
gcloud builds submit . --config cloudbuild_destroy.yaml
```
## Development
Create a virtual environment, activate it, and install the requirements
```
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
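If `pip install` runs before the environment is activated, the packages land in the system interpreter. The hypothetical helper below checks two things from Python: whether the current interpreter is inside a virtual environment (`venv`, or `virtualenv`, whose older releases set `sys.real_prefix`), and which distributions named in requirements lines are not installed. It handles only plain `name<specifier>` lines; URLs, editable installs, and `#egg=` fragments are out of scope for this sketch.

```python
import re
import sys
from importlib import metadata
from typing import Iterable, List, Optional


def in_virtualenv() -> bool:
    """True inside a venv/virtualenv; older virtualenv releases set sys.real_prefix."""
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def requirement_name(line: str) -> Optional[str]:
    """Extract the distribution name from a plain requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or pip option such as -r / -e
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_requirements(lines: Iterable[str]) -> List[str]:
    """Names listed in the given requirements lines that are not installed here."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, opening `requirements.txt` and passing the file object to `missing_requirements` lists anything `pip install -r requirements.txt` did not install into the active environment.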