https://github.com/a5chin/event-driven-dataflow
This Terraform module sets up a workflow where files stored in Cloud Storage trigger events in Eventarc, which then processes the files and stores the data in Spanner.
cloudfunctions cloudstorage dataflow eventarc pubsub python3 spanner terraform
- Host: GitHub
- URL: https://github.com/a5chin/event-driven-dataflow
- Owner: a5chin
- Created: 2024-03-24T04:23:10.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-27T14:46:25.000Z (about 2 years ago)
- Last Synced: 2024-03-28T15:01:46.739Z (about 2 years ago)
- Topics: cloudfunctions, cloudstorage, dataflow, eventarc, pubsub, python3, spanner, terraform
- Language: HCL
- Homepage:
- Size: 123 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Terraform Module: Event-Driven Batch Dataflow that Imports Data from Cloud Storage (GCS) to Cloud Spanner
## What is this?
This Terraform module sets up a workflow where files stored in Cloud Storage trigger events in Eventarc, which then processes the files and stores the data in Spanner.
## Architecture

1. A file is uploaded to Cloud Storage.
2. Eventarc delivers the storage event to the handler.
3. The handler checks whether the received file is `spanner-export.json`; other objects are ignored.
4. The handler obtains a Cloud API access token.
5. The handler POSTs to the Dataflow job creation API.
6. The launched Dataflow job reads the `.avro` files in the same hierarchy as the `spanner-export.json` file and stores the data in Spanner.
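The filtering and job-creation steps above can be sketched in Python. This is a minimal illustration, not the repository's actual code: the helper name, job naming scheme, and the `GCS_Avro_to_Cloud_Spanner` template parameters (`instanceId`, `databaseId`, `inputDir`) are assumptions about a typical setup.

```python
import re
from typing import Optional

# Assumed Google-provided import template path (not taken from the repo).
TEMPLATE_PATH = "gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Spanner"


def build_launch_request(bucket: str, object_name: str,
                         instance_id: str, database_id: str) -> Optional[dict]:
    """Build a Dataflow templates.launch request body for a
    spanner-export.json upload; return None for any other object."""
    if not object_name.endswith("spanner-export.json"):
        return None  # ignore the .avro data files themselves
    # The directory containing the manifest becomes the import input dir.
    if "/" in object_name:
        input_dir = f"gs://{bucket}/{object_name.rsplit('/', 1)[0]}"
    else:
        input_dir = f"gs://{bucket}"
    return {
        # Dataflow job names must be lowercase letters, digits, and hyphens.
        "jobName": "import-" + re.sub(r"[^a-z0-9-]", "-", bucket.lower()),
        "parameters": {
            "instanceId": instance_id,
            "databaseId": database_id,
            "inputDir": input_dir,
        },
    }
```

The handler would then POST this body (with a bearer access token) to the regional `templates:launch` endpoint, passing `TEMPLATE_PATH` as the `gcsPath` query parameter.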
## `spanner-export.json`
```json
{
  "tables": [
    {
      "name": "TableName",
      "dataFiles": [
        "TableName-000000000000.avro",
        "TableName-000000000001.avro",
        "TableName-[0-9]{12}.avro"
      ]
    }
  ],
  "dialect": "GOOGLE_STANDARD_SQL"
}
```
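To make the manifest's structure concrete, here is a small illustrative parser that maps each table to its declared Avro data files. The helper name is hypothetical; note that `dataFiles` entries may be literal object names or patterns such as `TableName-[0-9]{12}.avro`.

```python
import json

# Sample manifest matching the shape shown above.
MANIFEST = """
{
  "tables": [
    {
      "name": "TableName",
      "dataFiles": [
        "TableName-000000000000.avro",
        "TableName-000000000001.avro"
      ]
    }
  ],
  "dialect": "GOOGLE_STANDARD_SQL"
}
"""


def data_files_by_table(manifest_text: str) -> dict:
    """Map each table name in the manifest to its list of Avro data files."""
    manifest = json.loads(manifest_text)
    return {table["name"]: table["dataFiles"] for table in manifest["tables"]}
```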