https://github.com/danieldacosta/etl-spark-parallel-stepfunctions
Execute EMR Jobs in parallel
- Host: GitHub
- URL: https://github.com/danieldacosta/etl-spark-parallel-stepfunctions
- Owner: DanielDaCosta
- Created: 2022-03-29T18:05:18.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-29T18:24:46.000Z (about 4 years ago)
- Last Synced: 2025-02-28T11:30:44.839Z (over 1 year ago)
- Topics: emr, spark, step-functions
- Size: 67.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# etl-spark-parallel-stepfunctions
Run Spark ETL jobs on Amazon EMR in parallel, orchestrated by AWS Step Functions.
# Architecture

# Input
## Create/Terminate Cluster and run all steps
```json
{
"CreateCluster": true,
"TerminateCluster": true
}
```
OR, equivalently, with every step listed explicitly:
```json
{
"CreateCluster": true,
"TerminateCluster": true,
"Steps_Staging": {
"StepOne": true,
"StepTwo": true
}
}
```
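
The two inputs above are meant to behave the same way. A minimal Python sketch of that equivalence follows. It assumes that a missing `Steps_Staging` key (or a missing step inside it) means "run that step"; the helper `normalize_input` and the `KNOWN_STEPS` tuple are hypothetical names used for illustration and are not part of this repository.

```python
import json

# Step names taken from the README examples; the real state machine may define others.
KNOWN_STEPS = ("StepOne", "StepTwo")


def normalize_input(payload):
    """Return a copy of the payload with every step flag made explicit.

    Assumption: an omitted "Steps_Staging" key, or an omitted step inside it,
    is read as "run that step".
    """
    normalized = dict(payload)
    requested = payload.get("Steps_Staging", {})
    normalized["Steps_Staging"] = {
        step: bool(requested.get(step, True)) for step in KNOWN_STEPS
    }
    return normalized


short_form = {"CreateCluster": True, "TerminateCluster": True}
long_form = {
    "CreateCluster": True,
    "TerminateCluster": True,
    "Steps_Staging": {"StepOne": True, "StepTwo": True},
}

# Under the assumption above, both forms normalize to the same document.
print(json.dumps(normalize_input(short_form), indent=2))
```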
## Create/Terminate Cluster and run only StepOne
```json
{
"CreateCluster": true,
"TerminateCluster": true,
"Steps_Staging": {
"StepOne": true,
"StepTwo": false
}
}
```
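
When the input is generated by a script rather than written by hand, the step flags can be derived from a list of steps to run. The sketch below is one possible way to do that; `payload_for_steps` and `KNOWN_STEPS` are hypothetical names, and the step list is limited to the two steps shown in this README.

```python
import json

# Step names taken from the README examples; adjust to match the deployed state machine.
KNOWN_STEPS = ("StepOne", "StepTwo")


def payload_for_steps(steps_to_run, create_cluster=True, terminate_cluster=True):
    """Build an execution input that enables only the named steps."""
    unknown = set(steps_to_run) - set(KNOWN_STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return {
        "CreateCluster": create_cluster,
        "TerminateCluster": terminate_cluster,
        # Every known step gets an explicit flag: True if requested, False otherwise.
        "Steps_Staging": {step: step in steps_to_run for step in KNOWN_STEPS},
    }


only_step_one = payload_for_steps(["StepOne"])
print(json.dumps(only_step_one, indent=2))
```

Listing every step explicitly, including the disabled ones, keeps the payload unambiguous regardless of how the state machine treats missing keys.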
## Use an existing cluster and run all steps
```json
{
"CreateCluster": false,
"TerminateCluster": false,
"ClusterId": "YOUR CLUSTER ID"
}
```
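
Any of the inputs above can be passed to the state machine when starting an execution. The following Python sketch shows one way to do so for the existing-cluster case. It assumes the caller supplies a Step Functions client (for example `boto3.client("stepfunctions")`) and the ARN of the deployed state machine; the helpers `build_existing_cluster_input` and `start_etl` are hypothetical, and treating an empty `ClusterId` as an error is an assumption rather than documented behavior.

```python
import json


def build_existing_cluster_input(cluster_id):
    """Build the input for reusing a running EMR cluster.

    Assumption: "ClusterId" must be non-empty when "CreateCluster" is false.
    """
    if not cluster_id:
        raise ValueError("cluster_id is required when reusing a cluster")
    return {
        "CreateCluster": False,
        "TerminateCluster": False,
        "ClusterId": cluster_id,
    }


def start_etl(sfn_client, state_machine_arn, payload):
    """Start one execution and return its ARN.

    `sfn_client` is expected to expose `start_execution` the way the boto3
    Step Functions client does; the input must be passed as a JSON string.
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

With boto3 installed and AWS credentials configured, a call would look like `start_etl(boto3.client("stepfunctions"), arn, build_existing_cluster_input(cluster_id))`, where `arn` and `cluster_id` are values from your own account.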