https://github.com/garystafford/dataproc-workflow-templates
Demonstration of Google Cloud Dataproc Workflow Templates
dataproc gcp google-cloud-platform hadoop pyspark spark
- Host: GitHub
- URL: https://github.com/garystafford/dataproc-workflow-templates
- Owner: garystafford
- License: mit
- Created: 2018-12-14T21:54:32.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-12-17T03:01:54.000Z (over 7 years ago)
- Last Synced: 2023-08-05T02:22:51.475Z (almost 3 years ago)
- Topics: dataproc, gcp, google-cloud-platform, hadoop, pyspark, spark
- Size: 14.6 KB
- Stars: 5
- Watchers: 3
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Google Cloud Dataproc WorkflowTemplates API Demo
Code repository for the post [Using the Google Cloud Dataproc WorkflowTemplates API to Automate Spark and Hadoop Workloads on GCP](https://programmaticponderings.com/).
## Files
* `template-demo-2.yaml`: Non-parametrized version of workflow template with three jobs, using a managed 3-node Spark cluster
* `template-demo-3.yaml`: Parametrized version of workflow template with one Python-based PySpark job, using a managed 3-node Spark cluster
* `template-demo-4.yaml`: Parametrized version of workflow template with one Python-based PySpark job, using an existing 3-node Spark cluster
* `template-demo-5.yaml`: Parametrized version of workflow template with one Java-based Spark job, using an existing 3-node Spark cluster
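
The templates above differ mainly in where the jobs run (a managed cluster created for the workflow versus an existing cluster) and in whether job arguments are exposed as parameters. The sketch below is not taken from the repository's YAML files. It only illustrates the general shape of a WorkflowTemplate resource as a Python dict. The cluster name, bucket URIs, step ID, label value, and parameter name are all hypothetical, and it assumes "3-node" means one master plus two workers.

```python
import json


def build_template(template_id, use_managed_cluster):
    """Return a dict shaped like a Dataproc WorkflowTemplate resource.

    All names and URIs below are hypothetical placeholders.
    """
    if use_managed_cluster:
        # Managed cluster: created for the workflow run, deleted afterwards.
        placement = {
            "managedCluster": {
                "clusterName": "demo-cluster",  # hypothetical
                "config": {
                    # Assumption: 3 nodes = 1 master + 2 workers.
                    "masterConfig": {"numInstances": 1},
                    "workerConfig": {"numInstances": 2},
                },
            }
        }
    else:
        # Existing cluster: selected by label rather than created.
        placement = {
            "clusterSelector": {
                "clusterLabels": {
                    "goog-dataproc-cluster-name": "demo-cluster"  # hypothetical
                }
            }
        }

    return {
        "id": template_id,
        "placement": placement,
        "jobs": [
            {
                "stepId": "pyspark-step",  # hypothetical
                "pysparkJob": {
                    "mainPythonFileUri": "gs://example-bucket/job.py",
                    "args": ["gs://example-bucket/input"],
                },
            }
        ],
        # A parameter maps a name to one or more field paths in the template,
        # so the value can be overridden when the template is instantiated.
        "parameters": [
            {
                "name": "INPUT_URI",
                "fields": ["jobs['pyspark-step'].pysparkJob.args[0]"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_template("template-sketch", True), indent=2))
```

Calling `build_template("template-sketch", False)` swaps the `managedCluster` placement for a `clusterSelector`, which is the distinction between the managed-cluster and existing-cluster templates listed above.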