https://github.com/jeremybeard/oozieloop
Loops in Oozie
- Host: GitHub
- URL: https://github.com/jeremybeard/oozieloop
- Owner: jeremybeard
- Created: 2015-02-15T16:45:01.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-02-15T16:48:21.000Z (over 10 years ago)
- Last Synced: 2025-03-29T22:04:24.631Z (6 months ago)
- Topics: loop, oozie, workflow
- Language: Shell
- Size: 125 KB
- Stars: 10
- Watchers: 0
- Forks: 8
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
This project helps you implement loops in Oozie by providing a workflow that calls your workflow for each integer in a range you provide, or for each entry in a list of values you provide. The workflows can run serially, in the order of the range or list, or in parallel.
The workflow you call to run the loop is `loop.xml`. You can run it directly or as a sub-workflow of one of your own workflows.
Depending on the loop type, `loop.xml` runs either `loop_range_step.xml` recursively for each integer or `loop_list_step.xml` recursively for each list value, calling your workflow each time.
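The recursion can be pictured with a small Python model. This is a sketch under assumptions, not code from this project: `loop_range_step` and `loop_list_step` here are hypothetical functions standing in for the XML workflows of the same name, `action` stands in for your workflow, and only the serial case is modeled. In this model each step handles one value and then calls itself for the rest, so every iteration adds one level of nesting, which is consistent with the sub-workflow depth limit mentioned at the end of this README.

```python
from typing import Callable, List

# Hypothetical stand-in for your workflow: it receives loop_value as a string.
Action = Callable[[str], None]


def loop_range_step(current: int, end: int, action: Action, depth: int = 1) -> int:
    """Model of loop_range_step.xml (serial case only).

    Runs `action` for `current`, then recurses for `current + 1`.
    Returns the deepest nesting level that ran your workflow,
    which in this model equals the number of iterations.
    """
    if current > end:
        return depth - 1
    action(str(current))
    return loop_range_step(current + 1, end, action, depth + 1)


def loop_list_step(values: List[str], action: Action, depth: int = 1) -> int:
    """Model of loop_list_step.xml (serial case only): handle the head, recurse on the tail."""
    if not values:
        return depth - 1
    action(values[0])
    return loop_list_step(values[1:], action, depth + 1)


if __name__ == "__main__":
    seen: List[str] = []
    print(loop_range_step(3, 5, seen.append), seen)  # 3 ['3', '4', '5']
    seen.clear()
    print(loop_list_step(["a", "b"], seen.append), seen)  # 2 ['a', 'b']
```

In parallel mode the real workflows fork your workflow alongside the next step instead of waiting for it; the model above deliberately leaves that out.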
`loop.xml` requires these properties to process the loop:
- `loop_parallel` - Either "true" or "false". True forks all of your workflows at once; false waits for each workflow to finish before starting the next.
- `loop_type` - Either "range" or "list"
- `loop_start` - The first integer to run in the loop (range type only)
- `loop_end` - The last integer to run in the loop (range type only)
- `loop_list` - The comma-separated list of values to iterate over in the loop (list type only)
- `loop_action` - The HDFS path of the workflow XML that you need to loop over
- `loop_name` - A short name (without spaces) to distinguish the loop from others
When your workflow is called, it is passed the current loop value in the `loop_value` property, which you can use in your workflow. Because of a limitation in Oozie, empty values in a list-type loop are passed as the value "--NOVALUE--" instead.
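The property rules above can be checked before you submit a job. The Python sketch below is hypothetical (`loop_values` is not part of this project): it validates the properties for each loop type and returns the sequence of `loop_value`s your workflow would receive. It assumes the range is inclusive at both ends, since `loop_start` and `loop_end` are the first and last integers, and that list entries are split on commas without trimming whitespace. The HDFS path in the usage example is a placeholder.

```python
from typing import Dict, List

NO_VALUE = "--NOVALUE--"  # value passed in place of an empty list entry, per the README


def _require(props: Dict[str, str], key: str) -> str:
    """Return props[key], or raise a readable error if it is missing."""
    if key not in props:
        raise ValueError(f"missing required property: {key}")
    return props[key]


def loop_values(props: Dict[str, str]) -> List[str]:
    """Validate loop.xml-style properties and return each loop_value in order.

    Hypothetical helper: assumes inclusive ranges and comma splitting
    without whitespace trimming.
    """
    if _require(props, "loop_parallel") not in ("true", "false"):
        raise ValueError('loop_parallel must be "true" or "false"')
    _require(props, "loop_action")
    if " " in _require(props, "loop_name"):
        raise ValueError("loop_name must not contain spaces")

    loop_type = _require(props, "loop_type")
    if loop_type == "range":
        start = int(_require(props, "loop_start"))
        end = int(_require(props, "loop_end"))
        return [str(i) for i in range(start, end + 1)]
    if loop_type == "list":
        # Empty entries arrive in your workflow as NO_VALUE.
        return [v if v else NO_VALUE for v in _require(props, "loop_list").split(",")]
    raise ValueError('loop_type must be "range" or "list"')


if __name__ == "__main__":
    props = {
        "loop_parallel": "false",
        "loop_type": "list",
        "loop_list": "a,,b",
        "loop_action": "/user/me/my_workflow.xml",  # placeholder HDFS path
        "loop_name": "demo",
    }
    print(loop_values(props))  # ['a', '--NOVALUE--', 'b']
```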
The example folder in this project contains a demonstration that creates an empty file in HDFS for each value in the range or list you provide, named after that value. To run the example, change the paths and hostnames to match your cluster.
By default, this method cannot loop over your workflow more than 48 times. You can raise this limit for the whole Oozie server (beware!) by changing `oozie.action.subworkflow.max.depth`.
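A quick pre-flight check against that limit might look like the following Python sketch. The `count_iterations` and `check_depth` names are hypothetical; the default of 48 comes from the paragraph above, and the counting assumes inclusive ranges and comma-split lists. If you have changed `oozie.action.subworkflow.max.depth` on your server, pass your own limit as `max_iterations`.

```python
from typing import Dict

DEFAULT_MAX_ITERATIONS = 48  # limit stated in this README for a default Oozie server


def count_iterations(props: Dict[str, str]) -> int:
    """Number of times loop.xml would call your workflow (inclusive range or comma list)."""
    if props["loop_type"] == "range":
        return max(0, int(props["loop_end"]) - int(props["loop_start"]) + 1)
    if props["loop_type"] == "list":
        return len(props["loop_list"].split(","))
    raise ValueError('loop_type must be "range" or "list"')


def check_depth(props: Dict[str, str], max_iterations: int = DEFAULT_MAX_ITERATIONS) -> int:
    """Raise ValueError if the loop would exceed max_iterations; otherwise return the count."""
    n = count_iterations(props)
    if n > max_iterations:
        raise ValueError(
            f"loop {props.get('loop_name', '?')} needs {n} iterations; "
            f"limit is {max_iterations} (see oozie.action.subworkflow.max.depth)"
        )
    return n


if __name__ == "__main__":
    print(check_depth({"loop_type": "range", "loop_start": "1", "loop_end": "48"}))  # 48
```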