https://gitlab.com/RedFred7/Jongleur
Launches, schedules and manages multiple processes represented in a DAG.
- Host: gitlab.com
- URL: https://gitlab.com/RedFred7/Jongleur
- Owner: RedFred7
- License: mit
- Created: 2018-07-17T15:38:47.026Z (over 6 years ago)
- Default Branch: master
- Last Synced: 2024-10-11T17:44:32.158Z (3 months ago)
- Stars: 24
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
- data-science-with-ruby - jongleur
README
# Jongleur [![Gem Version](https://badge.fury.io/rb/jongleur.svg)](https://badge.fury.io/rb/jongleur)
Jongleur is a process scheduler and manager. It allows its users to declare a number of executable tasks as Ruby classes, define precedence between those tasks and run each task as a separate process.
Jongleur is particularly useful for implementing workflows modelled as a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)
(Directed Acyclic Graph), but can also be used to run multiple tasks in parallel or even sequential workflows where each task needs to run as a separate OS process.

## Environment
This gem has been built using the [POSIX/UNIX process model](https://support.sas.com/documentation/onlinedoc/sasc/doc750/html/lr2/zid-6574.htm).
It will work on Linux and Mac OS but not on Windows.

Jongleur has been tested with MRI Ruby 2.4.3, 2.4.4, 2.5.0 and 2.5.1. I would expect it to work with other Ruby implementations too, though it hasn't yet been tested on those.
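
Because the gem relies on the POSIX process model, a script can check up front whether the running Ruby is able to fork. The following is a minimal sketch using core Ruby only; the `fork_supported` variable is illustrative and not part of Jongleur.

```ruby
# Core Ruby only; nothing here is part of Jongleur's API.
# Process.respond_to?(:fork) is false on platforms without fork(2),
# such as Windows.
fork_supported = Process.respond_to?(:fork)

unless fork_supported
  warn 'This platform cannot fork, so process-based task runners will not work here.'
end
```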
## Dependencies
This gem depends on the [Graphviz](https://www.graphviz.org/) package for drawing graphs. If this isn't already installed on your system please install with
`$ sudo apt-get install graphviz` (Linux)
or
`$ brew install graphviz` (Mac OS)
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'jongleur'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install jongleur

In either case, call `require 'jongleur'` before using the gem.
## What does it do?
In a nutshell, Jongleur keeps track of a number of tasks and executes them as separate OS processes according to their precedence criteria. For instance, if there are 3 tasks A, B and C, and task C depends on A and B, Jongleur will start executing A and B in separate processes (i.e. in parallel) and will wait until they are both finished before it executes C in a separate process.
Jongleur is ideal for running workflows represented as DAGs, but is also useful for simply running tasks in parallel or for whenever you need some multi-processing capability.
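
The A, B and C example above can be sketched with nothing but Ruby's own `fork` and `Process.wait2`. This illustrates the idea on a POSIX system and is not Jongleur's implementation; the task bodies are placeholders.

```ruby
# Plain Ruby on a POSIX system. This only illustrates the idea; it is not
# Jongleur's implementation.
started = []

# Start A and B as separate child processes so that they run in parallel.
pids = %i[A B].map do |name|
  started << name
  pid = fork do
    sleep 0.1  # stand-in for the task's real work
    exit!(0)   # leave the child immediately with a success status
  end
  [name, pid]
end.to_h

# Block until both predecessors have exited and note whether each succeeded.
succeeded = pids.map do |name, pid|
  _pid, status = Process.wait2(pid)
  [name, status.success?]
end.to_h

# C depends on A and B, so it starts only after both have finished successfully.
if succeeded.values.all?
  started << :C
  Process.wait(fork { exit!(0) })
end
```

Jongleur generalises this pattern to an arbitrary graph of tasks, so the waiting and starting does not have to be written by hand.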
## Concepts
### Directed Acyclic Graph (DAG)
A directed graph that contains no cycles: following the direction of its edges never leads back to the starting node. DAGs are very useful for representing different kinds of information models, such as task scheduling and business process workflows.

### Task Graph
To run Jongleur, you will need to define the tasks to run and their precedence. A _Task Graph_ is a
representation of the tasks to be run by Jongleur and it usually (but not exclusively) represents a DAG, as in the example below:

![DAG example](https://upload.wikimedia.org/wikipedia/commons/6/61/Polytree.svg)
A _Task Graph_ is defined as a Hash in the following format:
`{task-name => list[names-of-dependent-tasks]}`
So the graph above can be defined as:
```
my_graph = {
  A: [:C, :D],
  B: [:D, :E],
  D: [:F, :G],
  E: [],
  C: [],
  G: [:I],
  H: [:I],
  F: [],
  I: []
}
```
where the Hash key is the class name of a Task and the Hash value is an Array of other Tasks that can be
run only after this Task is finished. So in the above example:

* Tasks C and D can only start after task A has finished.
* Task I can only start after G and H have finished.
* Tasks C, E, F and I have no dependants. No other tasks need to wait for them.

__N.B:__ Since the _Task Graph_ is a Hash, any duplicate key entries will be overridden. For instance, if
```
my_task_graph = { A: [:B, :C], B: [:D] }
```
is instead written as
```
my_task_graph = { A: [:B], A: [:C], B: [:D] }
```
the 2nd assignment of `A` will override the first one, so your graph will be `{:A=>[:C], :B=>[:D]}`.
Always assign all dependent tasks together in a single list.
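
Since a Task Graph is an ordinary Hash, it can be inspected with plain Ruby before it is handed to Jongleur. The sketch below inverts the graph to list each task's predecessors and uses `TSort` from the Ruby standard library to confirm that the graph has no cycles. The variable names are illustrative, and none of this is part of Jongleur's API.

```ruby
require 'tsort'

my_graph = {
  A: [:C, :D], B: [:D, :E], D: [:F, :G], E: [], C: [],
  G: [:I], H: [:I], F: [], I: []
}

# Invert the graph: for each task, which tasks must finish before it can start?
predecessors = Hash.new { |hash, key| hash[key] = [] }
my_graph.each do |task, dependants|
  dependants.each { |dependant| predecessors[dependant] << task }
end
predecessors[:D] # => [:A, :B]

# Tasks that never appear as a dependant have no predecessors.
roots = my_graph.keys - predecessors.keys # => [:A, :B, :H]

# TSort.tsort raises TSort::Cyclic if the graph contains a cycle. It lists
# children before parents, so reversing it gives a valid execution order.
each_node  = lambda { |&block| my_graph.each_key(&block) }
each_child = lambda { |task, &block| my_graph.fetch(task, []).each(&block) }
execution_order = TSort.tsort(each_node, each_child).reverse
```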
### Task Matrix
It's a tabular real-time representation of the state of task execution. It can be invoked at any time with
```
Jongleur::API.task_matrix
```

After defining your Task Graph and before running Jongleur, your _Task Matrix_ should look like this:
```
#,
#,
#,
#,
#
```
After Jongleur finishes, your _Task Matrix_ will look something like this:
```
#
#
#
#
#
```

The `Jongleur::Task` attribute values are as follows:
* name : the Task name
* pid : the Task process id (`nil` if the task hasn't yet run)
* running : `true` if task is currently running
* exit_status : usually 0 if process finished without errors, non-zero or `nil` otherwise
* finish_time : the Task's completion timestamp as a floating-point number of seconds since the Epoch
* success_status : `true` if process finished successfully, `false` if it didn't or `nil` if process didn't exit at all

### WorkerTask
This is the implementation template for a Task. For each Task in your Task Graph you must provide a class that derives from `WorkerTask` and implements the `execute` method. This method is what will be called by Jongleur when the Task is ready to run.
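
The template idea can be sketched without the gem installed. In the example below, `SketchWorkerTask` is a hypothetical stand-in for `Jongleur::WorkerTask`, and looking the class up by name and running it in a forked child is only one plausible way a runner might invoke `execute`; it is not taken from Jongleur's source.

```ruby
# SketchWorkerTask is a hypothetical stand-in for Jongleur::WorkerTask so that
# this example runs without the gem. It is not Jongleur's real base class.
class SketchWorkerTask
  def execute
    raise NotImplementedError, "#{self.class} must implement #execute"
  end
end

# A concrete task: the class name matches the task name used in a Task Graph.
class Extract < SketchWorkerTask
  def execute
    'extracting...'
  end
end

# Resolve the task name to its class and run it in a child process.
task_class = Object.const_get(:Extract)
pid = fork do
  task_class.new.execute
  exit!(0) # success status for the parent to collect
end
_pid, status = Process.wait2(pid)
```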
## Usage
Using Jongleur is easy:
1. (Optional) Add `include Jongleur` so that you won't have to namespace every API call.
2. Define your Task Graph

        test_graph = {
          A: [:B, :C],
          B: [:D],
          C: [:D],
          D: [:E],
          E: []
        }

    Each Task corresponds to a Ruby class with an `execute` method.
3. Add your Task Graph to Jongleur

        API.add_task_graph test_graph
        => [#,
        #,
        #,
        #,
        #]

    Jongleur will show you the Task Matrix for your Task Graph with all attributes set to their initial values, since the Tasks haven't run yet.
4. (Optional) You may want to see a graphical representation of your Task Graph

        API.print_graph('/tmp')
        => "/tmp/jongleur_graph_08252018_194828.pdf"

    Opening the PDF file will display the graph.
5. Implement your tasks. To do that you have to

    * create a new class, based on `WorkerTask` and named as in your Task Graph
    * define an `#execute` method in your class. This is the method that Jongleur will call to run the Task.

    For instance, task A from your Task Graph may look something like this:

        class A < Jongleur::WorkerTask
          @desc = 'this is task A'
          def execute
            sleep 1 # do something
            'A is running... '
          end
        end

    You'll have to do the same for Tasks B, C, D and E, as these are the tasks declared in the Task Graph.
6. Run the tasks and implement the `completed` callback. This will be called asynchronously when Jongleur has finished running all the tasks.

        API.run do |on|
          on.completed { |task_matrix| puts "Done!" }
        end
        => Starting workflow...
        => starting task A
        => finished task: A, process: 2501, exit_status: 0, success: true
        => starting task B
        => starting task C
        => finished task: C, process: 2503, exit_status: 0, success: true
        => finished task: B, process: 2502, exit_status: 0, success: true
        => starting task D
        => finished task: D, process: 2505, exit_status: 0, success: true
        => starting task E
        => finished task: E, process: 2506, exit_status: 0, success: true
        => Workflow finished
        => Done!

Examples of running Jongleur can be found in the `examples` directory.
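
To make the order of events in the log above easier to follow, here is a toy dependency-aware runner in plain Ruby. It is a sketch of the scheduling idea under simple assumptions (every task body is the same placeholder lambda, POSIX only) and is not Jongleur's implementation or API.

```ruby
# Toy runner: start a task as soon as all of its predecessors have succeeded.
graph = { A: [:B, :C], B: [:D], C: [:D], D: [:E], E: [] }
work  = -> { sleep 0.05 } # hypothetical task body, the same for every task

# Invert the graph to find each task's predecessors.
predecessors = graph.keys.map { |task| [task, []] }.to_h
graph.each do |task, dependants|
  dependants.each { |dependant| predecessors[dependant] << task }
end

started  = []  # task names, in the order they were launched
running  = {}  # pid => task name
finished = {}  # task name => true/false (did the process succeed?)

loop do
  # Launch every task that has not started and whose predecessors all succeeded.
  ready = graph.keys.select do |task|
    !started.include?(task) && predecessors[task].all? { |pred| finished[pred] }
  end
  ready.each do |task|
    pid = fork do
      work.call
      exit!(0)
    end
    running[pid] = task
    started << task
  end
  break if running.empty? # nothing running and nothing ready: we are done

  # Wait for whichever child exits next and record its result.
  pid, status = Process.wait2
  finished[running.delete(pid)] = status.success?
end

not_run = graph.keys - started # tasks blocked by a failed predecessor
```

With this graph, B and C are launched together once A has exited, and D waits for both, which matches the sequence in the log.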
## Use-Cases
### Extract-Transform-Load
The ETL workflow is ideally suited to Jongleur. You can define many Extraction tasks, perhaps separate Tasks for different data sources, and have them run in parallel. At the same time, Transformation and Loading Tasks each wait for the previous task to finish before they start.

### Transactions
Transactional workflows can be greatly sped up by Jongleur by parallelising parts of the transaction that are usually performed sequentially.
## Development

After checking out the repo, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## F.A.Q
### Does Jongleur allow me to pass messages between Tasks?
No, it doesn't. Each Task runs completely independently of the other Tasks. There is no Inter-Process Communication, no common data context, no shared memory.

### How can I share data created by a predecessor Task?
This is something that I would like to build into Jongleur. For now, you can save a Task's data in a database or KV Store, using the Task's process id as part of the key. Subsequent Tasks can retrieve their predecessors' process ids with
```
API.get_predecessor_pids
```
and therefore retrieve the data created by those Tasks.
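
The workaround can be sketched in plain Ruby, with a temporary directory of JSON files standing in for the database or KV Store. The `store_path` helper and the sample payload are hypothetical, and the predecessor's pid is taken directly from `fork` here rather than from `API.get_predecessor_pids`.

```ruby
require 'json'
require 'tmpdir'

shared = nil

Dir.mktmpdir('jongleur_sketch') do |dir|
  # Hypothetical helper: derive the storage key (a file name) from a process id.
  store_path = ->(pid) { File.join(dir, "task_#{pid}.json") }

  # Predecessor task: runs in its own process and saves its data under its own pid.
  predecessor_pid = fork do
    File.write(store_path.call(Process.pid), JSON.generate('rows' => 42))
    exit!(0)
  end
  Process.wait(predecessor_pid)

  # Successor task: knowing the predecessor's pid, load what it saved.
  shared = JSON.parse(File.read(store_path.call(predecessor_pid)))
end
```

Keying the data by process id keeps the results of separate tasks from colliding, which is why the pid makes a convenient part of the key.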
### What's the difference between Jongleur::Task's _success\_status_ and _exit\_status_ attributes?
According to [the official docs](https://ruby-doc.org/core-2.4.3/Process/Status.html) `exit_status` returns the least significant eight bits of the return code of the `stat` call while `success_status` returns true if `stat` is successful.

### What happens when Jongleur finishes running?
When Jongleur finishes running all tasks in its Task Graph -and regardless of whether the Tasks themselves have failed or not- it will exit the parent process with an exit code of 0.

### What happens if a Task fails?
If a Task fails to run or to finish its run, Jongleur will simply go on running any other tasks it can. It will not run any Tasks which depend on the failed Task. The status of the failed Task will be indicated via an appropriate output message and will also be visible on the Task Matrix.

### Can I quickly analyse the Task Matrix after Jongleur has finished?
Yes. When the `completed` callback is called, Jongleur will enable the following methods:
```
API::successful_tasks
API::failed_tasks
API::not_ran_tasks
API::hung_tasks
```

### Are there any execution logs saved?
Jongleur serializes each run's Task Matrix as a time-stamped JSON file in the `/tmp` directory. You can either view this in an editor or load it and manipulate it in Ruby with
`JSON.parse( File.read('/tmp/jongleur_task_matrix_08272018_103406.json') )`
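
Once parsed, the log is ordinary Ruby data and can be filtered like any other collection. The layout of the real file is not described here, so the sketch below assumes an array of objects carrying the `Jongleur::Task` attributes listed earlier. The sample data is invented for illustration and the actual file may be structured differently.

```ruby
require 'json'

# Hypothetical sample standing in for a saved Task Matrix file. The real
# layout may differ; this assumes one object per task.
sample = <<~SAMPLE
  [
    {"name": "A", "pid": 2501, "running": false, "exit_status": 0, "success_status": true},
    {"name": "B", "pid": 2502, "running": false, "exit_status": 1, "success_status": false},
    {"name": "D", "pid": null, "running": false, "exit_status": null, "success_status": null}
  ]
SAMPLE

tasks = JSON.parse(sample)

# Group task names by outcome, using the attribute meanings described earlier.
successful = tasks.select { |task| task['success_status'] == true }.map { |task| task['name'] }
failed     = tasks.select { |task| task['success_status'] == false }.map { |task| task['name'] }
never_run  = tasks.select { |task| task['pid'].nil? }.map { |task| task['name'] }
```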
## Roadmap
These are the things I'd like Jongleur to support in future releases:
* Task storage mechanism, i.e. the ability for each Task to save data in a uniquely identifiable and safe way so that data can be shared between sequential tasks in a transparent and easy manner.
* Rails integration. Pretty self-explanatory really.

## Contributing
Any suggestions for new features or improvements are very welcome. Please raise bug reports and pull requests on [GitLab](https://gitlab.com/RedFred7/Jongleur).
## License
The gem is available as open source under the terms of the [MIT License](./License.txt)