Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/codepr/tasq
A simple task queue implementation to enqueue jobs on local or remote processes.
- Host: GitHub
- URL: https://github.com/codepr/tasq
- Owner: codepr
- License: gpl-3.0
- Created: 2018-04-25T23:23:49.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-12-09T11:12:04.000Z (about 4 years ago)
- Last Synced: 2024-08-08T23:22:56.230Z (6 months ago)
- Topics: distributed-computing, distributed-job, distributed-systems, job-scheduler, queue, redis, remote-workers, scheduled-jobs, task-queue, zmq
- Language: Python
- Homepage:
- Size: 412 KB
- Stars: 92
- Watchers: 7
- Forks: 5
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE
README
Tasq
====

![tasq](https://github.com/codepr/tasq/workflows/tasq/badge.svg)
A very simple distributed task queue that allows scheduling job functions
to be executed on local or remote workers. It can be seen as a proof of concept
leveraging ZMQ sockets and cloudpickle serialization capabilities, as well as a
very basic actor system, to handle different workloads from connecting
clients. Originally it was meant to be just a brokerless job queue; recently
I dove deeper into the topic and decided to add support for job persistence and
extensions for Redis/RabbitMQ middlewares as well.

The main advantage of using a brokerless task queue, besides lower latency, is the
lower complexity of the system. Additionally, Tasq offers the possibility of
launching fire-and-forget workers on a network and scheduling jobs to them
without the workers knowing anything about the code they will run, allowing
tasks to be defined dynamically, without stopping the workers. Obviously this
approach opens up more risk of malicious code being injected into the workers;
currently the only security measure is signing the serialized data passed to
workers, and the entire system is meant to be used in a safe environment.

**NOTE:** The project is still in a development stage and it is not advisable to
try it in production environments.

Features:
- Redis, RabbitMQ or ZMQ (brokerless) as backend
- Delayed tasks and scheduled cron tasks
- Configuration on disk
- Actor-based workers (I/O bound tasks)
- Process queue workers (CPU bound tasks)

Todo:

- Check out pynacl for securing pickled data
- Refactoring of bad parts
- More debugging (constant debugging)

## Quickstart
Starting a worker on a node using Redis as the backend:
```sh
$ tq redis-runner --log-level DEBUG
2019-04-26 23:15:28 - tasq.remote.supervisor-17903: Worker type: Actor
```

In a Python shell:
**Using a queue object**
```python
Python 3.7.3 (default, Apr 26 2019, 21:43:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
Warning: disable autoreload in ipython_config.py to improve performance.

In [1]: import tasq
In [2]: tq = tasq.queue('redis://localhost:6379')
In [3]: def fib(n):
   ...:     if n == 0:
   ...:         return 0
   ...:     a, b = 0, 1
   ...:     for _ in range(n - 1):
   ...:         a, b = b, a + b
   ...:     return b
   ...:

In [4]: # Asynchronous execution
In [5]: fut = tq.put(fib, 50, name='fib-async')

In [6]: fut
Out[6]:

In [7]: fut.unwrap()
Out[7]: 12586269025

In [8]: res = tq.put_blocking(fib, 50, name='fib-sync')
In [9]: res.unwrap()
Out[9]: 12586269025
```

Scheduling jobs after a delay:
```python
In [10]: fut = tq.put(fib, 5, name='fib-delayed', delay=5)
In [11]: fut
Out[11]:

In [12]: # wait 5 seconds
In [13]: fut.unwrap()
Out[13]: 5

In [14]: tq.results
Out[14]: {'fib-async': ,
          'fib-sync': ,
          'fib-delayed': }
```

Scheduling a task to be executed continuously at a defined interval:
```python
In [15]: tq.put(fib, 5, name='8_seconds_interval_fib', eta='8s')

In [16]: tq.put(fib, 5, name='2_hours_interval_fib', eta='2h')
```

Delayed and interval tasks are also supported with blocking scheduling.
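For instance, the same delayed execution can also be requested through the blocking API. The snippet below is a minimal sketch that assumes `put_blocking` accepts the same `delay` keyword argument shown above for `put`:

```python
In [17]: # assumption: blocks for the delay plus the job's execution time
In [18]: res = tq.put_blocking(fib, 5, name='fib-delayed-sync', delay=5)

In [19]: res.unwrap()
Out[19]: 5
```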
Tasq also supports an optional static configuration file: the
`tasq.settings.py` module defines a configuration class with some default
fields. By setting the environment variable `TASQ_CONF` it is possible to
configure the location of the JSON configuration file on the filesystem.

By setting the `-c` flag it is possible to also set the location of a
configuration file to use on the filesystem:

```sh
$ tq worker -c path/to/conf/conf.json
```

A worker can be started by specifying the type of sub-worker we want:
```sh
$ tq rabbitmq-worker --worker-type process
```
Using the `process` type sub-worker it is possible to use a distributed queue for
parallel execution, useful when the majority of the jobs are CPU bound instead
of I/O bound (actors are preferable in that case).

If jobs are scheduled for execution on a disconnected client, or remote workers
are not up at the time of scheduling, all jobs will be enqueued for later
execution. This means that there's no need to actually start workers before job
scheduling; as soon as the first worker comes up, all jobs will be sent and executed.

### Security
Currently tasq gives the option to send pickled functions with a digital signature
in order to prevent manipulation of the sent payloads. Being dependency-free, it
uses `hmac` and `hashlib` to generate digests and to verify the integrity of
payloads; the plan is to move to a better implementation, probably using `pynacl`
or something similar.
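The snippet below is an illustrative sketch of that general approach (not tasq's actual internals, and the shared key is hypothetical): a payload is pickled, an `hmac`/`hashlib` digest is prepended on the sending side, and the digest is verified before unpickling on the receiving side.

```python
import hashlib
import hmac
import pickle

SHARED_KEY = b"secret-shared-between-client-and-worker"  # hypothetical key

def sign(payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    # Prepend an HMAC-SHA256 digest to the serialized payload
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    return digest + payload

def verify(signed: bytes, key: bytes = SHARED_KEY) -> bytes:
    # Split digest and payload, recompute the digest and compare in constant time
    digest, payload = signed[:32], signed[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("payload signature mismatch")
    return payload

# Client side: serialize and sign a job before sending it over the wire
wire_data = sign(pickle.dumps(("fib", 50)))

# Worker side: verify the signature before unpickling
print(pickle.loads(verify(wire_data)))  # ('fib', 50)
```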
## Behind the scenes

Essentially it is possible to start workers across the nodes of a network
without forming a cluster, and every single node can host multiple workers by
setting different ports for the communication. Each worker, once started,
supports multiple connections from clients and is ready to accept tasks.

Once a worker receives a job from a client, it delegates its execution to a dedicated
actor or process, usually selected from a pool according to a defined routing
strategy in the case of actors (e.g. round robin, random routing, or smallest
mailbox, which gives a rough indication of the workload of each actor by
selecting the one with the fewest pending tasks to execute), or using a simple
distributed queue across a pool of processes in a producer-consumer fashion.
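As a rough illustration of the smallest-mailbox idea (a hypothetical sketch, not tasq's actual router), picking the actor with the fewest pending tasks could look like this:

```python
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Actor:
    # Hypothetical minimal actor: a name and a mailbox of pending jobs
    name: str
    mailbox: Queue = field(default_factory=Queue)

def smallest_mailbox(actors):
    # Route to the actor with the fewest pending tasks in its mailbox
    return min(actors, key=lambda actor: actor.mailbox.qsize())

pool = [Actor("actor-0"), Actor("actor-1"), Actor("actor-2")]
pool[0].mailbox.put("job-a")
pool[0].mailbox.put("job-b")
pool[1].mailbox.put("job-c")

target = smallest_mailbox(pool)
target.mailbox.put("job-d")
print(target.name)  # actor-2, the least loaded one
```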
![Tasq master-workers arch](static/worker_model_2.png)

Another (pool of) actor(s) is dedicated to answering the clients with the
result once it is ready; this way it is possible to keep the worker's listening
part non-blocking and as fast as possible.

The reception of jobs from clients is handled by a `ZMQ.PULL` socket while the
response transmission, handled by `ResponseActor`, is served by a `ZMQ.PUSH`
socket, effectively forming a dual channel of communication and separating
incoming from outgoing traffic.
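As an illustration of that dual-channel layout, the minimal `pyzmq` sketch below (with made-up port numbers and a local `fib`, not tasq's actual transport code) pulls jobs on one socket and pushes results back on another:

```python
import zmq

def fib(n):
    # Same toy workload used in the Quickstart examples
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

context = zmq.Context()

# Incoming channel: clients PUSH jobs, the worker PULLs them
pull_socket = context.socket(zmq.PULL)
pull_socket.bind("tcp://*:9000")  # hypothetical job port

# Outgoing channel: the worker PUSHes results back to the clients
push_socket = context.socket(zmq.PUSH)
push_socket.bind("tcp://*:9001")  # hypothetical result port

while True:
    job = pull_socket.recv_pyobj()  # e.g. {"name": "fib-async", "n": 50}
    result = fib(job["n"])
    push_socket.send_pyobj({"name": job["name"], "result": result})
```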
## Installation

Being a didactical project, it is not released on PyPI yet; just clone the
repository and install it locally, or play with it using `python -i` or
`ipython`.

```sh
$ git clone https://github.com/codepr/tasq.git
$ cd tasq
$ pip install .
```

or, to skip the cloning part:
```sh
$ pip install git+https://github.com/codepr/tasq.git@master#egg=tasq
```

## Changelog
See the [CHANGES](CHANGES.md) file.