{"id":13814371,"url":"https://github.com/dcramer/taskmaster","last_synced_at":"2025-04-05T03:11:06.808Z","repository":{"id":3177859,"uuid":"4209481","full_name":"dcramer/taskmaster","owner":"dcramer","description":"A simple distributed queue designed for handling one-off tasks with large sets of tasks","archived":false,"fork":false,"pushed_at":"2015-05-07T08:05:49.000Z","size":270,"stargazers_count":444,"open_issues_count":3,"forks_count":32,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-04-02T12:54:06.085Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcramer.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-05-03T01:39:42.000Z","updated_at":"2025-02-23T11:50:33.000Z","dependencies_parsed_at":"2022-08-24T09:00:18.261Z","dependency_job_id":null,"html_url":"https://github.com/dcramer/taskmaster","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcramer%2Ftaskmaster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcramer%2Ftaskmaster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcramer%2Ftaskmaster/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcramer%2Ftaskmaster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcramer","download_url":"https://codeload.github.com/dcramer/taskmaster/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.co
m","kind":"github","repositories_count":247280272,"owners_count":20912967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T04:01:55.237Z","updated_at":"2025-04-05T03:11:06.793Z","avatar_url":"https://github.com/dcramer.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"Taskmaster\n----------\n\nTaskmaster is a simple distributed queue designed for handling large numbers of one-off tasks.\n\nWe built this at DISQUS to handle frequent but uncommon tasks like \"migrate this data to a new schema\".\n\nWhy?\n----\n\nYou might ask, \"Why not use Celery?\". Well, the answer is simply that normal queueing requires you (not literally,\nbut it'd be painful otherwise) to buffer all tasks into a central location. This becomes a problem when you\nhave a large number of tasks, especially when they contain a large amount of data.\n\nImagine you have 1 billion tasks, each weighing in at 5k. That's, uncompressed, at minimum 4 terabytes of storage\nrequired just to keep that around, and it gains you very little.\n\nTaskmaster, on the other hand, is designed to take a resumable iterator and only pull in a maximum number of\njobs at a time (using standard Python Queues). This ensures a consistent memory pattern that can scale linearly.\n\nRequirements\n------------\n\nRequirements **should** be handled by setuptools, but if they are not, you will need the following Python packages:\n\n* progressbar\n* pyzmq (zeromq)\n* gevent\n* gevent_zeromq\n\n\nA note on Gevent\n----------------\n\nBecause Taskmaster uses gevent for both its iterator task (master) and its consumers, your application will need\nto correctly implement non-blocking, gevent-compatible callers. In most cases this won't be a problem, but if you're\nusing the network you'll need to look for a compatible library for your adapter. For example, there is an alternative\nversion of ``psycopg2`` designed for gevent called ``gevent-psycopg2``.\n\nUsage\n-----\n\nCreate an iterator and a callback::\n\n    # taskmaster/example.py\n    def get_jobs(last=0):\n        # last would be sent if state was resumed\n        # from a previous run\n        for i in xrange(last, 100000000):\n            # jobs yielded must be serializable with pickle\n            yield i\n\n    def handle_job(i):\n        # this **must** be idempotent, as resuming the process may execute a job\n        # that had already been run\n        print \"Got %r!\" % i\n\n\nSpawn a master::\n\n    $ tm-master taskmaster.example\n\nYou can also pass keyword arguments for the master::\n\n    $ tm-master taskmaster.example argument=value\n\nSpawn a slave::\n\n    $ tm-slave taskmaster.example\n\nOr spawn 8 slaves (each containing a threadpool)::\n\n    $ tm-spawn taskmaster.example 8\n\nDon't like the magical function discovery for master/slave? Specify your own targets::\n\n    $ tm-master taskmaster.example:get_jobs\n    $ tm-slave taskmaster.example:handle_job\n\nMaybe you simply need to run things on the same server?\n\n::\n\n    $ tm-run taskmaster/example.py 8\n\n.. note:: All arguments are optional, and the address will default to ``tcp://0.0.0.0:3050``.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcramer%2Ftaskmaster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcramer%2Ftaskmaster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcramer%2Ftaskmaster/lists"}