{"id":14974059,"url":"https://github.com/vinissimus/jobs","last_synced_at":"2025-10-27T05:31:27.963Z","repository":{"id":57452141,"uuid":"281219465","full_name":"vinissimus/jobs","owner":"vinissimus","description":"A PL/PGSQL based work queue, with an asyncio asyncpg python API","archived":false,"fork":false,"pushed_at":"2020-10-23T15:28:50.000Z","size":29,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T02:41:27.717Z","etag":null,"topics":["asyncio","asyncpg","plpgsql","postgresql","python","queue"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vinissimus.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-20T20:30:13.000Z","updated_at":"2024-04-19T23:45:51.000Z","dependencies_parsed_at":"2022-09-02T08:33:13.476Z","dependency_job_id":null,"html_url":"https://github.com/vinissimus/jobs","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinissimus%2Fjobs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinissimus%2Fjobs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinissimus%2Fjobs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vinissimus%2Fjobs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vinissimus","download_url":"https://codeload.github.com/vinissimus/jobs/tar.gz/refs/heads/master","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":238445840,"owners_count":19473821,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asyncio","asyncpg","plpgsql","postgresql","python","queue"],"created_at":"2024-09-24T13:49:53.777Z","updated_at":"2025-10-27T05:31:27.599Z","avatar_url":"https://github.com/vinissimus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Jobs\n\nA PL/PGSQL based work queue (Publisher/Consumer),\nwith a python asyncio/asyncpg api\n\n**alpha software**\n\n## Features\n\n- Implements a two layer API:\n\n    A postgresql layer: tasks can be published from PL/PGSQL functions, \n    or procedures. Also can be extended using triggers.\n\n    A python layer (or any client with a postgresql driver). The default\n    implementations is based on asyncio python, using the awesome\n    asyncpg driver.\n\n- It's compatible with postgrest. All procedures, and tables, are scoped\n  on an owned postgresql schema, and can be exposed throught it, with postgrest\n\n- Retry logic, schedule_at or timeout, are implemented on the\n  publish method. 
A task can be published with a max_retries param\n  or a specific timeout.\n\n- Internally it uses two tables: `jobs.job_queue`, where pending and\n  running tasks are scheduled, and `jobs.job`, where finished tasks\n  are moved (successes or failures).\n\n- By default, tasks are retried three times, with backoff.\n\n- Jobs that time out are expired; by default, tasks have a 60s timeout.\n\n- Tasks can be scheduled in the future: just provide a `scheduled_at` param.\n\n- There are views to monitor queue stats: `jobs.all` (all tasks),\n  `jobs.expired` and `jobs.running`.\n\n- Tasks can also be prioritized: provide a priority number; tasks with a greater\n  priority take precedence over other tasks.\n\n- consumer_topic allows consuming tasks with a wildcard pattern (*topic.element.%*)\n\n- Rudimentary benchmarks on my laptop showed that it can handle 1000 tasks/second,\n  but that depends on your postgres instance.\n\n- Instead of a worker daemon, tasks can also be consumed from a cronjob,\n  a regular python job, or a kubernetes job. 
(It could be used to parallelize k8s jobs.)\n\n## Tradeoffs\n\n- All jobs have to be acknowledged, positively or negatively (ack/nack)\n\n## Use from postgresql\n\n```sql\nSELECT job_id FROM\n    jobs.publish(\n        i_task, -- method or function to be executed\n        i_body::jsonb = null, -- arguments passed to it (on python {args:[], kwargs:{}})\n        i_scheduled_at: timestamp = null, -- when the task should run\n        i_timeout: numeric(7,2), -- timeout in seconds for the job\n        i_priority: integer = null -- greater number, higher priority\n    )\n```\n\nOn the worker side:\n\n```sql\nSELECT * from jobs.consume(\n    num: integer -- number of desired jobs\n);\n```\n\nThis returns a list of jobs to be processed.\n\nOr selectively consume a topic:\n\n```sql\nSELECT * from jobs.consume_topic('topic.xxx.%', 10)\n```\n\nJobs are marked as processing\nand should be acknowledged with:\n\n```sql\nSELECT FROM jobs.ack(job_id);\n```\n\nor, to return a failed job:\n\n```sql\nSELECT FROM jobs.nack(job_id, traceback, i_schedule_at)\n```\n\nYou can also batch enqueue multiple jobs in a single request, using\n\n```sql\nSELECT * FROM jobs.publish_bulk(jobs.bulk_job[]);\n```\n\nwhere jobs.bulk_job is\n\n```sql\ncreate type jobs.bulk_job as (\n    task varchar,\n    body jsonb,\n    scheduled_at timestamp,\n    timeout integer,\n    priority integer,\n    max_retries integer\n);\n```\n\n## Use from python\n\nOn this side, implementing a worker should look something like\n\n```python\n    db = await asyncpg.connect(dsn)\n    while True:\n        # named `pending` so the `jobs` module is not shadowed\n        pending = await jobs.consume(db, 1)\n        for job in pending:\n            try:\n                await jobs.run(db, job[\"job_id\"])\n                await jobs.ack(job[\"job_id\"])\n            except Exception as e:\n                await jobs.nack(job[\"job_id\"], str(e))\n        await asyncio.sleep(1)\n```\n\nOn the publisher side, jobs can be enqueued from within a\npostgresql transaction:\n\n```python\ndb = await 
asyncpg.connect(dsn)\nasync with db.transaction():\n    # do whatever is needed,\n    # queue a task\n    await jobs.publish(\"package.file.sum\", args=[1,2])\n```\n\n## Installing the package\n\n```bash\npip install pgjobs\njobs-migrate postgresql://user:password@localhost:5432/db\n```\n\nThis creates the queue's tables and functions in the `jobs` postgresql schema.\n\nTo run the worker:\n\n```\njobs-worker postgresql://dsn\n```\n\nAt the moment there is not much implemented there,\njust a single-threaded worker that needs a bit more love :)\nIf your application resides in a python package,\ntasks like `yourpackage.file.method` will be runnable as is.\n\n## Observability and monitoring\n\nQuery the views with psql, or expose them through postgresql_exporter.\n\n## TODO\n\n- [ ] connect notifications, using pg_notify, when tasks are queued,\n      picked, or completed. With this in place, it's easy\n      enough to write a WS to send notifications to connected customers.\n\n- [ ] improve the worker to run every job on an asyncio task\n\n- [ ] handle exceptions better on the python side\n\n- [x] fix requirements file\n\n- [ ] add github actions to run CI\n\n- [ ] write better docs and some examples\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinissimus%2Fjobs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvinissimus%2Fjobs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvinissimus%2Fjobs/lists"}