{"id":29094237,"url":"https://github.com/wetransfer/sqewer","last_synced_at":"2025-06-28T09:10:52.277Z","repository":{"id":3545763,"uuid":"49872110","full_name":"WeTransfer/sqewer","owner":"WeTransfer","description":"SQS queue processor engine","archived":false,"fork":false,"pushed_at":"2024-09-10T17:55:45.000Z","size":412,"stargazers_count":30,"open_issues_count":8,"forks_count":8,"subscribers_count":18,"default_branch":"master","last_synced_at":"2024-09-10T20:05:17.174Z","etag":null,"topics":["wt-branch-protection-default"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WeTransfer.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2016-01-18T11:21:10.000Z","updated_at":"2024-09-10T17:55:50.000Z","dependencies_parsed_at":"2024-02-08T21:31:09.107Z","dependency_job_id":null,"html_url":"https://github.com/WeTransfer/sqewer","commit_stats":{"total_commits":267,"total_committers":12,"mean_commits":22.25,"dds":"0.25093632958801493","last_synced_commit":"a9841cb5ef2d48de38d6eb55016e295437550346"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/WeTransfer/sqewer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeTransfer%2Fsqewer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeTransfer%2Fsqewer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeTransfer%2Fsqewer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeTransfer%2Fsqewer/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WeTransfer","download_url":"https://codeload.github.com/WeTransfer/sqewer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WeTransfer%2Fsqewer/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262403828,"owners_count":23305692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["wt-branch-protection-default"],"created_at":"2025-06-28T09:10:51.540Z","updated_at":"2025-06-28T09:10:52.272Z","avatar_url":"https://github.com/WeTransfer.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"An AWS SQS-based queue processor, for highly distributed job engines.\n\n[![Build Status](https://travis-ci.org/WeTransfer/sqewer.svg?branch=master)](https://travis-ci.org/WeTransfer/sqewer)\n\n## The shortest introduction possible\n\nIn your environment, set `SQS_QUEUE_URL`. Then, define a job class:\n\n    class MyJob\n      def run\n       File.open('output', 'a') { ... 
}\n      end\n    end\n\nThen submit the job:\n\n    Sqewer.submit!(MyJob.new)\n\nand to start processing, in your commandline handler:\n\n    #!/usr/bin/env ruby\n    require 'my_application'\n    Sqewer::CLI.start\n\nTo add arguments to the job:\n\n    class JobWithArgs\n      include Sqewer::SimpleJob\n      attr_accessor :times\n\n      def run\n        ...\n      end\n    end\n    ...\n    Sqewer.submit!(JobWithArgs.new(times: 20))\n\nSubmitting jobs from other jobs (the job will go to the same queue the parent job came from):\n\n    class MyJob\n      def run(worker_context)\n        ...\n        worker_context.submit!(CleanupJob.new)\n      end\n    end\n\nThe messages will only be deleted from SQS once the job execution completes without raising an exception.\n\n## Requirements\n\nRuby 3+, version 2 of the AWS SDK. You can also run Sqewer backed by a SQLite database file, which can be handy for development situations.\n\n## Job storage\n\nJobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:\n\n    {\"_job_class\": \"MyJob\", \"_job_params\": null}\n\nWhen this ticket is picked up by the worker, the worker will do the following:\n\n    job = MyJob.new\n    job.run\n\nSo the smallest job class has to be instantiable, and has to respond to the `run` message.\n\n## Jobs with arguments and parameters\n\nJob parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are\ndirectly translated to keyword arguments of the job constructor. 
With a job ticket like this:\n\n    {\n      \"_job_class\": \"MyJob\",\n      \"_job_params\": {\"ids\": [1,2,3]}\n    }\n\nthe worker will instantiate your `MyJob` class with the `ids:` keyword argument:\n\n    job = MyJob.new(ids: [1,2,3])\n    job.run\n\nNote that at this point only arguments that are raw JSON types are supported:\n\n* Hash\n* Array\n* Numeric\n* String\n* nil/false/true\n\nIf you need marshalable Ruby types there instead, you might need to implement a custom `Serializer`.\n\n### Sqewer::SimpleJob\n\nThe module `Sqewer::SimpleJob` can be included in a job class to add some features, especially for dealing with attributes; see more details [here](https://github.com/WeTransfer/sqewer/blob/master/lib/sqewer/simple_job.rb).\n\n## Jobs spawning dependent jobs\n\nIf your `run` method on the job object accepts arguments (has non-zero `arity`) the `ExecutionContext` will\nbe passed to the `run` method.\n\n    job = MyJob.new(ids: [1,2,3])\n    job.run(execution_context)\n\nThe execution context has some useful methods:\n\n * `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.\n * `submit!` for submitting more jobs to the same queue\n\nA job submitting a subsequent job could look like this:\n\n    class MyJob\n      def run(ctx)\n        ...\n        ctx.submit!(DeferredCleanupJob.new)\n      end\n    end\n\n## Job submission\n\nIn general, a job object that needs some arguments for instantiation must return a Hash from its `to_h` method. The hash must\ninclude all the keyword arguments needed to instantiate the job when executing. 
For example:\n\n    class SendMail\n      def initialize(to:, body:)\n        ...\n      end\n\n      def run\n        ...\n      end\n\n      def to_h\n        {to: @to, body: @body}\n      end\n    end\n\nOr, if a simple Struct is enough, your job can inherit from it:\n\n    class SendMail \u003c Struct.new(:to, :body, keyword_init: true)\n      def run\n        ...\n      end\n    end\n\n## Job marshaling\n\nBy default, the jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can\noverride that object if you need to handle job tickets that come from external sources and do not necessarily\nconform to the job serialization format used internally. For example, you can handle S3 bucket notifications:\n\n    class CustomSerializer \u003c Sqewer::Serializer\n      # Overridden so that we can instantiate a custom job\n      # from the AWS notification payload.\n      # Return \"nil\" and the job will be simply deleted from the queue\n      def unserialize(message_blob)\n        message = JSON.load(message_blob)\n        return if message['Service'] # AWS test\n        return HandleS3Notification.new(message) if message['Records']\n\n        super # as default\n      end\n    end\n\nOr you can override the serialization method to add some metadata to the job ticket on job submission:\n\n    class CustomSerializer \u003c Sqewer::Serializer\n      def serialize(job_object)\n        json_blob = super\n        parsed = JSON.load(json_blob)\n        parsed['_submitter_host'] = Socket.gethostname\n        JSON.dump(parsed)\n      end\n    end\n\nIf you return `nil` from your `unserialize` method the job will not be executed,\nbut will just be deleted from the SQS queue.\n\n## Starting and running the worker\n\nThe very minimal executable for running jobs would be this:\n\n    #!/usr/bin/env ruby\n    require 'my_application'\n    Sqewer::CLI.start\n\nThis will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment 
variable, and\nuse all the default parameters. The `CLI` module will also set up a signal handler to terminate\nthe current jobs cleanly if the commandline app receives a USR1 or TERM signal.\n\nYou can also run a worker without signal handling, for example in test\nenvironments. Note that the worker is asynchronous: it has worker threads\nwhich do all the operations by themselves.\n\n    worker = Sqewer::Worker.new\n    worker.start\n    # ...and once you are done testing\n    worker.stop\n\n## Configuring the worker\n\nOne of the reasons this library exists is that sometimes you need to customize more\nthan is usually assumed to be possible. For example, you might want to have a special\nlogging library:\n\n    worker = Sqewer::Worker.new(logger: MyCustomLogger.new)\n\nOr you might want a different job serializer/deserializer (for instance, if you want to handle\nS3 bucket notifications coming into the same queue):\n\n    worker = Sqewer::Worker.new(serializer: CustomSerializer.new)\n\nYou can also elect to inherit from the `Worker` class and override some default constructor\narguments:\n\n    class CustomWorker \u003c Sqewer::Worker\n      def initialize(**kwargs)\n        super(serializer: CustomSerializer.new, ..., **kwargs)\n      end\n    end\n\nThe `Sqewer::CLI` module that you run from the commandline handler application can be\nstarted with your custom Worker of choice:\n\n    custom_worker = Sqewer::Worker.new(logger: special_logger)\n    Sqewer::CLI.start(custom_worker)\n\n## Threads versus processes\n\nsqewer uses threads. If you need to run your job from a forked subprocess (primarily for memory\nmanagement reasons) you can do so from the `run` method. Note that you might need to apply extra gymnastics\nto submit extra jobs in this case, as it is the job of the controlling worker thread to submit the messages\nyou generate. For example, you could use a pipe. 
But in a more general case something like this can be used:\n\n    class MyJob\n      def run\n        pid = fork do\n          SomeRemoteService.reconnect # you are in the child process now\n          ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|\n          end\n        end\n\n        _, status = Process.wait2(pid)\n\n        # Raise an error in the parent process to signal Sqewer that the job failed\n        # if the child exited with a non-0 status\n        raise \"Child process crashed\" unless status.exitstatus \u0026\u0026 status.exitstatus.zero?\n      end\n    end\n\n## Execution and serialization wrappers (middleware)\n\nYou can wrap job processing in middleware. A full-featured middleware class looks like this:\n\n    class MyWrapper\n      # Surrounds the job instantiation from the string coming from SQS.\n      def around_deserialization(serializer, msg_id, msg_payload, msg_attributes)\n        # msg_id is the receipt handle, msg_payload is the message body string, msg_attributes are the message's attributes\n        yield\n      end\n\n      # Surrounds the actual job execution\n      def around_execution(job, context)\n        # job is the actual job you will be running, context is the ExecutionContext.\n        yield\n      end\n    end\n\nYou need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:\n\n    stack = Sqewer::MiddlewareStack.new\n    stack \u003c\u003c MyWrapper.new\n    w = Sqewer::Worker.new(middleware_stack: stack)\n\n# Execution guarantees\n\nAs a queue worker system, Sqewer makes a number of guarantees, which are as solid as Ruby's\n`ensure` clause.\n\n  * When a job succeeds (raises no exceptions), it will be deleted from the queue\n  * When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue\n  * When a job, or any wrapper routine of the job execution,\n    raises any exception, the job will not be deleted\n  * When a submit spun off from the job, or 
the deletion of the job itself,\n    causes an exception, the job will not be deleted\n\nUse those guarantees to your advantage. Always make your jobs horizontally repeatable (if two hosts\nstart on the same job at the same time), idempotent (a job should be able to run twice without errors),\nand traceable (make good use of logging).\n\n# Usage with Rails via ActiveJob\n\nThis gem includes a queue adapter for usage with ActiveJob in Rails 5+. The functionality\nis well-tested and should work for any well-conforming ActiveJob subclass.\n\nTo run the default `sqewer` worker setup against your Rails application, set your `SQS_QUEUE_URL`\nin the environment variables and make sure you can access the queue using your default (envvar-based\nor machine-role-based) AWS credentials. Then set sqewer as the adapter for ActiveJob in your Rails app configuration:\n\n    class Application \u003c Rails::Application\n      ...\n      config.active_job.queue_adapter = :sqewer\n    end\n\nand then run\n\n    $ bundle exec sqewer_rails\n\nin your Rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary\nfor executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly\nbefore the Sqewer worker is started. The worker is threaded, and autoloading generally does not\nplay well with threading. 
So do not forget to add this in your worker code:\n\n    Rails.application.eager_load!\n\nFor handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.\n\n## ActiveJob feature support matrix\n\nCompared to the matrix of features in the\n[official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html),\n`sqewer` supports the following ActiveJob options, shown here alongside the builtin\nActiveJob adapters:\n\n    |                   | Async | Queues | Delayed    | Priorities | Timeout | Retries |\n    |-------------------|-------|--------|------------|------------|---------|---------|\n    | sqewer            | Yes   | No     | Yes        | No         | No      | Global  |\n    |       //          |  //   |  //    |  //        | //         |  //     | //      |\n    | Active Job Async  | Yes   | Yes    | Yes        | No         | No      | No      |\n    | Active Job Inline | No    | Yes    | N/A        | N/A        | N/A     | N/A     |\n\nRetries are set up globally for the entire SQS queue. There is no specific queue setting per job,\nsince all the messages go to the queue available to `Sqewer.submit!`.\n\nThere is no timeout handling; if you need it, you may want to implement it within your jobs proper.\nRetries are handled on Sqewer level for as many deliveries as your SQS settings permit.\n\n## Delay handling\n\nDelayed execution is handled via a combination\nof the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation\nin Sqewer for more). 
In a nutshell - if you postpone a job by less than 900 seconds, the standard delivery\ndelay option will be used - and the job will become visible for workers on the SQS queue only after this period.\n\nIf a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain\na UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS\ndelivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the\nqueue with the same maximum delay, and will continue doing so for as long as necessary.\n\nNote that this will incur extra receives and sends on the queue, and even though it is not substantial,\nit will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,\nyou may be better off using a Rails scheduling system and use a cron job or similar to spin up your enqueue\nfor the actual, executable background task.\n\n# Frequently asked questions (A.K.A. _why is it done this way_)\n\nThis document tries to answer some questions that may arise when reading or using the library. Hopefully\nthis can provide some answers with regards to how things are put together.\n\n## Why separate `new` and `run` methods instead of just `perform`?\n\nBecause the job needs access to the execution context of the worker. It turned out that keeping the context\nin global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context\nto enqueue the subsequent jobs, and to get access to loggers (and other context-sensitive objects). Therefore\nit makes more sense to offer Jobs access to the execution context, and to make a Job a command object.\n\nAlso, Jobs usually use their parameters in multiple smaller methods down the line. 
It therefore makes sense\nto save those parameters in instance variables or in struct members.\n\n## Why keyword constructors for jobs?\n\nBecause keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety,\nby checking for missing keywords and by allowing default keyword argument values. Also, we already have some\nproducts that use those job formats. Some have dozens of classes of jobs, all with those signatures and tests.\n\n## Why no weighted queues?\n\nBecause very often when you want to split queues servicing one application it means that you do not have enough\ncapacity to serve all of the job _types_ in a timely manner. Then you try to assign priority to separate jobs,\nwhereas in fact what you need are jobs that execute _roughly_ at the same speed - so that your workers do not\nstall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most\nproducts using this library, was a very bad idea (more workload for deployment).\n\n## Why so many configurable components?\n\nBecause sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap your\nimplementation in instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product\nneeds a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.\nYet another product needs to manage database connections when running the jobs. Have 3-4 of those, and a\npretty substantial union of required features will start to emerge. Do not fear - most classes of the library\nhave a magic `.default` method which will liberate you from most complexities.\n\n## Why multithreading for workers?\n\nBecause it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even\nnetwork-IO bound. 
In that situation it makes more sense to use threads that switch quickly, instead of burdening\nthe operating system with too many processes. An optional feature for one-process-per-job is going to be added\nsoon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.\n\n## Why no Celluloid?\n\nBecause I found that a producer-consumer model with a thread pool works quite well, and can be created based on\nthe Ruby standard library alone.\n\n## Contributing to the library\n\n* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.\n* Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.\n* Fork the project.\n* Start a feature/bugfix branch.\n* Commit and push until you are happy with your contribution.\n* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.\n* Run your tests against a _real_ SQS queue. You will need your tests to have permissions to create and delete SQS queues.\n* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or a change there is otherwise necessary, that is fine, but please isolate it in its own commit so I can cherry-pick around it.\n\n## Copyright\n\nCopyright (c) 2016 WeTransfer. See LICENSE.txt for further details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwetransfer%2Fsqewer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwetransfer%2Fsqewer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwetransfer%2Fsqewer/lists"}