{"id":13878356,"url":"https://github.com/fatkodima/sidekiq-iteration","last_synced_at":"2025-05-16T18:08:06.561Z","repository":{"id":62597694,"uuid":"560618932","full_name":"fatkodima/sidekiq-iteration","owner":"fatkodima","description":"Make your long-running sidekiq jobs interruptible and resumable.","archived":false,"fork":false,"pushed_at":"2024-07-29T10:13:17.000Z","size":93,"stargazers_count":276,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-03T17:13:57.912Z","etag":null,"topics":["activerecord","gem","rails","ruby","sidekiq"],"latest_commit_sha":null,"homepage":"https://rubydoc.info/gems/sidekiq-iteration","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fatkodima.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-01T22:22:23.000Z","updated_at":"2025-04-01T08:58:26.000Z","dependencies_parsed_at":"2024-05-01T13:19:54.198Z","dependency_job_id":"b68e21b8-1978-4b92-8d8a-2ea54ac997f5","html_url":"https://github.com/fatkodima/sidekiq-iteration","commit_stats":{"total_commits":57,"total_committers":1,"mean_commits":57.0,"dds":0.0,"last_synced_commit":"36d4317c396d96cc5244f023a80602d25cb6456f"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fsidekiq-iteration","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fsidekiq-iteration/tags","releases_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/fatkodima%2Fsidekiq-iteration/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fatkodima%2Fsidekiq-iteration/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fatkodima","download_url":"https://codeload.github.com/fatkodima/sidekiq-iteration/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248600019,"owners_count":21131411,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["activerecord","gem","rails","ruby","sidekiq"],"created_at":"2024-08-06T08:01:47.186Z","updated_at":"2025-04-12T16:44:13.837Z","avatar_url":"https://github.com/fatkodima.png","language":"Ruby","funding_links":[],"categories":["Ruby","Queues and Messaging"],"sub_categories":[],"readme":"# Sidekiq Iteration\n\n[![Build Status](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/fatkodima/sidekiq-iteration/actions/workflows/ci.yml)\n\nMeet Iteration, an extension for [Sidekiq](https://github.com/mperham/sidekiq) that makes your long-running jobs interruptible and resumable, saving all progress that the job has made (aka checkpoint for jobs).\n\nYou may consider [`pluck_in_batches`](https://github.com/fatkodima/pluck_in_batches) gem to speedup iterating over large database tables.\n\n## Background\n\nImagine the following job:\n\n```ruby\nclass SimpleJob\n  include Sidekiq::Job\n\n  def perform\n    User.find_each do |user|\n      user.notify_about_something\n    end\n  end\nend\n```\n\nThe job 
would run fairly quickly when you only have a hundred `User` records. But as the number of records grows, it will take longer for a job to iterate over all Users. Eventually, there will be millions of records to iterate over and the job will end up taking hours or even days.\n\nWith frequent deploys and worker restarts, a job will be either lost or restarted from the beginning. Some records (especially those at the beginning of the relation) will be processed more than once.\n\nCloud environments are also unpredictable, and there's no way to guarantee that a single job will have reserved hardware to run for hours or days. What if AWS diagnoses the instance as unhealthy and restarts it in 5 minutes? All job progress will be lost.\n\nSoftware that is designed for high availability [must be resilient](https://12factor.net/disposability) to interruptions that come from the infrastructure. That's exactly what Iteration brings to Sidekiq.\n\n## Requirements\n\n- Ruby 2.7+\n- Sidekiq 6+\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'sidekiq-iteration'\n```\n\nAnd then execute:\n\n    $ bundle\n\n## Getting started\n\nIn the job, include the `SidekiqIteration::Iteration` module and describe the job with two methods (`build_enumerator` and `each_iteration`) instead of `perform`:\n\n```ruby\nclass NotifyUsersJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(cursor:)\n    active_record_records_enumerator(User.all, cursor: cursor)\n  end\n\n  def each_iteration(user)\n    user.notify_about_something\n  end\nend\n```\n\n`each_iteration` will be called for each `User` record in the `User.all` relation. The relation will be ordered by primary key, exactly as `find_each` does.\nIteration hooks into Sidekiq out of the box to support graceful interruption. 
No extra configuration is required.\n\n## Examples\n\n### Job with custom arguments\n\n```ruby\nclass ArgumentsJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(arg1, arg2, cursor:)\n    active_record_records_enumerator(User.all, cursor: cursor)\n  end\n\n  def each_iteration(user, arg1, arg2)\n    user.notify_about_something\n  end\nend\n\nArgumentsJob.perform_async(arg1, arg2)\n```\n\n### Job with custom lifecycle callbacks\n\n```ruby\nclass NotifyUsersJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def on_start\n    # Will be called when the job starts iterating. Called only once, for the first time.\n  end\n\n  def around_iteration\n    # Will be called around each iteration.\n    # Can be useful for some metrics collection, performance tracking etc.\n    yield\n  end\n\n  def on_resume\n    # Called when the job resumes iterating.\n  end\n\n  def on_shutdown\n    # Called each time the job is interrupted.\n    # This can be due to throttling, `max_job_runtime` configuration, or sidekiq restarting.\n  end\n\n  def on_complete\n    # Called when the job finished iterating.\n  end\n\n  # ...\nend\n```\n\n### Iterating over batches of Active Record objects\n\n```ruby\nclass BatchesJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(product_id, cursor:)\n    active_record_batches_enumerator(\n      Comment.where(product_id: product_id).select(:id),\n      cursor: cursor,\n      batch_size: 100,\n    )\n  end\n\n  def each_iteration(batch_of_comments, product_id)\n    comment_ids = batch_of_comments.map(\u0026:id)\n    CommentService.call(comment_ids: comment_ids)\n  end\nend\n```\n\n### Iterating over Active Record Relations\n\n```ruby\nclass RelationsJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(product_id, cursor:)\n    active_record_relations_enumerator(\n      Product.find(product_id).comments,\n      cursor: 
cursor,\n      batch_size: 100,\n    )\n  end\n\n  def each_iteration(comments_relation, product_id)\n    # comments_relation will be a Comment::ActiveRecord_Relation\n    comments_relation.update_all(deleted: true)\n  end\nend\n```\n\n### Iterating over arbitrary arrays\n\n```ruby\nclass ArrayJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(cursor:)\n    array_enumerator(['build', 'enumerator', 'from', 'any', 'array'], cursor: cursor)\n  end\n\n  def each_iteration(array_element)\n    # use array_element\n  end\nend\n```\n\n### Iterating over CSV\n\n```ruby\nclass CsvJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(import_id, cursor:)\n    import = Import.find(import_id)\n    csv_enumerator(import.csv, cursor: cursor)\n  end\n\n  def each_iteration(csv_row, import_id)\n    # insert csv_row to database\n  end\nend\n```\n\n### Nested iteration\n\n```ruby\nclass NestedIterationJob\n  include Sidekiq::Job\n  include SidekiqIteration::Iteration\n\n  def build_enumerator(cursor:)\n    nested_enumerator(\n      [\n        -\u003e(cursor) { active_record_records_enumerator(Shop.all, cursor: cursor) },\n        -\u003e(shop, cursor) { active_record_records_enumerator(shop.products, cursor: cursor) },\n        -\u003e(_shop, product, cursor) { active_record_relations_enumerator(product.product_variants, cursor: cursor) }\n      ],\n      cursor: cursor\n    )\n  end\n\n  def each_iteration(product_variants_relation)\n    # do something\n  end\nend\n```\n\n## Guides\n\n* [Iteration: how it works](guides/iteration-how-it-works.md)\n* [Job argument semantics](guides/argument-semantics.md)\n* [Best practices](guides/best-practices.md)\n* [Writing custom enumerator](guides/custom-enumerator.md)\n* [Throttling](guides/throttling.md)\n\nFor more detailed documentation, see [rubydoc](https://rubydoc.info/gems/sidekiq-iteration).\n\n## API\n\nIteration job must respond to `build_enumerator` and 
`each_iteration` methods. `build_enumerator` must return an [`Enumerator`](https://ruby-doc.org/core-3.1.2/Enumerator.html) object that respects the `cursor` value.\n\n## FAQ\n\n**Advantages of this pattern over splitting a large job into many small jobs?**\n* Having one job is easier on Redis in terms of memory, time, and the number of requests needed for enqueuing.\n* It simplifies Sidekiq monitoring, because you have a predictable number of jobs in the queues, instead of having thousands of them at one time and millions at another. The web UI is also easier to navigate.\n* You can stop, pause, or delete just one job if something goes wrong. With many jobs this is harder and can take a long time, which matters when the work must stop immediately.\n\n**Why can't I just iterate in the `#perform` method and do whatever I want?** You can, but then your job has to comply with a long list of requirements, such as the ones above. This creates leaky abstractions more easily, when instead we can expose a more powerful abstraction for developers without exposing the underlying infrastructure.\n\n**What happens when my job is interrupted?** A checkpoint will be persisted to Redis after the current `each_iteration`, and the job will be re-enqueued. Once it's popped off the queue, the worker will resume from the next iteration.\n\n**What happens with retries?** An interruption of a job does not count as a retry. The iteration that caused the job to fail will be retried, and progress will continue from there.\n\n**What happens if my iteration takes a long time?** We recommend that a single `each_iteration` take no longer than 30 seconds. In the future, this may raise an exception.\n\n**Why is it important that `each_iteration` takes less than 30 seconds?** When the job worker is scheduled for restart or shutdown, it gets a notice to finish the remaining unit of work. 
To guarantee that no progress is lost, we need to make sure that `each_iteration` completes within a reasonable amount of time.\n\n**What do I do if each iteration takes a long time, because it's doing nested operations?** If your `each_iteration` is complex, we recommend enqueuing another job, which will run your nested business logic. If `each_iteration` performs some other iterations, like iterating over child records, consider using [nested iterations](#nested-iteration).\n\n**My job has a complex flow. How do I write my own Enumerator?** See [the guide on Custom Enumerators](guides/custom-enumerator.md) for details.\n\n## Credits\n\nThanks to the [`job-iteration` gem](https://github.com/Shopify/job-iteration) for the original implementation and inspiration.\n\n## Development\n\nAfter checking out the repo, run `bundle install` to install dependencies, then start Redis. Run `bundle exec rake` to run the linter and tests. This project uses multiple Gemfiles to test against multiple versions of Sidekiq; you can run the tests against a specific version with `BUNDLE_GEMFILE=gemfiles/sidekiq_6.gemfile bundle exec rake test`.\n\nTo install this gem onto your local machine, run `bundle exec rake install`. 
To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/fatkodima/sidekiq-iteration.\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffatkodima%2Fsidekiq-iteration","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffatkodima%2Fsidekiq-iteration","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffatkodima%2Fsidekiq-iteration/lists"}