{"id":13879556,"url":"https://github.com/railsware/backfiller","last_synced_at":"2025-04-22T23:07:54.114Z","repository":{"id":62553934,"uuid":"105379579","full_name":"railsware/backfiller","owner":"railsware","description":"The backfill machine for database records with null columns","archived":false,"fork":false,"pushed_at":"2024-03-07T14:51:10.000Z","size":392,"stargazers_count":19,"open_issues_count":0,"forks_count":4,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-22T23:07:48.185Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/railsware.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-30T15:30:09.000Z","updated_at":"2025-04-08T07:38:00.000Z","dependencies_parsed_at":"2024-11-08T19:10:53.362Z","dependency_job_id":"3bfb93aa-d970-47f6-9819-60ccd3af3e53","html_url":"https://github.com/railsware/backfiller","commit_stats":{"total_commits":25,"total_committers":4,"mean_commits":6.25,"dds":"0.16000000000000003","last_synced_commit":"298ec76cf27e23da420c87089d8512df0ccb2441"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railsware%2Fbackfiller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railsware%2Fbackfiller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railsware%2Fbackfiller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/railsware%2Fbackfiller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/railsware","download_url":"https://codeload.github.com/railsware/backfiller/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250337947,"owners_count":21414104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-06T08:02:25.138Z","updated_at":"2025-04-22T23:07:54.098Z","avatar_url":"https://github.com/railsware.png","language":"Ruby","funding_links":[],"categories":["Ruby"],"sub_categories":[],"readme":"![Backfill machine](https://railsware.github.io/backfiller/assets/backfill_machine.jpg)\n\n# Backfiller [![Build Status](https://travis-ci.com/railsware/backfiller.svg?branch=master)](https://travis-ci.com/railsware/backfiller)\n\nThe backfill machine for null database columns.\nThis gem may be handy for `no-downtime` deployments, especially when you need to fill columns for a table with a huge number of records without locking the table.\n\n## Typical no-downtime and non-locking cycle\n\n* add migration that adds new column (null: true)\n* deploy and run migration task\n* deploy code that starts filling new column in corresponding flows\n* add backfill task\n* deploy and run backfill task\n* [optional] add migration that invokes backfill task and so keeps all environments consistent (except production environment because we already backfilled data)\n* add migration that disallows null values (null: false)\n* deploy code that starts using new column\n\n## Concept\n\nThe idea is to 
prepare all data in the selection query on the database server, fetch it using the CURSOR feature, and then build simple UPDATE queries.\nThis way we minimize database server resource usage and lock only one record at a time (atomic update).\nWe use two connections to the database:\n* master - creates a cursor in a transaction and fetches data in batches.\n* worker - executes small atomic update queries (no wrapper transaction)\n\nEven if the backfill process crashes, you can resolve the issue and run it again to process the remaining data.\n\n## Connection adapters\n\nCurrently it supports the following ActiveRecord connection adapters:\n* PostgreSQL\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'backfiller'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install backfiller\n\n## Usage\n\nAssume we want to backfill the `profiles.name` column from the `users.first_name` and `users.last_name` columns.\n\nCreate a backfill task in `db/backfill/profile_name.rb` and define the required methods:\n\n#### Single worker execution query\n\n```ruby\nclass Backfill::ProfileName\n\n  def select_sql\n    \u003c\u003c~SQL\n      SELECT\n        profiles.id AS profile_id,\n        CONCAT(users.first_name, ' ', users.last_name) AS profile_name\n      FROM profiles\n      INNER JOIN users ON\n        users.id = profiles.user_id\n      WHERE\n        profiles.name IS NULL\n    SQL\n  end\n\n  def execute_sql(connection, row)\n    \u003c\u003c~SQL\n      UPDATE profiles SET\n        name = #{connection.quote(row['profile_name'])}\n      WHERE\n        id = #{connection.quote(row['profile_id'])}\n    SQL\n  end\n\nend\n```\n\n#### Multiple worker execution queries\n\n```ruby\nclass Backfill::ProfileName\n\n  def select_sql\n    \u003c\u003c~SQL\n      SELECT\n        profiles.id AS profile_id,\n        CONCAT(users.first_name, ' ', users.last_name) AS profile_name\n      FROM profiles\n      INNER JOIN users ON\n        users.id = profiles.user_id\n      
WHERE\n        profiles.name IS NULL\n    SQL\n  end\n\n  def execute_sql(connection, row)\n    [\n      'BEGIN',\n      \u003c\u003c~SQL,\n        UPDATE profiles SET\n          name = #{connection.quote(row['profile_name'])}\n        WHERE\n          id = #{connection.quote(row['profile_id'])} AND\n          (SELECT pg_try_advisory_xact_lock(12345678)) = TRUE\n      SQL\n      'COMMIT'\n    ]\n  end\n\nend\n\n```\n\n#### Custom row processing\n\n```ruby\nclass Backfill::ProfileName\n\n  def select_sql\n    \u003c\u003c~SQL\n      SELECT\n        profiles.id AS profile_id,\n        CONCAT(users.first_name, ' ', users.last_name) AS profile_name\n      FROM profiles\n      INNER JOIN users ON\n        users.id = profiles.user_id\n      WHERE\n        profiles.name IS NULL\n    SQL\n  end\n\n  def process_row(connection, row)\n    connection.execute 'BEGIN'\n    if connection.select_value 'SELECT pg_try_advisory_xact_lock(12345678)'\n      connection.execute \u003c\u003c~SQL\n        INSERT INTO contacts(\n          full_name\n        )\n        VALUES(\n          #{connection.quote(row['profile_name'])}\n        )\n      SQL\n    end\n    connection.execute 'COMMIT'\n  end\n\nend\n\n```\nThen run the rake task:\n\n```bash\n$ rails db:backfill[profile_name]\n```\n\n## Configuration\n\nFor Rails applications, Backfiller is initialized with the following options:\n\n* task_directory: `RAILS_ROOT/db/backfill`\n* task_namespace: `Backfill`\n* batch_size: `1_000`\n* cursor_threshold: `nil`\n* connection_pool: `ApplicationRecord.connection_pool`\n* logger: `ApplicationRecord.logger`\n\nYou may change them globally via `config/initializers/backfiller.rb`:\n\n```ruby\nBackfiller.configure do |config|\n  config.foo = bar\nend\n```\n\nOr override some options in a specific backfill task:\n\n```ruby\nclass Backfill::Foo\n  def batch_size\n    100\n  end\n\n  def cursor_threshold\n    100_000\n  end\nend\n```\n\n## Authors\n\n* [Andriy 
Yanko](http://ayanko.github.io)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frailsware%2Fbackfiller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frailsware%2Fbackfiller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frailsware%2Fbackfiller/lists"}