Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/railsware/backfiller
The backfill machine for database records with null columns
https://github.com/railsware/backfiller
Last synced: about 1 month ago
JSON representation
The backfill machine for database records with null columns
- Host: GitHub
- URL: https://github.com/railsware/backfiller
- Owner: railsware
- Created: 2017-09-30T15:30:09.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-07T14:51:10.000Z (7 months ago)
- Last Synced: 2024-07-10T17:36:59.348Z (2 months ago)
- Language: Ruby
- Homepage:
- Size: 383 KB
- Stars: 18
- Watchers: 16
- Forks: 4
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: Changelog.md
Awesome Lists containing this project
README
![Backfill machine](https://railsware.github.io/backfiller/assets/backfill_machine.jpg)
# Backfiller [![Build Status](https://travis-ci.com/railsware/backfiller.svg?branch=master)](https://travis-ci.com/railsware/backfiller)
The backfill machine for null database columns.
This gem maybe handly for `no-downtime` deployment especially when you need to fill columns for table with huge amount for records without locking the table.## Typical no-downtime and non-locking cycle
* add migration that adds new column (null: true)
* deploy and run migration task
* deploy code that starts filling new column in corresponding flows
* add backfill task
* deploy and run backfill task
* [optional] add migration that invokes backfill task asn so keep all environments consistent (except production environment because we already backfilled data)
* add migration that disallow null values (null: false)
* deploy code that starts using new column## Concept
The idea is to prepare all data in selection method on database server and fetch it data using CURSOR feature and then build simple UPDATE queries.
With this way we minimize db server resources usage and we lock only one record (atomic update).
We use two connections to database:
* master - to creates cursor in transaction and fetch data in batches.
* worker - to execute small atomic update queries (no wrapper transaction)Even if backfill process crashes you may resolve issue and run it again to process remaining amount of data.
## Connection adapters
Curently it support next ActiveRecord connection adapters:
* PostgreSQL## Installation
Add this line to your application's Gemfile:
```ruby
gem 'backfiller'
```And then execute:
$ bundle
Or install it yourself as:
$ gem install backfiller
## Usage
Assume we want to backfill `profiles.name` column from `users.first_name`, `users.last_name` columns.
Create backfill task into `db/backfill/profile_name.rb` and defined required methods:
#### Single worker execution query
```ruby
class Backfill::ProfileNamedef select_sql
<<~SQL
SELECT
profile.id AS profile_id,
CONCAT(users.first_name, ' ', users.last_name) AS profile_name
FROM profiles
INNER JOIN users ON
users.id = profiles.user_id
WHERE
profiles.name IS NULL
SQL
enddef execute_sql(connection, row)
<<~SQL
UPDATE profiles SET
name = #{connection.quote(row['profile_name'])}
WHERE
id = #{connection.quote(row['profile_id'])}
SQL
endend
```#### Multiple worker execution queries
```ruby
class Backfill::ProfileNamedef select_sql
<<~SQL
SELECT
profile.id AS profile_id,
CONCAT(users.first_name, ' ', users.last_name) AS profile_name
FROM profiles
INNER JOIN users ON
users.id = profiles.user_id
WHERE
profiles.name IS NULL
SQL
enddef execute_sql(connection, row)
[
'BEGIN',
<<~SQL,
UPDATE profiles SET
name = #{connection.quote(row['profile_name'])}
WHERE
id = #{connection.quote(row['profile_id'])} AND
(SELECT pg_try_advisory_xact_lock(12345678)') = TRUE
SQL
'COMMIT'
]
endend
```
#### Custom row processing
```ruby
class Backfill::ProfileNamedef select_sql
<<~SQL
SELECT
profile.id AS profile_id,
CONCAT(users.first_name, ' ', users.last_name) AS profile_name
FROM profiles
INNER JOIN users ON
users.id = profiles.user_id
WHERE
profiles.name IS NULL
SQL
enddef process_row(connection, row)
connection.execute 'BEGIN'
if connection.select_value 'SELECT pg_try_advisory_xact_lock(12345678)'
connection.execute <<~SQL
INSERT INTO contacts(
full_name
)
VALUES(
#{connection.quote(row['profile_name'])},
)
SQL
end
connection.execute 'COMMIT'
endend
```
And then just run rake task:```bash
$ rails db:backfill[profile_name]
```## Configuration
For Rails application backfiller is initialized with next options
* task_directory: `RAILS_ROOT/db/backfill`
* task_namespace: `Backfill`
* batch_size: `1_000`
* cursor_threshold: `nil`
* connection_pool: `ApplicationRecord.connection_pool`
* logger: `ApplicationRecord.logger`You may change it globally via `config/initializers/backfiller.rb`:
```ruby
Backfiller.configure do |config|
config.foo = bar
end
```Or specify some options in certain backfill task
```ruby
class Backfill::Foo
def batch_size
100
enddef cursor_threshold
100_000
end
end
```## Authors
* [Andriy Yanko](http://ayanko.github.io)