https://github.com/grantstreetgroup/dbix-batchchunker
Run large database changes safely
- Host: GitHub
- URL: https://github.com/grantstreetgroup/dbix-batchchunker
- Owner: GrantStreetGroup
- License: other
- Created: 2018-02-06T21:55:45.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-06-17T19:39:52.000Z (7 months ago)
- Last Synced: 2024-06-21T21:03:42.607Z (7 months ago)
- Language: Perl
- Size: 191 KB
- Stars: 2
- Watchers: 17
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
README
# NAME
DBIx::BatchChunker - Run large database changes safely
# VERSION
version v1.0.1
# SYNOPSIS
    use DBIx::BatchChunker;

    my $account_rs = $schema->resultset('Account')->search({
        account_type => 'deprecated',
    });

    my %params = (
        chunk_size  => 5000,
        target_time => 5,

        rs      => $account_rs,
        id_name => 'account_id',

        coderef => sub { $_[1]->delete },

        sleep            => 1,
        verbose          => 1,

        progress_name    => 'Deleting deprecated accounts',
        process_past_max => 1,
    );

    # EITHER:
    # 1) Automatically construct and execute the changes:

    DBIx::BatchChunker->construct_and_execute(%params);

    # OR

    # 2) Manually construct and execute the changes:

    my $batch_chunker = DBIx::BatchChunker->new(%params);
    $batch_chunker->calculate_ranges;
    $batch_chunker->execute;
# DESCRIPTION

This utility class is for running a large batch of DB changes in a manner that doesn't
cause huge locks, outages, and missed transactions. It's highly flexible to allow for
many different kinds of change operations, and dynamically adjusts chunks to its
workload.

It works by splitting up DB operations into smaller chunks within a loop. These chunks
are transactionalized, either naturally as single-operation bulk work or by the loop
itself. The full range is calculated beforehand to get the right start/end points.
A [progress bar](#progress-bar-attributes) will be created to let the deployer know the
processing status.

There are two ways to use this class: call the automatic constructor and executor
(["construct\_and\_execute"](#construct_and_execute)) or manually construct the object and call its methods. See
["SYNOPSIS"](#synopsis) for examples of both.

**DISCLAIMER:** You should not rely on this class to magically fix any and all locking
problems the DB might experience just because it's being used. Thorough testing and
best practices are still required.

## Processing Modes

This class has several different modes of operation, depending on what was passed to
the constructor:

### DBIC Processing
If both ["rs"](#rs) and ["coderef"](#coderef) are passed, a chunk ResultSet is built from the base
ResultSet, to add in a `BETWEEN` clause, and the new ResultSet is passed into the
coderef. The coderef should run some sort of active ResultSet operation from there.

An ["id\_name"](#id_name) should be provided, but if it is missing it will be looked up based on
the primary key of the ResultSource.

If ["single\_rows"](#single_rows) is also enabled, then each chunk is wrapped in a transaction and the
coderef is called for each row in the chunk. In this case, the coderef is passed a
Result object instead of the chunk ResultSet.

Note that whether ["single\_rows"](#single_rows) is enabled or not, the coderef execution is encapsulated
in DBIC's retry logic, so any failures will re-connect and retry the coderef. Because of
this, any changes you make within the coderef should be idempotent, or should at least be
able to skip over any already-processed rows.
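For instance, a minimal single-row DBIC sketch (the `$schema`, resultset, and column names are assumptions carried over from the SYNOPSIS, not part of the module's API):

    # Single-row DBIC Processing sketch; adjust names for your schema
    DBIx::BatchChunker->construct_and_execute(
        rs          => $schema->resultset('Account')->search({ account_type => 'deprecated' }),
        id_name     => 'account_id',
        single_rows => 1,

        # With single_rows on, the coderef gets one Result object per call,
        # and each chunk's worth of calls is wrapped in a transaction
        coderef => sub {
            my ($bc, $result) = @_;
            # Idempotent guard: skip rows that were already converted
            return if $result->account_type eq 'legacy';
            $result->update({ account_type => 'legacy' });
        },
    );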
### Active DBI Processing

If a ["stmt"](#stmt) (DBI statement handle args) is passed without a ["coderef"](#coderef), the statement
handle is merely executed on each iteration with the start and end IDs. It is assumed
that the SQL for the statement handle contains exactly two placeholders for a `BETWEEN`
clause. For example:

    my $update_stmt = q{
        UPDATE
            accounts a
            JOIN account_updates au USING (account_id)
        SET
            a.time_stamp = au.time_stamp
        WHERE
            a.account_id BETWEEN ? AND ? AND
            a.time_stamp != au.time_stamp
    };

The `BETWEEN` clause should, of course, match the IDs being used in the loop.

The statement is run with ["dbi\_connector"](#dbi_connector) for retry protection. Therefore, the
statement should also be idempotent.
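To tie it together, here is a hedged Active DBI Processing sketch; the connection details and the min/max SQL are assumptions for the `accounts` example above, not something the module provides:

    use DBIx::BatchChunker;
    use DBIx::Connector::Retry;

    # Assumed connection details; adjust for your environment
    my $conn = DBIx::Connector::Retry->new(
        connect_info => [ $dsn, $user, $pass, { RaiseError => 1, AutoCommit => 1 } ],
    );

    DBIx::BatchChunker->construct_and_execute(
        dbi_connector => $conn,
        min_stmt      => 'SELECT MIN(account_id) FROM accounts',
        max_stmt      => 'SELECT MAX(account_id) FROM accounts',
        stmt          => $update_stmt,   # the BETWEEN ? AND ? statement above
        chunk_size    => 10_000,
    );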
### Query DBI Processing

If both a ["stmt"](#stmt) and a ["coderef"](#coderef) are passed, the statement handle is prepared and
executed. Like the ["Active DBI Processing"](#active-dbi-processing) mode, the SQL for the statement should
contain exactly two placeholders for a `BETWEEN` clause. Then the `$sth` is passed to
the coderef. It's up to the coderef to extract data from the executed statement handle,
and do something with it.

If `single_rows` is enabled, each chunk is wrapped in a transaction and the coderef is
called for each row in the chunk. In this case, the coderef is passed a hashref of the
row instead of the executed `$sth`, with lowercase alias names used as keys.

Note that in both cases, the coderef execution is encapsulated in a [DBIx::Connector::Retry](https://metacpan.org/pod/DBIx%3A%3AConnector%3A%3ARetry)
call to either `run` or `txn` (using ["dbi\_connector"](#dbi_connector)), so any failures will
re-connect and retry the coderef. Because of this, any changes you make within the
coderef should be idempotent, or should at least be able to skip over any
already-processed rows.
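As a hedged sketch (the SELECT, the table, and the `queue_notification()` helper are assumptions), a coderef that walks the executed handle might look like:

    DBIx::BatchChunker->construct_and_execute(
        dbi_connector => $conn,   # as in the previous sketch
        min_stmt      => 'SELECT MIN(account_id) FROM accounts',
        max_stmt      => 'SELECT MAX(account_id) FROM accounts',
        stmt          => 'SELECT account_id, email FROM accounts WHERE account_id BETWEEN ? AND ?',

        # Without single_rows, the coderef receives the executed $sth for the chunk
        coderef => sub {
            my ($bc, $sth) = @_;
            while (my $row = $sth->fetchrow_hashref) {
                queue_notification($row->{account_id}, $row->{email});   # hypothetical helper
            }
        },
    );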
### DIY Processing

If a ["coderef"](#coderef) is passed but neither a `stmt` nor a `rs` is passed, then the
multiplier loop does not touch the database. The coderef is merely passed the start and
end IDs for each chunk. It is expected that the coderef will run through all database
operations using those start and end points.

It's still valid to include ["min\_stmt"](#min_stmt), ["max\_stmt"](#max_stmt), and/or ["count\_stmt"](#count_stmt) in the
constructor to enable features like [max ID recalculation](#process_past_max) or
[chunk resizing](#min_chunk_percent).

If you're not going to include any min/max statements for ["calculate\_ranges"](#calculate_ranges), you will
need to set ["min\_id"](#min_id) and ["max\_id"](#max_id) yourself, either in the constructor or before the
["execute"](#execute) call. Using ["construct\_and\_execute"](#construct_and_execute) is also not an option in this case, as
this tries to call ["calculate\_ranges"](#calculate_ranges) without a way to do so.
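A minimal DIY sketch, assuming a hypothetical `purge_archive_range()` helper that runs its own SQL between the two IDs:

    my $batch_chunker = DBIx::BatchChunker->new(
        min_id     => 1,            # set manually, since there are no min/max statements
        max_id     => 5_000_000,
        chunk_size => 50_000,

        # DIY mode: only the start/end IDs are handed to the coderef
        coderef => sub {
            my ($bc, $start, $end) = @_;
            purge_archive_range($start, $end);   # hypothetical helper
        },
    );

    # No calculate_ranges call here; min_id/max_id were supplied directly
    $batch_chunker->execute;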
### TL;DR Version

    $stmt                             = Active DBI Processing
    $stmt + $coderef                  = Query DBI Processing  | $bc->$coderef($executed_sth)
    $stmt + $coderef + single_rows=>1 = Query DBI Processing  | $bc->$coderef($row_hashref)
    $rs   + $coderef                  = DBIC Processing       | $bc->$coderef($chunk_rs)
    $rs   + $coderef + single_rows=>1 = DBIC Processing       | $bc->$coderef($result)
    $coderef                          = DIY Processing        | $bc->$coderef($start, $end)
# ATTRIBUTES

See the ["METHODS"](#methods) section for more in-depth descriptions of these attributes and their
usage.

## DBIC Processing Attributes
### rs
A [DBIx::Class::ResultSet](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AResultSet). This is used by all methods as the base ResultSet onto which
the DB changes will be applied. Required for DBIC processing.

### rsc
A [DBIx::Class::ResultSetColumn](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AResultSetColumn). This is only used to override ["rs"](#rs) for min/max
calculations. Optional.

### count\_rs
A [DBIx::Class::ResultSet](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AResultSet), only used to override ["rs"](#rs) for row counting calculations.
For 99.9% of cases, you do not need to set this. Though, it could be used for the rare
case where the original ResultSet would run into indexing problems with its row counting
statement and needs something broader to compensate.

**WARNING:** Do not set this unless you know what you're doing. Having a different `COUNT`
ResultSet from the base ResultSet means that the row counts used to size up the chunk workload
will be different from the workload itself. If the row counts are too high, you may end
up with workloads that are too quick and [runtime targeting](#target_time) may compensate
with an overly large chunk size, anyway. If the row counts are too low, you risk having
an oversized chunk that gets processed and locks rows for too long, and chunk resizing may
even skip blocks that it thinks have no rows to process.

### dbic\_retry\_opts
A hashref of DBIC retry options. These options control how retry protection works within
DBIC. So far, there are two supported options:

    max_attempts  = Number of times to retry
    retry_handler = Coderef that returns true to continue to retry or false to re-throw
                    the last exception

The default is to let the DBIC storage engine handle its own protection, which will retry
once if the DB connection was disconnected. If you specify any options, even a blank
hashref, BatchChunker will fill in a default `max_attempts` of 10, and an always-true
`retry_handler`. This is similar to [DBIx::Connector::Retry](https://metacpan.org/pod/DBIx%3A%3AConnector%3A%3ARetry)'s defaults.

Under the hood, these are options that are passed to the as-yet-undocumented
[DBIx::Class::Storage::BlockRunner](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AStorage%3A%3ABlockRunner). The `retry_handler` has access to the same
BlockRunner object (passed as its only argument) and its methods/accessors, such as `storage`,
`failed_attempt_count`, and `last_exception`.
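For example, a hedged sketch of a handler that only retries on deadlocks (the error-matching regex is an assumption for a MySQL-style message):

    my $batch_chunker = DBIx::BatchChunker->new(
        # ...rs, coderef, etc...
        dbic_retry_opts => {
            max_attempts  => 5,
            # The handler receives the BlockRunner object; return true to retry
            retry_handler => sub {
                my ($runner) = @_;
                return $runner->last_exception =~ /deadlock/i ? 1 : 0;
            },
        },
    );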
## DBI Processing Attributes

### dbi\_connector
A [DBIx::Connector::Retry](https://metacpan.org/pod/DBIx%3A%3AConnector%3A%3ARetry) object. Instead of [DBI](https://metacpan.org/pod/DBI) statement handles, this is the
recommended way for BatchChunker to interface with the DBI, as it handles retries on
failures. The connection mode used is whatever default is set within the object.

Required for DBI Processing, unless ["dbic\_storage"](#dbic_storage) is specified.
### dbic\_storage
A DBIC storage object, as an alternative for ["dbi\_connector"](#dbi_connector). There may be times when
you want to run plain DBI statements, but are still using DBIC. In these cases, you
don't have to create a [DBIx::Connector::Retry](https://metacpan.org/pod/DBIx%3A%3AConnector%3A%3ARetry) object to run those statements.

This uses a BlockRunner object for retry protection, so the options in
["dbic\_retry\_opts"](#dbic_retry_opts) would apply here.

Required for DBI Processing, unless ["dbi\_connector"](#dbi_connector) is specified.
### min\_stmt
### max\_stmt
SQL statement strings or an arrayref of parameters for ["selectrow\_array" in DBI](https://metacpan.org/pod/DBI#selectrow_array).
When executed, these statements should each return a single value, either the minimum or
maximum ID that will be affected by the DB changes. These are used by
["calculate\_ranges"](#calculate_ranges). Required if using either type of DBI Processing.

### stmt
A SQL statement string or an arrayref of parameters for ["prepare" in DBI](https://metacpan.org/pod/DBI#prepare) + binds.
If using ["Active DBI Processing"](#active-dbi-processing) (no coderef), this is a [do-able](https://metacpan.org/pod/DBI#do) statement
(usually DML like `INSERT/UPDATE/DELETE`). If using ["Query DBI Processing"](#query-dbi-processing) (with
coderef), this is a passive DQL (`SELECT`) statement.

In either case, the statement should contain `BETWEEN` placeholders, which will be
executed with the start/end ID points. If there are already bind placeholders in the
arrayref, then make sure the `BETWEEN` bind points are last on the list.

Required for DBI Processing.
### count\_stmt
A `SELECT COUNT` SQL statement string or an arrayref of parameters for
["selectrow\_array" in DBI](https://metacpan.org/pod/DBI#selectrow_array).

Like ["stmt"](#stmt), it should contain `BETWEEN` placeholders. In fact, the SQL should look
exactly like the ["stmt"](#stmt) query, except with `COUNT(*)` instead of the column list.

Used only for ["Query DBI Processing"](#query-dbi-processing). Optional, but recommended for
[chunk resizing](#min_chunk_percent).

## Progress Bar Attributes
### progress\_bar
The progress bar used for all methods. This can be specified right before the method
call to override the default used for that method. Unlike most attributes, this one
is read-write, so it can be switched on-the-fly.

Don't forget to remove or switch to a different progress bar if you want to use a
different one for another method:

    $batch_chunker->progress_bar( $calc_pb );
    $batch_chunker->calculate_ranges;
    $batch_chunker->progress_bar( $loop_pb );
    $batch_chunker->execute;

All of this is optional. If the progress bar isn't specified, the method will create
a default one. If the terminal isn't interactive, the default [Term::ProgressBar](https://metacpan.org/pod/Term%3A%3AProgressBar) will
be set to `silent` to naturally skip the output.
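If you do build your own bar, a hedged sketch with [Term::ProgressBar](https://metacpan.org/pod/Term%3A%3AProgressBar) (the name and count are placeholders) might look like:

    use Term::ProgressBar;

    my $calc_pb = Term::ProgressBar->new({
        name  => 'Calculating ranges',
        count => 2,           # one tick each for the min and max lookups
        ETA   => 'linear',
    });

    $batch_chunker->progress_bar($calc_pb);
    $batch_chunker->calculate_ranges;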
### progress\_name

A string used by ["execute"](#execute) to assist in creating a progress bar. Ignored if
["progress\_bar"](#progress_bar) is already specified.

This is the preferred way of customizing the progress bar without having to create one
from scratch.

### cldr
A [CLDR::Number](https://metacpan.org/pod/CLDR%3A%3ANumber) object. English speakers that use a typical `1,234.56` format would
probably want to leave it at the default. Otherwise, you should provide your own.

### verbose
Boolean. By default, this is on, which displays timing stats on each chunk, as well as
total numbers. This is still subject to non-interactivity checks from ["progress\_bar"](#progress_bar).

(This was previously defaulted to off, and called `debug`, prior to v1.0.0.)
## Common Attributes
### id\_name
The column name used as the iterator in the processing loops. This should be a primary
key or integer-based (indexed) key, tied to the [resultset](#rs).

Optional. Used mainly in DBIC processing. If not specified, it will look up
the first primary key column from ["rs"](#rs) and use that.

This can still be specified for other processing modes to use in progress bars.
### coderef
The coderef that will be called either on each chunk or each row, depending on how
["single\_rows"](#single_rows) is set. The first input is always the BatchChunker object. The rest
vary depending on the processing mode:

    $stmt + $coderef                  = Query DBI Processing | $bc->$coderef($executed_sth)
    $stmt + $coderef + single_rows=>1 = Query DBI Processing | $bc->$coderef($row_hashref)
    $rs   + $coderef                  = DBIC Processing      | $bc->$coderef($chunk_rs)
    $rs   + $coderef + single_rows=>1 = DBIC Processing      | $bc->$coderef($result)
    $coderef                          = DIY Processing       | $bc->$coderef($start, $end)

The loop does not monitor the return values from the coderef.
Required for all processing modes except ["Active DBI Processing"](#active-dbi-processing).
### chunk\_size
The number of rows to be processed in each loop.

This figure should be sized to keep per-chunk processing time at around 5 seconds. If
this is too large, rows may lock for too long. If it's too small, processing may be
unnecessarily slow.

Default is 1 row, which is only appropriate if ["target\_time"](#target_time) (on by default) is
enabled. This will cause the processing to slowly ramp up to the target time as
BatchChunker gathers more data.

Otherwise, if you're using static chunk sizes with `target_time` turned off, figure out
the right chunk size with a few test runs and set it here.

(This was previously defaulted to 1000 rows, prior to v1.0.0.)
### target\_time
The target runtime (in seconds) that chunk processing should strive to achieve, not
including ["sleep"](#sleep). If the chunk processing times are too high or too low, this will
dynamically adjust ["chunk\_size"](#chunk_size) to try to match the target.

BatchChunker will still use the initial `chunk_size`, and it will need at least one
chunk processed, before it makes adjustments. If the starting chunk size is grossly
inaccurate to the workload, you could end up with several chunks in the beginning causing
long-lasting locks before the runtime targeting reduces them down to a reasonable size.

(Chunk size reductions are prioritized before increases, so it should re-size as soon as
it finds the problem. But, one bad chunk could be all it takes to cause an outage.)

Default is 5 seconds. Set this to zero to turn off runtime targeting. (This was
previously defaulted to off prior to v0.92, and set to 15 in v0.92.)

### sleep
The number of seconds to sleep after each chunk. It uses [Time::HiRes](https://metacpan.org/pod/Time%3A%3AHiRes)'s version, so
fractional numbers are allowed.

Default is 0.5 seconds, which is fine for most operations. You can likely get away with
zero for smaller operations, but test it out first. If processing is going to take up a
lot of disk I/O, you may want to consider a higher setting. If the database server
spends too much time on processing, the replicas may have a hard time keeping up with
standard load.

This will increase the overall processing time of the loop, so try to find a balance
between the two.

(This was previously defaulted to 0 seconds, prior to v1.0.0.)
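As a hedged tuning sketch pulling these knobs together (the numbers are illustrative, not recommendations):

    my $batch_chunker = DBIx::BatchChunker->new(
        # ...rs/stmt/coderef as appropriate...
        chunk_size  => 2000,   # starting size; runtime targeting will adjust it
        target_time => 5,      # aim for roughly 5 seconds of work per chunk
        sleep       => 0.5,    # breathing room for replicas between chunks
    );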
### max\_runtime
The number of seconds that the entire process is allowed to run. If you have a
long-running _and idempotent_ operation that you don't want to run for days, you can set
this attribute, execute the operation, and run it again at a later date. The ["min\_id"](#min_id)
will be set to the last-used ID, so the operation can be continued with another
["execute"](#execute) call. Or you can use this to figure out if it finished or not.

Turned off by default. If you use this, you should add in extra multiplier operations to
separate out the time math, like `6 * 60 * 60` for 6 hours.

### process\_past\_max
Boolean that controls whether to check past the ["max\_id"](#max_id) during the loop. If the loop
hits the end point, it will run another maximum ID check in the DB, and adjust `max_id`
accordingly. If it somehow cannot run a DB check (no ["rs"](#rs) or ["max\_stmt"](#max_stmt) available,
for example), the last chunk will just be one at the end of `max_id + chunk_size`.

This is useful if the entire table is expected to be processed, and you don't want to
miss any new rows that come up between ["calculate\_ranges"](#calculate_ranges) and the end of the loop.

Turned off by default.
### single\_rows
Boolean that controls whether single rows are passed to the ["coderef"](#coderef) or the chunk's
ResultSets/statement handle is passed.

Since running single-row operations in a DB is painfully slow (compared to bulk
operations), this also controls whether the entire set of coderefs are encapsulated into
a DB transaction. Transactionalizing the entire chunk brings the speed, and atomicity,
back to what a bulk operation would be. (Bulk operations are still faster, but you can't
do anything you want in a single DML statement.)

Used only by ["DBIC Processing"](#dbic-processing) and ["Query DBI Processing"](#query-dbi-processing).
### min\_chunk\_percent
The minimum row count, as a percentage of ["chunk\_size"](#chunk_size). This value is actually
expressed in decimal form, i.e.: between 0 and 1.

This value will be used to determine when to process, skip, or expand a block, based on
a count query. The default is `0.5` or 50%, which means that it will try to expand the
block to a larger size if the row count is less than 50% of the chunk size. Zero-sized
blocks will be skipped entirely.

This "chunk resizing" is useful for large regions of the table that have been deleted, or
when the incrementing ID has large gaps in it for other reasons. Wasting time on
numerical gaps that span millions can slow down the processing considerably, especially
if ["sleep"](#sleep) is enabled.

If this needs to be disabled, set this to 0. The maximum chunk percentage does not have
a setting and is hard-coded at `100% + min_chunk_percent`.

If DBIC processing isn't used, ["count\_stmt"](#count_stmt) is also required to enable chunk resizing.
### min\_id
### max\_id
Used by ["execute"](#execute) to figure out the main start and end points. Calculated by
["calculate\_ranges"](#calculate_ranges).

Manually setting this is not recommended, as each database is different and the
information may have changed between the DB change development and deployment. Instead,
use ["calculate\_ranges"](#calculate_ranges) to fill in these values right before running the loop.

When the operation is finished, `min_id` will be set to the last processed ID, just in
case it was stopped early and needs to be restarted (eg: ["max\_runtime"](#max_runtime) is set).
Alternately, you can run ["calculate\_ranges"](#calculate_ranges) again to confirm from the database.

### loop\_state

A [DBIx::BatchChunker::LoopState](https://metacpan.org/pod/DBIx%3A%3ABatchChunker%3A%3ALoopState) object designed to hold variables during the
processing loop. The object will be cleared out after use. Most of the complexity is
needed for chunk resizing.

# CONSTRUCTORS
See ["ATTRIBUTES"](#attributes) for information on what can be passed into these constructors.
## new
    my $batch_chunker = DBIx::BatchChunker->new(...);
A standard object constructor. If you use this constructor, you will need to
manually call ["calculate\_ranges"](#calculate_ranges) and ["execute"](#execute) to execute the DB changes.

## construct\_and\_execute
    my $batch_chunker = DBIx::BatchChunker->construct_and_execute(...);
Constructs a DBIx::BatchChunker object and automatically calls
["calculate\_ranges"](#calculate_ranges) and ["execute"](#execute) on it. Anything passed to this method will be passed
through to the constructor.

Returns the constructed object, post-execution. This is typically only useful if you want
to inspect the attributes after the process has finished. Otherwise, it's safe to just
ignore the return and throw away the object immediately.

# METHODS
## calculate\_ranges
    my $batch_chunker = DBIx::BatchChunker->new(
        rsc      => $account_rsc,    # a ResultSetColumn
        ### OR ###
        rs       => $account_rs,     # a ResultSet
        id_name  => 'account_id',    # can be looked up if not provided
        ### OR ###
        dbi_connector => $conn,      # DBIx::Connector::Retry object
        min_stmt      => $min_stmt,  # a SQL statement or DBI $sth args
        max_stmt      => $max_stmt,  # ditto

        ### Optional but recommended ###
        id_name    => 'account_id',  # will also be added into the progress bar title
        chunk_size => 20_000,        # default is 1

        ### Optional ###
        progress_bar => $progress,   # defaults to a 2-count 'Calculating ranges' bar

        # ...other attributes for execute...
    );

    my $has_data_to_process = $batch_chunker->calculate_ranges;
Given a [DBIx::Class::ResultSetColumn](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AResultSetColumn), [DBIx::Class::ResultSet](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3AResultSet), or [DBI](https://metacpan.org/pod/DBI) statement
argument set, this method calculates the min/max IDs of those objects. It fills in the
["min\_id"](#min_id) and ["max\_id"](#max_id) attributes, based on the ID data, and then returns 1.

If either of the min/max statements doesn't return any ID data, this method will return 0.
## execute
    my $batch_chunker = DBIx::BatchChunker->new(
        # ...other attributes for calculate_ranges...

        dbi_connector => $conn,          # DBIx::Connector::Retry object
        stmt          => $do_stmt,       # INSERT/UPDATE/DELETE $stmt with BETWEEN placeholders
        ### OR ###
        dbi_connector => $conn,          # DBIx::Connector::Retry object
        stmt          => $select_stmt,   # SELECT $stmt with BETWEEN placeholders
        count_stmt    => $count_stmt,    # SELECT COUNT $stmt to be used for min_chunk_percent; optional
        coderef       => $coderef,       # called code that does the actual work
        ### OR ###
        rs      => $account_rs,          # base ResultSet, which gets filtered with -between later on
        id_name => 'account_id',         # can be looked up if not provided
        coderef => $coderef,             # called code that does the actual work
        ### OR ###
        coderef => $coderef,             # DIY database work; just pass the $start/$end IDs

        ### Optional but recommended ###
        sleep             => 0.25,       # number of seconds to sleep each chunk; defaults to 0.5
        process_past_max  => 1,          # use this if processing the whole table
        single_rows       => 1,          # does $coderef get a single $row or the whole $chunk_rs / $stmt
        min_chunk_percent => 0.25,       # minimum row count of chunk size percentage; defaults to 0.5 (or 50%)
        target_time       => 5,          # target runtime for dynamic chunk size scaling; default is 5 seconds
        max_runtime       => 12 * 60 * 60,  # stop processing after 12 hours

        progress_name => 'Updating Accounts',  # easier than creating your own progress_bar

        ### Optional ###
        progress_bar => $progress,       # defaults to "Processing $source_name" bar
        verbose      => 1,               # displays timing stats on each chunk
    );

    $batch_chunker->execute if $batch_chunker->calculate_ranges;
Applies the configured DB changes in chunks. Runs through the loop, processing a
statement handle, ResultSet, and/or coderef as it goes. Each loop iteration processes a
chunk of work, determined by ["chunk\_size"](#chunk_size).

The ["calculate\_ranges"](#calculate_ranges) method should be run first to fill in ["min\_id"](#min_id) and ["max\_id"](#max_id).
If either of these are missing, the function will assume ["calculate\_ranges"](#calculate_ranges) couldn't
find them and warn about it.

More details can be found in the ["Processing Modes"](#processing-modes) and ["ATTRIBUTES"](#attributes) sections.
# PRIVATE METHODS
## \_process\_block
Runs the DB work and passes it to the coderef. Its return value determines whether the
block should be processed or not.

## \_process\_past\_max\_checker
Checks to make sure the current endpoint is actually the end, by checking the database.
Its return value determines whether the block should be processed or not.

See ["process\_past\_max"](#process_past_max).
## \_chunk\_count\_checker
Checks the chunk count to make sure it's properly sized. If not, it will try to shrink
or expand the current chunk (in `chunk_size` increments) as necessary. Its return value
determines whether the block should be processed or not.

See ["min\_chunk\_percent"](#min_chunk_percent).

This is not to be confused with the ["\_runtime\_checker"](#_runtime_checker), which adjusts `chunk_size`
after processing, based on previous run times.

## \_runtime\_checker

Stores the previously processed chunk's runtime, and then adjusts `chunk_size` as
necessary.

See ["target\_time"](#target_time).
## \_increment\_progress
Increments the progress bar.
## \_print\_chunk\_status
Prints out a standard chunk status line, if ["verbose"](#verbose) is enabled. What it prints is
generally uniform, but it depends on the processing action. Most of the data is
pulled from ["loop\_state"](#loop_state).

# CAVEATS
## Big Number Support
If the module detects that the ID numbers are no longer safe for standard Perl NV
storage, it will automatically switch to using [Math::BigInt](https://metacpan.org/pod/Math%3A%3ABigInt) and [Math::BigFloat](https://metacpan.org/pod/Math%3A%3ABigFloat) for
big number support. If any blessed numbers are already being used to define the
attributes, this will also switch on the support.

## String-based IDs
If you're working with `VARCHAR` types or other string-based IDs to represent integers,
these may be subject to whatever string-based comparison rules your RDBMS uses when
calculating with `MIN`/`MAX` or using `BETWEEN`. Row counting and chunk size scaling
will try to compensate, but will be mixing string-based comparisons from the RDBMS and
Perl-based integer math.

Using the `CAST` function may help, but it may also cause critical indexes to be
ignored, especially if the function is used on the left-hand side against the column.
Strings with the exact same length may be safe from comparison weirdness, but YMMV.

Non-integer inputs from ID columns, such as GUIDs or other alphanumeric strings, are not
currently supported. They would have to be converted to integers via SQL, and doing so
may run into a similar risk of having your RDBMS ignore indexes.

# SEE ALSO
[DBIx::BulkLoader::Mysql](https://metacpan.org/pod/DBIx%3A%3ABulkLoader%3A%3AMysql), [DBIx::Class::BatchUpdate](https://metacpan.org/pod/DBIx%3A%3AClass%3A%3ABatchUpdate), [DBIx::BulkUtil](https://metacpan.org/pod/DBIx%3A%3ABulkUtil)
# AUTHOR
Grant Street Group
# COPYRIGHT AND LICENSE
This software is Copyright (c) 2018 - 2023 by Grant Street Group.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)