https://github.com/aiven-open/myhoard

MySQL Backup and Point-in-time Recovery service
https://github.com/aiven-open/myhoard

backup cloud-object-storage mysql pitr restore

Last synced: 7 months ago
JSON representation

MySQL Backup and Point-in-time Recovery service

Host: GitHub
URL: https://github.com/aiven-open/myhoard
Owner: Aiven-Open
License: apache-2.0
Created: 2019-05-27T08:18:04.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2025-03-27T15:27:32.000Z (7 months ago)
Last Synced: 2025-03-29T09:11:39.443Z (7 months ago)
Topics: backup, cloud-object-storage, mysql, pitr, restore
Language: Python
Size: 914 KB
Stars: 95
Watchers: 64
Forks: 20
Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md

Awesome Lists containing this project

README

# MyHoard [![Build Status](https://github.com/aiven/myhoard/workflows/Build%20MyHoard/badge.svg?branch=master)](https://github.com/aiven/myhoard/actions) [![Codecov](https://codecov.io/gh/aiven/myhoard/branch/main/graph/badge.svg?token=nLr7M7hvCx)](https://codecov.io/gh/aiven/myhoard)

MyHoard is a daemon for creating, managing and restoring MySQL backups.
The backup data can be stored in any of the supported cloud object storages.
It is functionally similar to [pghoard](https://github.com/aiven/pghoard)
backup daemon for PostgreSQL.

# Features

- Automatic periodic full backup
- Automatic binary log backup in near real-time
- Cloud object storage support (AWS S3, Google Cloud Storage, Azure)
- Encryption and compression
- Backup restoration from object storage
- Point-in-time-recovery (PITR)
- Automatic backup history cleanup based on number of backups and/or backup age
- Purging local binary logs once they're backed up and not needed by other
MySQL servers (requires external system to provide executed GTID info for the
standby servers)
- Almost no extra local disk space requirements for creating and restoring
backups
- Incremental backups

Fault-resilience and monitoring:

- Handles temporary object storage connectivity issues by retrying all
operations
- Metrics via statsd using [Telegraf® tag extensions](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/statsd)
- Unexpected exception reporting via Sentry
- State reporting via HTTP API
- Full internal state stored on local file system to cope with process and
server restarts

# Overview

There are a number existing tools and scripts for managing MySQL backups so
why have yet another tool? As far as taking a full (or incremental) snapshot of
MySQL goes, [Percona XtraBackup](https://www.percona.com) does a very
good job and is in fact what MyHoard is using internally as well. Where things usually get
more complicated is when you want to back up and restore binary logs so that
you can do point-in-time recovery and reduce data loss window. Also, as good
as Percona XtraBackup is for taking and restoring the backup you still need
all sorts of scripts and timers added around it to actually execute it and if
anything goes wrong, e.g. because of network issues, it's up to you to retry.

Often binary log backup is based on just uploading the binary log files using
some simple scheduled file copying mechanism and restoring them is left as an
afterthought, usually just comprising of "download all the binlogs and then
use mysqlbinlog to replay them". In addition to not having proper automation
to do this to ensure it is repeatable and safe this approach also does not work
in some cases: In order for binary log restoration with mysqlbinlog to be safe
you need to have all binary logs on local disk. For change heavy environments
this may be much more than the size of the actual database and if server disk
is adjusted based on the database size the binary logs may simply not fit on
the disk.

MyHoard uses an alternative approach for binary log restoration, which is based
on presenting the backed up binary logs as relay logs in batches via direct
relay log index manipulation and having the regular SQL slave thread apply them
as if they were replicated from another MySQL server. This allows applying them
in batches so there's very little extra disk space required during restoration
and this would also allow applying them in parallel (though that requires more
work, currently there are known issues with using `slave-parallel-workers`
value other than 0, i.e. multithreading must currently be disabled).

Existing tooling also doesn't pay much attention to real life HA environments
and failovers where the backup responsibilities need to be switched from one
server to another and getting uninterrupted sequence of backed up transactions
that can be restored to any point in time, including the time around the
failover. This requires something much more sophisticated than just blindly
uploading all local binary logs.

MyHoard aims to provide a single solution daemon that takes care of all of your
MySQL backup and restore needs. It handles creating, managing and restoring
backups in multi-node setups where master nodes may frequently be going away
(either because of rolling forward updates or actual server failure). You just
need to create a fairly simple configuration file, start the systemd service on
the master and any standby servers and make one or two HTTP requests to get the
daemon into correct state and it will start automatically doing the right
things.

# Basic usage

On the very first master after you've initialized MySQL database and started up
MyHoard you'd do this:

```
curl -XPUT -H "Content-Type: application/json" -d '{"mode": "active"}' \
http://localhost:16001/status
```

This tells MyHoard to switch to active mode where it starts backing up data on
this server. If there are no existing backups it will immediately create the
first one.

On a new standby server you'd first install MySQL and MyHoard but not start or
initialize MySQL (i.e. don't do `mysqld --initialize`). After starting the
MyHoard service you'd do this:

```
curl http://localhost:16001/backup # lists all available backups
curl -XPUT -H "Content-Type: application/json" \
-d '{"mode": "restore", "site": "mybackups", "stream_id": "backup_id", "target_time": null}' \
http://localhost:16001/status
```

This tells MyHoard to fetch the given backup, restore it, start the MySQL
server once finished, and switch to observe mode where it keeps on observing
what backups are available and what transactions have been backed up but
doesn't do any backups itself. Because binary logging is expected to be
enabled also on the standby server MyHoard does take care of purging any local
binary logs that contain only transactions that have been backed up. If you
wanted to restore to a specific point in time you'd just give a timestamp like
`"2019-05-22T11:19:02Z"` and restoration will be performed up until the last
transaction before the target time.

If the master server fails for any reason you'd do this on one of the standby
servers:

```
curl -XPUT -H "Content-Type: application/json" -d '{"mode": "promote"}' \
http://localhost:16001/status
```

This updates the object storage to indicate this server is now the master and
any updates from the old master should be ignored by any other MyHoard
instances. (The old master could still be alive at this point but e.g.
responding so slowly that it is considered to be unavailable yet it might be
able to accept writes and back those up before going totally away and those
transactions must be ignored when restoring backups in the future because they
have not been replicated to the new master server.) After the initial object
storage state update is complete MyHoard switches itself to `active` mode and
resumes uploading binary logs to the currently active backup stream starting
from the first binary log that contains transactions that have not yet been
backed up.

# Requirements

MyHoard requires Python 3.10 or later and some additional components to operate:

- [percona-xtrabackup](https://github.com/percona/percona-xtrabackup)
- [python3-PyMySQL](https://github.com/PyMySQL/PyMySQL)
- [python3-rohmu](https://github.com/aiven/rohmu)
- [MySQL server 8.x+](https://www.oracle.com/mysql/)

Currently MyHoard only works on Linux and expects MySQL service to be managed
via systemd.

MyHoard requires MySQL to be used and configured in a specific manner in order
for it to work properly:

- Single writable master, N read only standbys
- Binary logging enabled both on master and on standbys
- `binlog_format` set to `ROW`
- Global transaction identifiers (GTIDs) enabled
- Use of only InnoDB databases

# Configuration options

`myhoard.json` has an example configuration that shows the structure of the
config file and has reasonable default values for many of the settings. Below
is full list the settings and the effect of each.

**backup_settings.backup_age_days_max**

Maximum age of backups. Any backup that has been closed (marked as final with
no more binary logs being uploaded to it) more than this number of days ago
will be deleted from storage, unless total number of backups is below the
minimum number of backups.

**backup_settings.backup_count_max**

Maximum number of backups to keep. Because new backups can be requested
manually it is possible to end up with a large number backups. If the total
number goes above this backups will be deleted even if they are not older than
`backup_age_days_max` days.

**backup_settings.backup_count_min**

Minimum number of backups to keep. If for example the server is powered off and
then back on a month later, all existing backups would be very old. However,
in that case it is usually not desirable to immediately delete all old backups.
This setting allows specifying a minimum number of backups that should always
be preserved regardless of their age.

**backup_hour**

The hour of day at which to take new full backup. If backup interval is less
than 24 hours this is used as base for calculating the backup times. E.g. if
backup interval was 6 hours and backup hour was 4, backups would be taken at
hours 4, 10, 16 and 22.

**backup_minute**
The minute of hour at which to take new full backup.

**backup_interval_minutes**

The interval in minutes at which to take new backups. Individual binary logs
are backed up as soon as they're created so there's usually no need to have
very frequent full backups.
Note: If this value is not does not have a factor of 1440 (1 day) then the backup_hour and backup_minute
settings cannot be changed once the first backup has been taken, as having a cycle not as a multiple of days
means that the hour and minute of the backup will not be the same each day.

**forced_binlog_rotation_interval**

How frequently, in seconds, to force creation of new binary log if one hasn't
been created otherwise. This setting ensures that environments with low rate of
changes so that new binary logs are not created because the size limit is
exceeded also get all data backed up frequently enough.

**upload_site**

Name of the backup site to which new backups should be created to. See
`backup_sites` for more information. Only needs to be defined if multiple
non-recovery backup sites are present.

**incremental.enabled**

Boolean setting, which controls periodic incremental backups, according to the schedule
`incremental.full_backup_week_schedule`

**incremental.full_backup_week_schedule**

A string of comma-separated days of the week: `mon`, `tue`, `wed`, `thu`, `fri`, `sat`, `sun`.
Defines on which days full backups should be taken, thus other days incremental backups will be done.
E.g. if the value of this setting is `sun,wed`, then full backup is taken on Sundays and Wednesdays, other days of the
week incremental backups will be scheduled. Requires `"incremental.enabled": true`.

**backup_sites**

Object storage configurations and encryption keys. This is an object with
`"site_name": {}` entries. Typically there is only a single
backup site but in cases where new server needs to fetch a backup from a
different location than where it should start writing its own backups there
could be two. The backup site name has no relevance for MyHoard and you can
pick whatever names you like.

Each site has the following configuration options:

**backup_sites.{name}.compression**

The compression method and option to use for binary logs uploaded to this
site:

**backup_sites.{name}.compression.algorithm**

One of the supported compression algorithms: `lzma`, `snappy`, `zstd`.
Defaults to `snappy`.

**backup_sites.{name}.compression.level**

Compression level for `lzma` or `zstd`.

**backup_sites.{name}.encryption_keys**

This is an object containing two keys, `public` and `private`. These define
the RSA master key used for encrypting/decrypting individual encryption keys
used for actual data encryption/decryption. The values must be in PEM format.

**backup_sites.{name}.object_storage**

The object storage configuration for this backup site. Please refer to the
[PGHoard readme](https://github.com/aiven/pghoard) for details regarding these
settings.

**backup_sites.{name}.recovery_only**

This site is not considered for new backups, it can only be used for recovering
an existing backup.

**binlog_purge_settings.enabled**

If `true` MyHoard purges binary logs that are no longer required. The
recommended configuration is to have MySQL keep binary logs around for a
longish period of time (several days) but have this setting enabled so that
MyHoard removes the binary logs as soon as they aren't needed anymore.

Note that in order to consider replication to other MySQL servers that are
part of the cluster the `PUT /replication_state` API must be used to
periodically tell MyHoard what transactions other cluster nodes have applied
to avoid MyHoard purging binary logs that haven't been replicated. This state
update is not strictly required since MySQL will not allow purging binary logs
that connected standbys don't yet have but in case standbys get temporarily
disconnected relevant binlogs could get purged. Also, in case a standby is
promoted as new master it should still have any binary log that any other
standby hasn't yet received so that when the other standbys start to replicate
from the standby that gets promoted as master they are able to get in sync.
This requires MyHoard on the standby to know what transactions other standbys
have applied.

**binlog_purge_settings.min_binlog_age_before_purge**

Minimum age of a binary log before purging it is considered. It is advisable to
set this to some minutes to avoid any race conditions where e.g. new standby
joins the cluster but MyHoard hasn't yet been informed of its replication
position and could end up purging binary log that the new standby could end up
needing.

**binlog_purge_settings.purge_interval**

Number of seconds between checks for binary logs that could be removed.

**binlog_purge_settings.purge_when_observe_no_streams**

Allow purging binary logs when no active backups exist. This setting is mostly
relevant for detached read-only replicas that are not configured to take
backups of their own. In this case the read replica would see no active backups
because the backup site used by the source service has been specified as
recovery only site for the replica. For any other nodes than detached read-only
replicas this setting can be set to `false`, for the replicas this should be
`true` or else MyHoard cannot do any purging at all.

**http_address**

The IP address to which MyHoard binds its HTTP server. It is highly recommended
to use local loopback address. MyHoard provides no authentication or TLS
support for the HTTP requests and they should only be made from localhost
unless you use something like HAProxy to add authentication and encryption.

**http_port**

The TCP port to which MyHoard binds its HTTP server.

**mysql.binlog_prefix**

The full path and file name prefix of binary logs. This must be the same as the
corresponding MySQL configuration option except full path is always required
here.

**mysql.client_params**

The parameters MyHoard uses to connect to MySQL. Because MyHoard needs to
perform certain low level operations like manually patch GTID executed value in
`mysql.gtid_executed` table while restoring data the user account must have
high level of privileges.

**mysql.config_file_name**

Full path of the MySQL server configuration file. This is passed to Percona
XtraBackup while creating or restoring full database snapshot.

**mysql.data_directory**

Full path of the MySQL data directory. This is currently only used for getting
file sizes for reporting and progress tracking purposes.

**mysql.relay_log_index_file**

Full path of the MySQL relay log index file. This must be the same as the
corresponding MySQL configuration option except full path is always required
here.

**mysql.relay_log_prefix**

The full path and file name prefix of relay logs. This must be the same as the
corresponding MySQL configuration option except full path is always required
here.

**restore_auto_mark_backups_broken**

Boolean value. Enables a mechanism to automatically mark backups as broken if restoration fails for
multiple times in a row. Defaults to `false`

**restore_free_memory_percentage**

Maximum percentage of system memory to allow xtrabackup to use while
preparing a basebackup for restoration. If not defined, use xtrabackup's
default value.

**restore_max_binlog_bytes**

Maximum amount of disk space to use for binary logs including pre-fetched logs
that are not yet being applied and the binary logs that MySQL is currently
applying. When Percona XtraBackup is restoring the full database snapshot
MyHoard starts prefetching binary logs that are needed on top of the full
snapshot. Up to this number of bytes are fetched. When the snapshot has been
restored MyHoard tells MySQL to start applying the binary logs it has managed
to pre-fetch thus far and keep on pre-fetching files as long as the total size
of pre-fetched and files being applied is below the limit (files are deleted as
soon as they have been fully applied).

This should be set to something like 1% of all disk space or at least a few
hundred MiBs (depending on individual binary log max size] so that sufficient
number of binary logs can be fetched.

**sentry_dsn**

Set this value to Sentry Data Source Name (DSN) string to have any unexpected
exceptions sent to Sentry.

**server_id**

Server identifier of this node (integer in the range 1 - 0x7FFFFFFF). This must
match the corresponding MySQL configuration option for this node.

**start_command**

Command used for starting MySQL. This is mostly used for tests. In any
production setups using systemd to manage MySQL server daemon is highly
recommended.

**state_directory**

Directory where to store MyHoard state files. MyHoard stores its full state
into a number of separate JSON files.

**statsd**

Statsd configuration used for sending metrics data points. The implementation
will send tags along the data points so Telegraf Statsd or other similar
implementation that can handle the tags is expected.

The tags specified here are also reused for Sentry.

**systemctl_command**

The `systemctl` base command to invoke when MyHoard needs to start or stop
the MySQL server. This is only used when restoring a backup where MySQL needs
to be started after full database snapshot has been recovered and restarted
a couple of times with slightly different settings to allow patching GTID
executed information appropriately.

**systemd_env_update_command**

A command to invoke before `systemctl` to configure MySQL server to use the
desired configuration options. This is typically just the built-in
`myhoard_mysql_env_update` command that writes to MySQL systemd environment
file. Separate command is needed to allow running the update as root user.

**systemd_service**

Name of the MySQL systemd service.

**temporary_directory**

Temporary directory to use for backup and restore operations. This is currently
not used directly by MyHoard but instead passed on to Percona XtraBackup. It is
recommended not to use `/tmp` for this because that is an in-memory file
system on many distributions and the exact space requirements for this
directory are not well defined.

**xtrabackup.copy_threads**

Number of worker threads created by XtraBackup for parallel copying data files when taking a backup. The default value is ``1``.

Note: It is recommended to use more threads for copying than to compress or encrypt.

**xtrabackup.compress_threads**

Number of worker threads created by XtraBackup for parallel compression when taking a backup. The default value is ``1``.

**xtrabackup.encrypt_threads**

Number of worker threads created by XtraBackup for parallel encryption when taking a backup. The default value is ``1``.

**xtrabackup.register_redo_log_consumer**

Lets XtraBackup register as a redo log consumer at the start of the backup.
The server does not remove a redo log that Percona XtraBackup (the consumer) has not yet copied.
The consumer reads the redo log and manually advances the log sequence number (LSN).
The server blocks the writes during the process. Based on the redo log consumption,
the server determines when it can purge the log. It is disabled by default.

# HTTP API

MyHoard provides an HTTP API for managing the service. The various entry points
are described here.

All APIs return a response like this on error:

```
{
"message": "details regarding what went wrong"
}
```

## GET /backup

Lists all available backups. This call takes no request parameters. Response
format is as follows:

```
{
"backups": [
{
"basebackup_info": {
...
},
"closed_at": "2019-05-23T06:29:10.041489Z",
"completed_at": "2019-05-22T06:29:20.582302Z",
"recovery_site": false,
"resumable": true,
"site": "{backup_site_name}",
"stream_id": "{backup_identifier}"
}
]
}
```

In case MyHoard has not yet finished fetching the list of backups the root
level `"backups"` value will be `null`.

**basebackup_info**

This contains various details regarding the full snapshot that is the basis of
this backup.

**closed_at**

The time at which last binary log to this stream was uploaded and no more
uploads are expected. The backup can resume to any point in time between
`closed_at` and `completed_at`. If `closed_at` is `null` the backup is
active and new binary logs are still being uploaded to it.

**recovery_site**

Tells whether the backup site this backup is stored into is recovery only.

**resumable**

Tells whether the backup is in a state that another server can continue backing
up data to it. When current master server starts a new backup it first needs to
upload the initial full snapshot and some associated metadata. Before that is
done no other server could possibly do anything useful with that backup. Once
these steps are completed then if the master fails for any reason and one of
the standbys are promoted as new master the newly promoted master can continue
uploading binary logs to the existing active backup. If the backup is not
resumable the new master needs to discard it and start new backup from scratch.

**site**

Name of the backup site this backup is stored into.

**stream_id**

Identifier of this backup.

## POST /backup

Create new full backup or force binary log rotation and back up the latest
binary log file. Request body must be like this:

```
{
"backup_type": "{basebackup|binlog}",
"wait_for_upload": 3.0,
"incremental": false
}
```

**backup_type**

This specifies the kind of backup to perform. If set to `basebackup` a new
full backup is created and old backup is closed once that is complete. If set
to `binlog` this rotates currently active binary log so that a finished
binary log file with all current transactions is created and that file is then
backed up.

**wait_for_upload**

This is only valid in case `backup_type` is `binlog`. In that case the
operation will block for up to as many seconds as specified by this parameter
for the binary log upload to complete before returning.

**incremental**

Boolean value, when `true` - requests incremental base backup if possible.
Setting is used only in combination with `basebackup` backup_type.
Defaults to `false`.

Response on success looks like this:

```
{
"success": true
}
```

## PUT /backup/{stream_id}/preserve

Create a request for updating the preservation status of a backup. Request body must be like this:

```
{
"preserve_until": "2023-09-01T00:00:0000",
"wait_for_applied_preservation": 3.0,
}
```

**stream_id**

Identifier of this backup.

**preserve_until**

Optional datetime value in ISO format for keeping the backup from being deleted.
If a valid value is provided, the backup will be preserved until the specified datetime.
If not provided or has a null value, the backup can be deleted due to old age.

**wait_for_applied_preservation**

Optional amount of time to the wait for the preservation request to be effectively executed. The
operation will block for up to as many seconds as specified by this parameter.

Response on success looks like this:

```
{
"success": true
}
```

## PUT /replication_state

This call can be used to inform MyHoard of the executed GTIDs on other servers
in the cluster to allow MyHoard to only purge binlogs that have been fully
applied on all cluster nodes. Request body must be like this:

```
{
"server1": {
"206375d9-ec5a-46b7-bb26-b621812e7471": [[1, 100]],
"131a1f4d-fb7a-44fe-94b9-5508445aa126": [[1, 5858]],
},
"server2": {
"206375d9-ec5a-46b7-bb26-b621812e7471": [[1, 100]],
"131a1f4d-fb7a-44fe-94b9-5508445aa126": [[1, 5710]],
}
}
```

The main level object must have an entry for each server in the cluster. The
names of the servers are not relevant as long as they are used consistently.
For each of the servers the value is an object that contains server UUID as the
key and list of GNO start end ranges as the value. The server UUID is the
original server from which the transactions originated, not the UUID of the
server reporting the numbers. In the example both servers `server1` and
`server2` have executed transactions 206375d9-ec5a-46b7-bb26-b621812e7471:1-100
and both have executed some part of transactions from server
131a1f4d-fb7a-44fe-94b9-5508445aa126 but `server1` is further ahead, having
executed GNOs up until 5858 while `server2` is only at 5710.

Note that it is not expected to have multiple masters active at the same time.
Multiple server UUIDs exist when old master servers have been replaced.

Response on success echoes back the same data sent in the request.

## GET /status

Returns current main mode of MyHoard. Response looks like this:

```
{
"mode": "{active|idle|observe|promote|restore}"
}
```

See the status update API for more information regarding the different modes.

## PUT /status

Updates current main mode of MyHoard. Request must be like this:

```
{
"force": false,
"mode": "{active|idle|observe|promote|restore}",
"site": "{site_name}",
"stream_id": "{backup_id}",
"target_time": null,
"target_time_approximate_ok": false
}
```

**force**

This value can only be passed when switching mode to active.

When binary log restoration is ongoing setting this true will cause mode to
be forcibly switched to promote without waiting for all binary logs to get
applied and the promote phase will skip the step of ensuring all binary logs
are applied. If mode is already promote and binary logs are being applied in
that state, the binary logs sync is considered to be immediately complete.
If the server is not currently applying binary logs passing `"force": true`
will cause the operation to fail with error 400.

This parameter is only intended for exceptional situations. For example
broken binary logs that cannot be applied and it is preferable to promote the
server in it's current state. Another possible case is binary logs containing
changes to large tables without primary keys with row format in use and the
operation being so slow that it will not complete in reasonable amount of time.
Data loss will incur when using this option!

**mode**

MyHoard initially starts in mode `idle`. In this state it only fetches the
available backups but doesn't actively do anything else. From `idle` state it
is possible to switch to `restore` or `active` states. Only the very first
server in a new cluster should be switched to `active` state directly from
`idle` state. All other servers must first be switched to `restore` and
only after restoration has finished should other state changes be performed.

When restore operation completes MyHoard automatically transitions to mode
`observe`, in which it keeps track of backups managed by other servers in the
cluster but doesn't actively back up anything. If this node should be the new
master (or new separate forked service) then mode must be switched to
`promote` once MyHoard has changed it from `restore` to `observe`. This
will make MyHoard update metadata in object storage appropriately before
automatically transitioning to state `active`.

Servers in state `active` cannot be transitioned to other states. They are
the active master node and MyHoard on the node must just be deactivated if the
server should stop acting in that role.

**site**

This is only applicable when new mode is `restore`. Identifies the site
containing the backup to restore. Use `GET /backup` to list all available
backups.

**stream_id**

This is only applicable when new mode is `restore`. Identifies the backup
to restore. Use `GET /backup` to list all available backups.

**target_time**

This is only applicable when new mode is `restore`. If this is omitted or
`null` the last available transaction for the given backup is restored. When
this is defined restoration is performed up until the last transaction before
this time. Must be ISO 8601 timestamp. If the requested time is not available
in the given timestamp (time is not between the `completed_at` and
`closed_at` timestamps) the request will fail.

**target_time_approximate_ok**

This is only applicable when new mode is `restore` and `target_time` has
been specified. If this is set to `true` then `target_time` is only used to
restrict results on individual binary log level. That is, the restore process
is guaranteed not to restore binary logs whose first transaction is later than
the given target time but the last file that is picked for restoration is fully
applied even if that means applying some transactions that are more recent than
the target time.

This mode is useful when restoring potentially large number of binary logs and
the exact target time is not relevant. Enabling this mode avoids having to use
the `UNTIL SQL_AFTER_GTIDS = x` parameter for the SQL thread. The `UNTIL`
modifier forces single threaded apply and on multi-core machines makes the
restoration slower. The single threaded mode only applies for the last batch
but that too can be very large and setting this value can significantly reduce
the restoration time.

## GET /status/restore

If current mode is `restore` this API can be used to get details regarding
restore progress. If mode is something else the request will fail with HTTP
status 400. For successful requests the response body looks like this:

```
{
"basebackup_compressed_bytes_downloaded": 8489392354,
"basebackup_compressed_bytes_total": 37458729461,
"binlogs_being_restored": 0,
"binlogs_pending": 73,
"binlogs_restored": 0,
"phase": "{current_phase}"
}
```

**basebackup_compressed_bytes_downloaded**

Number of compressed bytes of the full snapshot that have been downloaded so
far. Decryption and decompression is performed on the fly so these bytes have
also been processed.

**basebackup_compressed_bytes_total**

Total number of (compressed) bytes in the full snapshot.

**binlogs_being_restored**

Number of binary logs currently passed on to MySQL to restore.

**binlogs_pending**

Binary logs that are pending to be restored. Note that this number may be
going up if there is currently an active master node that is uploading new
binary logs to the backup being restored and there is no recovery target time
given.

**binlogs_restored**

Number of binary logs that have been successfully applied.

**phase**

Current phase of backup restoration. Possible options are these:

- getting_backup_info: Backup metadata is being fetched to determine what
exactly needs to be restored.
- initiating_binlog_downloads: Binary log prefetch operations are being
scheduled so that progress with those can be made while the full snapshot is
being restored.
- restoring_basebackup: The full snapshot is being downloaded and prepared.
- rebuilding_tables: Rebuilding tables before restoring binary logs. This can
avoid data corruption when updating from older MySQL versions.
- refreshing_binlogs: Refreshing binary log info to see if new binary logs have
been uploaded to object storage from current master. This and the other
binlog related phases are typically entered multiple times as the binlogs are
handled in batches.
- applying_binlogs: Refreshing the list of binary logs MySQL should be
restoring.
- waiting_for_apply_to_finish: Waiting for MySQL to finish applying current
subset of binary logs.
- finalizing: Performing final steps to complete backup restoration.
- completed: The operation has completed. This is typically not returned via
the API because MyHoard will automatically switch to `observe` mode when
restoration completes and the restoration status is not available in that
mode.
- failed: Restoring the backup failed. The operation will be retried
automatically, but it may fail repeatedly and analyzing logs to get more
details regarding the failure is advisable.
- failed_basebackup: Terminal state for a RestoreCoordinator instance but
restoring an earlier backup may be an option.

# Metrics

The following metrics are exported by myhoard:

**myhoard.backup_stream.basebackup_bytes_uploaded**
**myhoard.backup_stream.basebackup_requested**
**myhoard.backup_stream.basebackup_upload_rate**
**myhoard.backup_stream.binlog_upload_rate**
**myhoard.backup_stream.errors**
**myhoard.basebackup.bytes_compressed**
**myhoard.basebackup.bytes_uncompressed**
**myhoard.basebackup.compression_ratio**
**myhoard.basebackup.errors**
**myhoard.basebackup.estimated_progress**
**myhoard.basebackup.optimize_table**
**myhoard.basebackup.xtrabackup_backup**
**myhoard.basebackup_broken**
**myhoard.basebackup_restore.xbstream_extract**
**myhoard.basebackup_restore.xtrabackup_move**
**myhoard.basebackup_restore.xtrabackup_prepare**
**myhoard.binlog.count**
**myhoard.binlog.count_new**
**myhoard.binlog.remote_copy**
**myhoard.binlog.removed**
**myhoard.binlog.size**
**myhoard.binlog.size_new**
**myhoard.binlog.upload**
**myhoard.binlog.upload_errors**
**myhoard.disk_full_errors**
**myhoard.generic_errors**
**myhoard.http.{name}**
**myhoard.network_errors**
**myhoard.pending_binlog_bytes_max**
**myhoard.pending_binlog_bytes_min**
**myhoard.pending_binlog_count_max**
**myhoard.pending_binlog_count_min**
**myhoard.ratetracker.errors**
**myhoard.remote_read_errors**
**myhoard.remote_write_errors**
**myhoard.restore.basebackup_bytes_downloaded**
**myhoard.restore.binlogs_restored**
**myhoard.restore.cannot_reset**
**myhoard.restore.change_master_to_failed**
**myhoard.restore.pending_binlogs**
**myhoard.restore.unexpected_extra_relay_log**
**myhoard.restore_errors**

# Running container-based tests

Make sure docker is installed (podman currently untested) and just run:

```
make PYTHON_VERSION=3.11 PERCONA_VERSION=8.0.30-23-1.bullseye MYSQL_VERSION=8.0.30 build-setup-specific-image

make dockertest
```

If you don't need to change percona or mysql or python version, but you want to change myhoard source code and re-test,
run:

```
make dockertest-resync
```

to re-sync the source code from the host and re-run the tests.

Take a look at `.github/workflows/build.yaml` for possible version values.

In order to locally launch a single test (again while resyncing from current source code), you can use `pytest-quick` e.g.

```
make PYTEST_ARGS="-k test_3_node_service_failover_and_restore" dockertest-pytest
```

# Running tests natively

Running native tests must NOT be performed as root (requires additional options for mysql)

# Test environment setup: Debian/Ubuntu

Run:

```bash
MYSQL_VERSION=8.0.30
PERCONA_VERSION=8.0.30-23-1.bullseye
make build-dep-ubuntu
make clean
scripts/remove-default-mysql
scripts/install-mysql-packages ${MYSQL_VERSION}
scripts/setup-percona-repo
scripts/install-percona-package ${PERCONA_VERSION}
scripts/install-python-deps
pip3 install -e .
```

Note that since Percona-xtrabackup does not work with a version of MySQL newer
than Percona-xtrabackup, the version strings should match.

This command will install all the required package version. Please note: the state of your environment
WILL change with this command. Both native and Python packages will be installed.

# Test environment setup: Fedora

run:

`make build-dep-fedora`

(this can install or change packages on your host system)

# Running tests

Once the environment setup is over, you can execute

`make PYTHON_VERSION=3.11 coverage`

And have all test run, or just

`python${PYTHON_VERSION} -m pytest "$@"`

Setting `PYTHON_VERSION` is optional, but make sure you're using the same interpreter that was employed during setup, otherwise
you may encounter runtime errors.

# License

MyHoard is licensed under the Apache License, Version 2.0. Full license text
is available in the ``LICENSE`` file and at
http://www.apache.org/licenses/LICENSE-2.0.txt

# Contact

Bug reports and patches are very welcome, please post them as GitHub issues
and pull requests at https://github.com/aiven/myhoard. Any possible
vulnerabilities or other serious issues should be reported directly to the
maintainers .

# Trademarks

MySQL is a registered trademark of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Telegraf® is a trademark [registered / owned] by InfluxData, which is not affiliated with, and does not endorse, this product.

Percona Xtrabackup is a trademark and property of its respective owners. All product and service names used in this website are for identification purposes only and do not imply endorsement.

# Credits

MyHoard was created by, and is maintained by, [Aiven](https://aiven.io) cloud
data hub developers.

Recent contributors are listed on the GitHub project page,
https://github.com/aiven/myhoard/graphs/contributors

MyHoard uses [Percona Xtrabackup](https://www.percona.com) for creating and
restoring database snapshot excluding binary logs.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/aiven-open/myhoard

Awesome Lists containing this project

README