[![Build Status](https://travis-ci.com/mkabilov/logical_backup.svg?branch=master)](https://travis-ci.com/mkabilov/logical_backup)
# PostgreSQL logical backup tool

## Introduction
The purpose of the Logical Backup Tool (LBT) is to create a continuous
incremental logical backup of one or more PostgreSQL database tables. For each
table that should be backed up, the tool stores a continuous stream of changes
(called deltas) in files on disk; if necessary, it also produces a complete SQL
dump of the table. As opposed to a physical backup, i.e. `pg_basebackup`, the
outcome is not a page-by-page copy of the target cluster and cannot be used in
situations that require an exact full copy, e.g., when creating a physical
replica. At the same time, it doesn't have the limitations of a physical copy
and can be used to restore target tables to a system running a different
PostgreSQL major version from the original one, as well as to one running on an
incompatible CPU architecture.

The primary use-case for the tool is protection against data loss of one or
more tables in an OLAP environment. Those are environments with high volumes of
data (typically in the range of tens or hundreds of TB) and relatively low
frequency of changes, possibly just a few thousand rows per day. Usually, the
analytical data stored in the database is generated from some primary source;
therefore, slight inconsistencies between different tables are tolerated and the
goal, when performing the data recovery, is to restore some target tables to the
latest state stored in the backup one by one, as opposed to restoring the whole
dataset at once to a consistent point in time. Therefore, the tool will not
guarantee that cross-table constraints, such as foreign keys, will be satisfied
as a result of the restore, and it could be necessary to remove those
constraints from the dataset being backed up. In other words, the restore
process is designed to act independently on each table from the backup set,
rather than restoring the whole set one transaction at a time.

The resulting backup is continuous and incremental. Together with the
initial SQL dump (also called basebackup) of the table, LBT writes ongoing
changes (called deltas). Once the number of deltas reaches the configured
threshold or the time since the previous basebackup exceeds a certain interval
provided in the configuration, a new basebackup is produced; all deltas
recorded before the start of the latest full dump are purged once the dump
succeeds.

## Requirements

The Logical Backup Tool fetches the stream of changes using the PostgreSQL
feature called `logical decoding` and groups tables to back up with the concept
of a `publication`, an entity introduced in PostgreSQL 10 to describe a set of
tables that are replicated together; therefore, all PostgreSQL versions starting
from 10 are supported. You will need a direct connection to your database with
a user that either has login and replication permissions (given that the
publication specified in the tool configuration already exists) or superuser
permissions (if you want the tool to create the publication for you); see the
example at the end of this section. You should have enough space on the server
running LBT to store at least 2x the size of the N biggest tables in the
publication, where N is the number of concurrent basebackup processes from the
configuration; in a real-world scenario the actual space requirement depends on
`backupThreshold`, i.e., the number of delta files accumulated between
consecutive full backups. On a busy system with a lot of data changes and
`backupThreshold` set too high, it may require a multiple of the original table
size; on a side note, such systems probably won't fit the typical OLAP
use-case.

The tool's normal operations (particularly how often the dumps are created) would
be disrupted if the system clock is adjusted; switching from/to DST, however,
should not lead to any issues.
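
For example, a minimal server-side setup satisfying the requirements above
could look like the following; the role, publication and table names are
placeholders, not names used by the tool itself:

```sql
-- Minimal setup sketch; role, publication and table names are placeholders.
-- Logical decoding requires wal_level = logical on the server (a general
-- PostgreSQL requirement, not specific to LBT).
SHOW wal_level;

-- A role with LOGIN and REPLICATION is sufficient when the publication
-- referenced in the tool configuration already exists.
CREATE ROLE lbt_backup WITH LOGIN REPLICATION PASSWORD 'change-me';

-- Create the publication up front if you want to back up only a subset of
-- tables; with superuser permissions LBT can create one FOR ALL TABLES itself.
CREATE PUBLICATION lbt_publication FOR TABLE public.orders, public.events;
```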
## How does it work

On the initial run, the tool connects to the target database, creates a
publication when necessary and calls `pg_create_logical_replication_slot` to
create a new replication slot for the built-in output plugin called `pgoutput`.
It then establishes a replication connection to the database and starts
streaming data from the slot created in the previous step. Once changes for a
not yet observed table are received from the logical stream, the logical backup
routine puts a request to create the basebackup for that table into the workers
queue and goes on with writing a delta corresponding to that change to disk. One
of the spare backup workers (defined by the `concurrentBasebackups`
configuration parameter) will eventually pick up the basebackup request and
produce a `COPY targettable TO STDOUT` dump of the table.

When LBT resumes after a period of downtime, it continues streaming from the
previously created slot; the slot guarantees that changes the tool hasn't
processed yet are not lost while they are not being consumed. On the downside,
the unconsumed changes will accumulate on the database server, taking up disk
space in the WAL directory; therefore, when planning a prolonged downtime of the
backup tool, one should drop the slot used for the backup with
`pg_drop_replication_slot()`. After the slot is dropped, the backup directory
should be purged, causing the backup process to start from scratch;
alternatively, set the `initialBasebackup` option described below.
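
The slot handling described above maps to standard PostgreSQL functions; a
rough illustration, with `lbt_slot` standing in for the configured `slotname`:

```sql
-- What the tool effectively does on the initial run (slot name is a placeholder):
SELECT pg_create_logical_replication_slot('lbt_slot', 'pgoutput');

-- While LBT is down, an inactive slot keeps the server from recycling WAL;
-- this query shows how far back each slot still holds WAL:
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

-- Before a prolonged downtime, drop the slot (and purge the backup directory
-- or set initialBasebackup, as described above):
SELECT pg_drop_replication_slot('lbt_slot');
```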
## Configuration parameters

LBT reads its configuration from a YAML file supplied as a command-line
argument. The following keys can be defined in that file (an example
configuration is shown after the list):

* **tempDir**
The directory to store temp files, such as incomplete basebackups.
Once completed, those files will be moved to the main backup directory.
* **deltasPerFile**
The maximum number of individual changes (called deltas) a
single delta file may contain. This is a hard limit; once reached, LBT
writes all subsequent deltas to a new file, even if that results in a single
transaction being split between multiple delta files.
* **backupThreshold**
If the tool writes more than `backupThreshold` delta files
since the last basebackup, a new basebackup for the table is requested.
Setting this value too low will result in too many basebackups; setting it too
high may accumulate too many deltas, consuming more disk space than necessary
and resulting in a longer recovery time for the table.
* **concurrentBasebackups**
The maximum number of processes doing basebackups
that can operate concurrently. Each process consumes a single PostgreSQL
connection and runs COPY for a table it is tasked with, writing the outcome
into a file.
* **trackNewTables**
When set to true, allows starting the tool with an empty
publication and permits new tables to be added to the initial set provided by
the publication. It's a good idea to enable this option if you define a
publication `FOR ALL TABLES` or let LBT define one for you.
* **slotname**
Name of the logical replication slot that the tool should use.
LBT attempts to create the slot if it doesn't exist. It expects a
non-temporary slot with the output plugin `pgoutput`. Note that LBT never
drops the slot on its own; if you need to start from scratch, you should drop
it manually with `pg_drop_replication_slot`.
* **publication**
The name of the publication LBT should use to determine the
set of tables to back up. LBT attempts to create one if it doesn't exist,
defining it `FOR ALL TABLES`. If you need only a subset of tables, you should
create the corresponding publication beforehand.
* **initialBasebackup**
If set to true, LBT will trigger the initial basebackup
for all tables in the publication at startup. This feature could be useful to
discard the old backup deltas and start from a fresh basebackup, e.g., after
deleting the replication slot.
* **fsync**
When set to true, runs fsync, causing each write to the delta file
to be durable. On a system with many writes this may negatively impact
performance.
* **archiveDir**
The main directory to store the resulting backup.
* **forceBasebackupAfterInactivityInterval**
Trigger a new basebackup if there
was activity on the table since the last basebackup, but the last delta
written is older than the time interval specified in this parameter. On some
tables the write rate decreases exponentially with time, so that we can
almost guarantee that they don't receive any writes after reaching a certain
age. Normally, the backup tool won't touch those tables anymore and will leave
the previous basebackups and all deltas in the `archiveDir`; however, if there
is a retention policy that requires deletion of old data, we want to take a
fresh basebackup after that deletion (which is supposed to be the last major
activity on the table) has concluded, to avoid storing those deleted rows as
part of the previous basebackup. Alternatively, this parameter could be used
to leave only one basebackup file without any deltas on those tables that
become immutable after a certain condition (e.g., a time interval). The value
of this parameter should be an integer with a time unit attached; valid
units are 's', 'm' and 'h' for seconds, minutes and hours. For instance, the
value `10h5s` corresponds to `10 hours 5 seconds`. Note that the actual
value will be truncated to minutes, i.e. `5m55s` will result in an actual
interval of `5m`.
* **db**
Database connection parameters. The following values are accepted:
  * **host**:
    the database server hostname or IP address
  * **port**:
    the port the database server listens on
  * **user**:
    the user name for the connection. See the `Requirements` section for the
    privileges this user must have.
  * **database**:
    the database to connect to. It is not possible to back up multiple databases
    with one instance of the tool at the moment; however, multiple backup tools
    can work on the same cluster on different databases.
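
Putting the keys together, a configuration file could look roughly like this;
all values are illustrative placeholders, only the key names come from the list
above, and the exact value formats accepted by the tool may differ:

```yaml
# Illustrative configuration sketch; values are placeholders.
tempDir: /var/lib/lbt/tmp
archiveDir: /var/lib/lbt/archive
deltasPerFile: 10000
backupThreshold: 100
concurrentBasebackups: 2
trackNewTables: true
slotname: lbt_slot
publication: lbt_publication
initialBasebackup: false
fsync: true
forceBasebackupAfterInactivityInterval: 24h
db:
  host: localhost
  port: 5432
  user: lbt_backup
  database: mydb
```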