{"id":17722517,"url":"https://github.com/mkabilov/logical_backup","last_synced_at":"2025-04-01T16:30:41.588Z","repository":{"id":94102407,"uuid":"141677707","full_name":"mkabilov/logical_backup","owner":"mkabilov","description":"[WIP] Continuous backup of the tables using PostgreSQL logical replication","archived":false,"fork":false,"pushed_at":"2019-04-15T12:28:18.000Z","size":830,"stargazers_count":8,"open_issues_count":13,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-06-20T13:29:51.409Z","etag":null,"topics":["backup","golang","logical-replication","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkabilov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-07-20T07:18:13.000Z","updated_at":"2024-06-20T13:29:51.410Z","dependencies_parsed_at":"2023-03-13T17:07:32.142Z","dependency_job_id":null,"html_url":"https://github.com/mkabilov/logical_backup","commit_stats":null,"previous_names":["ikitiki/logical_backup"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Flogical_backup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Flogical_backup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Flogical_backup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Flogical_backup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkabilov","download_url":"https://codeloa
d.github.com/mkabilov/logical_backup/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246620397,"owners_count":20806764,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backup","golang","logical-replication","postgresql"],"created_at":"2024-10-25T15:38:33.908Z","updated_at":"2025-04-01T16:30:40.935Z","avatar_url":"https://github.com/mkabilov.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.com/mkabilov/logical_backup.svg?branch=master)](https://travis-ci.com/mkabilov/logical_backup)\n# PostgreSQL logical backup tool\n\n## Introduction\n\nThe purpose of the Logical Backup Tool (LBT) is to create a continuous\nincremental logical backup of one or more PostgreSQL database tables. For each\ntable that should be backed up, the tool stores a continuous stream of changes\n(called deltas) in files on disk; if necessary, it also produces a complete SQL\ndump of the table. Unlike a physical backup, e.g., one taken with\n`pg_basebackup`, the outcome is not a page-by-page copy of the target cluster\nand cannot be used in situations that require an exact full copy, e.g., when\ncreating a physical replica. 
At the same time, it doesn't have the limitations of a physical copy\nand can be used to restore target tables to a system running a different\nPostgreSQL major version from the original one, or to one running on\nan incompatible CPU architecture.\n\nThe primary use case for the tool is protection against data loss of one or\nmore tables in an OLAP environment, i.e., an environment with high volumes of\ndata (typically in the range of tens or hundreds of TB) and a relatively low\nfrequency of changes, possibly just a few thousand rows per day. Usually, the\nanalytical data stored in the database is generated from some primary source;\ntherefore, slight inconsistencies between different tables are tolerated, and\nthe goal of data recovery is to restore the target tables one by one to the\nlatest state stored in the backup, as opposed to restoring the whole dataset\nat once to a consistent point in time. Consequently, the tool does not\nguarantee that cross-table constraints, such as foreign keys, are satisfied\nafter the restore, and it may be necessary to remove those constraints from\nthe dataset being backed up. In other words, the restore process acts\nindependently on each table from the backup set, rather than restoring the\nwhole set one transaction at a time.\n\nThe resulting backup is continuous and incremental. Together with the\ninitial SQL dump of the table (called a basebackup), LBT writes the ongoing\nchanges (deltas). Once the number of delta files exceeds the configured\nthreshold, or the time since the previous basebackup exceeds an interval\nprovided in the configuration, a new basebackup is produced. 
All deltas recorded before the start of\nthe latest full dump are purged once the dump succeeds.\n\n## Requirements\n\nThe Logical Backup Tool fetches the stream of changes using the PostgreSQL\nfeature called `logical decoding` and groups the tables to back up using a\n`publication`, an entity introduced in PostgreSQL 10 to describe a set of\ntables that are replicated together; therefore, all PostgreSQL versions starting\nfrom 10 are supported. You need a direct connection to your database as a user\nthat has either the login and replication privileges, provided that the\npublication specified in the tool configuration already exists, or superuser\nprivileges if you want the tool to create the publication. The server running\nLBT should have enough disk space to store at least twice the combined size of\nthe N biggest tables in the publication, where N is the number of concurrent\nbasebackup processes set in the configuration (`concurrentBasebackups`). In a\nreal-world scenario, the actual space requirement also depends on\n`backupThreshold`, i.e., the number of delta files accumulated between\nconsecutive full backups. On a busy system with many data changes and\n`backupThreshold` set too high, the backup may require roughly an order of\nmagnitude more space than the original table; such systems, however, are\nunlikely to fit the typical OLAP use case.\n\nThe tool's normal operation (particularly how often the dumps are created) may\nbe disrupted if the system clock is adjusted; however, switching to or from DST\nshould not cause any issues.\n\n## How it works\n\nOn the initial run, the tool connects to the target database, creates the\npublication if necessary, and calls `pg_create_logical_replication_slot` to\ncreate a new replication slot for the built-in output plugin `pgoutput`.\nIt then establishes a replication connection to the database and starts\nstreaming data from the slot created in the previous step. 
When changes for a table that has not been\nseen before arrive from the logical stream, the logical backup routine puts a\nrequest to create a basebackup for that table into the workers' queue and goes\non to write a delta for that change to disk. One of the spare backup workers\n(their number is set by the `concurrentBasebackups` configuration parameter)\nwill eventually pick up the request and produce a `COPY targettable TO STDOUT`\ndump of the table.\n\nWhen LBT resumes after a period of downtime, it continues streaming from the\npreviously created slot; the slot guarantees that changes the tool hasn't\nprocessed yet are not lost while they remain unconsumed. On the downside,\nunconsumed changes accumulate on the database server, taking up disk space in\nthe `pg_wal` directory; therefore, when planning a prolonged downtime of the\nbackup tool, drop the slot used for the backup with the\n`pg_drop_replication_slot()` function. After the slot is dropped, purge the\nbackup directory so that the backup process starts from scratch; alternatively,\nenable the `initialBasebackup` option described below.\n\n## Configuration parameters\n\nLBT reads its configuration from a YAML file supplied as a command-line\nargument. The following keys can be defined in that file:\n\n* **tempDir**\n  The directory for temporary files, such as incomplete basebackups.\n  Once complete, those files are moved to the main backup directory.\n\n* **deltasPerFile**\n  The maximum number of individual changes (deltas) that a\n  single delta file may contain. 
This is a hard limit; once it is reached, LBT\n  writes all subsequent deltas to a new file, even if that results in a single\n  transaction being split across multiple delta files.\n\n* **backupThreshold**\n  If the tool writes more than `backupThreshold` delta files\n  since the last basebackup, a new basebackup for the table is requested.\n  Setting this value too low results in too many basebackups; setting it too\n  high lets too many changes accumulate, consuming more disk space than\n  necessary and increasing the recovery time for the table.\n\n* **concurrentBasebackups**\n  The maximum number of basebackup processes\n  that can run concurrently. Each process consumes a single PostgreSQL\n  connection and runs `COPY` for the table it is assigned, writing the output\n  to a file.\n\n* **trackNewTables**\n  When set to true, allows starting the tool with an empty\n  publication and permits new tables to be added to the initial set provided by\n  the publication. It is a good idea to enable this option if you define a\n  publication `FOR ALL TABLES` or let LBT create one for you.\n\n* **slotname**\n  The name of the logical replication slot that the tool should use.\n  LBT attempts to create the slot if it doesn't exist. It expects a\n  non-temporary slot with the output plugin `pgoutput`. Note that LBT never\n  drops the slot on its own; if you need to start from scratch, drop\n  it manually with `pg_drop_replication_slot`.\n\n* **publication**\n  The name of the publication LBT should use to determine the\n  set of tables to back up. LBT attempts to create one if it doesn't exist,\n  defining it `FOR ALL TABLES`. If you need only a subset of tables,\n  create the corresponding publication beforehand.\n\n* **initialBasebackup**\n  If set to true, LBT triggers an initial basebackup\n  for all tables in the publication at startup. 
This is useful for\n  discarding old backup deltas and starting from a fresh basebackup, e.g., after\n  deleting the replication slot.\n\n* **fsync**\n  When set to true, LBT runs fsync, making each write to a delta file\n  durable. On a system with many writes, this may negatively impact\n  performance.\n\n* **archiveDir**\n  The main directory for storing the resulting backup.\n\n* **forceBasebackupAfterInactivityInterval**\n  Triggers a new basebackup if there has been activity on the table since the\n  last backup, but the last delta written is older than the time interval\n  specified in this parameter. On some tables, the write rate decreases\n  exponentially with time, so they almost certainly receive no writes after\n  reaching a certain age. Normally, the backup tool would not produce any\n  further backups for those tables, leaving the previous basebackups and all\n  deltas in the `archiveDir`; however, if a retention policy requires deleting\n  old data, it is desirable to take a fresh basebackup once that deletion,\n  presumably the last major activity on the table, has concluded, so that the\n  deleted rows are not kept as part of the previous basebackup. Alternatively,\n  this parameter can be used to keep only a single basebackup file without any\n  deltas for tables that become immutable after a certain period of time. The\n  value consists of integers, each followed by a time unit; valid units are\n  `s`, `m`, and `h` for seconds, minutes, and hours. For instance, the value\n  `10h5s` corresponds to 10 hours and 5 seconds. Note that the actual value is\n  truncated to whole minutes, e.g., `5m55s` results in an actual interval of\n  `5m`.\n\n* **db**\n  Database connection parameters. The following keys are accepted:\n  * **host**:\n  the database server hostname or IP address\n  * **port**:\n  the port the database server listens on\n  * **user**:\n  the user name for the connection. 
See the `Requirements` section for the privileges\n  this user must have.\n  * **database**:\n  the database to connect to. It is currently not possible to back up multiple\n  databases with one instance of the tool; however, multiple instances can run\n  against different databases on the same cluster.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkabilov%2Flogical_backup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkabilov%2Flogical_backup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkabilov%2Flogical_backup/lists"}