{"id":16935187,"url":"https://github.com/bsmth/read_replicas","last_synced_at":"2026-05-16T12:34:15.728Z","repository":{"id":43373774,"uuid":"466162703","full_name":"bsmth/read_replicas","owner":"bsmth","description":null,"archived":false,"fork":false,"pushed_at":"2022-03-04T18:20:21.000Z","size":88,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-30T03:42:19.545Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bsmth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-04T14:53:35.000Z","updated_at":"2022-03-04T15:06:35.000Z","dependencies_parsed_at":"2022-09-13T19:23:55.079Z","dependency_job_id":null,"html_url":"https://github.com/bsmth/read_replicas","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bsmth/read_replicas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsmth%2Fread_replicas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsmth%2Fread_replicas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsmth%2Fread_replicas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsmth%2Fread_replicas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bsmth","download_url":"https://codeload.github.com/bsmth/read_replicas/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsmth%2Fread_replicas/sbom","scorecard":null
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33102935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T04:41:52.686Z","status":"ssl_error","status_checked_at":"2026-05-16T04:41:52.009Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T20:53:55.194Z","updated_at":"2026-05-16T12:34:15.713Z","avatar_url":"https://github.com/bsmth.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Database read replicas\n\nThis document provides an overview of database read replicas and covers the use\ncases where developers may want to use replication for their databases and web\ndevelopment environments. This document will focus on replication in PostgreSQL\nand describes how to benefit from read replicas in a PostgreSQL-based application\narchitecture.\n\n## What are database read replicas?\n\nA read replica is a copy of a database that contains all of the latest changes\nmade to a primary database. These two types of databases may be referred to as\n\"primary/secondary\" or \"main/replica\" in other systems but have the following\nterminology in the context of PostgreSQL-based systems:\n\n**Primary:** the database instance which has a leader role assigned. 
This\ninstance allows read and write operations on its data.\n\n**Standby:** one or more copies of the primary database which have the latest\nchanges. These instances are typically read-only.\n\nThere are multiple reasons why you might want to employ read replicas in your\narchitecture. The benefits boil down to two main goals that read replicas can\nfacilitate:\n\n- **High availability:** multiple copies of the primary are made to avoid data\n  loss in cases where the main database is a single point of failure in a\n  system. If the primary node is offline for any reason, a failover can be\n  performed so that a standby takes the place of the primary.\n- **Load balancing:** managing ingress and egress traffic and distributing the\n  volume of requests across copies of the same data so that no node is\n  overloaded. Serving reads from replicas improves read performance and leaves\n  the primary with more capacity for writes.\n\n### How native PostgreSQL replication works\n\nPostgreSQL uses an internal component for data durability called a\nwrite-ahead log (WAL) that tracks all data changes (`INSERT`, `UPDATE`,\n`DELETE`) before they are committed and saved to permanent storage. Each new\ndata change is assigned a log sequence number (LSN), and the WAL can be used to\nrecover from crashes by applying changes in the WAL on restart.\n\nThe native replication functionality in PostgreSQL relies on the WAL of the\nprimary node to establish whether a replica has the most recent changes. A\nreplica continuously performs the same operations as an instance recovering from\na crash, replaying WAL records until it has caught up with the primary node.\n\n![PostgreSQL replication using a primary node and two replicas](img/wal_senders.png)\n\nThe two modes built into PostgreSQL for replication are **log shipping** and\n**streaming replication**. Replicas that only use log shipping are known as warm\nstandbys and are mostly used for high-availability configurations. 
A warm standby\ncan take over if the primary server fails, but it is not open for SQL reads\nwhile it is in standby.\n\nLog shipping has a low overhead in terms of network and compute resources\nbecause the WAL is sent in batches and is processed in bulk. In contrast,\nstreaming replication works by continuously sending WAL records over the network\nconnection from the primary to the standbys. In more recent versions of\nPostgreSQL, log shipping and streaming replication can be combined to provide\nhot standby replicas that are available for read-only operations.\n\nThe usual process for setting up read replicas in PostgreSQL is to provide a\ndirectory on the primary server where the primary stores archive files. The\nprimary is then given server configuration that enables archiving and\nreplication. This is done in the `postgresql.conf` file, as in the following\nexample, which turns on archiving and allows a maximum of three concurrent\nreplication connections (`replica` is the current name for the `wal_level`\nvalue that older releases called `hot_standby`):\n\n```conf\nwal_level = replica\narchive_mode = on\nmax_wal_senders = 3\n```\n\nThe primary server also needs to allow incoming connections from standby nodes.\nThis can be configured in the `pg_hba.conf` file, as in the following example,\nwhere `replication_user` is the expected username and `standby_IP` is the public\nIP address of the standby instance:\n\n```conf\n# Allow replication connections\nhost     replication     {replication_user}         {standby_IP}/32        md5\n```\n\nAfter the primary instance is restarted, it accepts connections from the\nstandby nodes to begin replication. To prepare a standby node, the files from\nthe data directory on the primary server should be copied to the same directory\non the standby server. 
This can be done using the `pg_basebackup` utility **on\nthe standby server**, where `primary_IP` is the address of the primary server:\n\n```bash\nsudo -u postgres pg_basebackup \\\n  -h {primary_IP} -p 5432 -U {replication_user} \\\n  -D /var/lib/postgresql/12/main/ -Fp -Xs -R\n```\n\nThe flags passed to `pg_basebackup` include common connection settings such as\nthe host, port, and username, but there are some convenient additions at the end\nof this command which automate steps that developers might usually have to\nperform manually:\n\n- `-Xs` streams the contents of the WAL while the backup of the primary is\n  performed.\n- `-R` creates an empty file named `standby.signal` in the replica's data\n  directory. This file lets your replica cluster know that it should operate as\n  a standby server. The `-R` option also adds the connection information about\n  the primary server to the `postgresql.auto.conf` file. This is a special\n  configuration file that is read whenever the regular `postgresql.conf` file is\n  read, but the values in the `.auto` file override the values in the regular\n  configuration file.\n\nAfter restarting the standby node, replication should be active and the standby\ncan be used as a replica that allows read-only connections. The example we\ndescribed covers one of many possible ways to achieve replication. Native\nreplication has a considerable number of configuration options, which allow for\nfine-grained control of how data is transferred to replicas and of the\narchitecture required. A full list of native approaches is described in more\ndetail in the PostgreSQL reference documentation for\n[replication solutions](https://www.postgresql.org/docs/current/different-replication-solutions.html).\n\n### Alternatives to built-in functionality\n\nPostgreSQL doesn't provide the functionality to automatically fail over when the\nprimary server fails; this is a manual operation unless you use a third-party\nsolution. Load balancing is also not automatic with hot standbys. 
If load\nbalancing is a requirement for your application, you must provide\na load-balancing solution that uses the primary server for read-write operations\nand the standby server for read-only operations. Due to the popularity of\nPostgreSQL, there are many options for handling replication, including\nthird-party extensions and products which offer PostgreSQL solutions. The\nfollowing is a short list of common options that control replication via more\nstraightforward configuration.\n\n[pgpool-II](https://www.pgpool.net/mediawiki/index.php/Main_Page) is a\nmiddleware that works between PostgreSQL servers and a client. It provides\nconnection pooling and load balancing by distributing `SELECT` queries among\nmultiple servers, improving overall throughput. At best, performance improves\nproportionally to the number of PostgreSQL servers. Load balancing works best in\na situation where there are a lot of users executing many queries at the same\ntime. Parallel queries allow data to be divided among the multiple servers\nso that a query runs on all the servers concurrently, reducing the overall\nexecution time. Parallel query works best when searching large-scale\ndata.\n\nA more complicated but well-tested and robust approach is to use a combination\nof [PgBouncer](https://www.pgbouncer.org/) and\n[HAProxy](http://www.haproxy.org/). PgBouncer is a connection pooler for\noptimizing database connections so that applications have improved performance.\nPgBouncer can manage database connections well but cannot handle the failover\nrequired for a high-availability deployment. HAProxy's TCP load balancing\ncombined with PgBouncer is used as an architectural solution for a highly\navailable deployment.\n\n[Citus](https://www.citusdata.com/product/community) is a popular PostgreSQL\nextension that transforms PostgreSQL into a distributed cluster.\nCitus is open source, so it can be self-managed on your infrastructure. 
They\nalso offer a managed version on Azure where users can set up the entire service\nwithout managing more complex aspects of database configuration.\n\n## Using read replicas\n\nA good way to understand when read replicas are beneficial is to consider\nsome practical examples and use cases. The following sections cover two common\nuse cases that show when to use high availability and load balancing in\nPostgreSQL-based web applications and why they suit the needs of each scenario.\n\n### UX insights and data visualization\n\nTake the example of a production PostgreSQL database storing event data from\nuser sessions on a company's website. We're storing multiple event types that\nrecord the actions visitors take in the website's UI to drive\nmore effective design initiatives. The UX team wants to know which parts of the\nscreen users click on first and the sequence of actions they follow. We're\ninserting session views, clicks, and active users into our database and\ncorrelating this information with software releases that ship different UI\ncomponents. Write operations arrive with unpredictable frequency and volume.\nOne of the requirements is that the UI and UX team\nneeds to connect multiple data visualization and reporting tools to our database\nto find the components most commonly used by users in daily, weekly and monthly\nbuckets. Having our BI and analytics tools perform regular data-intensive\nqueries on top of our primary database node would significantly impact read and\nwrite performance.\n\n![Read replica for reporting or data visualization](img/read_replica.png)\n\nThe asynchronous replication used here gives no guarantee that data has\nreached any replica. Data is considered written once it has been written\nto the primary server's WAL. The WAL sender will stream all WAL data to any\nconnected replicas, but this will happen asynchronously after the WAL is\nwritten. 
This replication mode can result in temporary data inconsistency\nbetween the primary and the replicas.\n\nIn this case, it's acceptable that there is latency between writes and data\nbeing visible on read replicas because our UX team only needs daily reports. We\ncan provide read replicas that do not need data to be available in real time\nbut allow expensive or long-running queries to run\nwithout impacting ingestion on our primary node.\n\n### eCommerce application\n\nConsider an eCommerce application that has a PostgreSQL backend for storing\norders and customer transactions. If the primary database crashes or goes\nentirely offline, it would be impossible for the eCommerce store to see a\nhistory of orders, or complete new orders, meaning the entire business function\nof the store would be non-operational. Developers should consider a\nhigh-availability setup to prevent downtime and data loss for a\nbusiness-critical deployment like this.\n\nIn this scenario, it would be a good idea to have a single read/write primary\nand at least one hot standby replica, preferably in another region. This\nwould ensure that disaster recovery mechanisms are in place should an instance\nor even a data center go offline. A read replica would be available to view\nhistorical transactions and, depending on the setup, can be promoted to the\nprimary role and begin to accept read/write operations.\n\nThe diagram below is a simplified version of an architectural solution\n[recommended by AWS](https://aws.amazon.com/blogs/database/set-up-highly-available-pgbouncer-and-haproxy-with-amazon-aurora-postgresql-readers/)\nfor a highly available deployment of their managed PostgreSQL service, Aurora.\nThe primary and standby nodes are deployed across multiple availability zones\nfor redundancy. 
The read replicas serve read-only queries, and the dotted lines\nbelow indicate a custom health check to determine the status of the nodes in the\ncluster.\n\n![A highly available architecture deployed to AWS using Aurora](img/ha.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsmth%2Fread_replicas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbsmth%2Fread_replicas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsmth%2Fread_replicas/lists"}