Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ankane/pgsync
Sync data from one Postgres database to another
https://github.com/ankane/pgsync
postgresql
Last synced: 5 days ago
JSON representation
Sync data from one Postgres database to another
- Host: GitHub
- URL: https://github.com/ankane/pgsync
- Owner: ankane
- License: mit
- Created: 2015-12-08T05:37:40.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2024-12-29T17:11:14.000Z (13 days ago)
- Last Synced: 2024-12-30T19:10:06.746Z (12 days ago)
- Topics: postgresql
- Language: Ruby
- Homepage:
- Size: 505 KB
- Stars: 3,209
- Watchers: 37
- Forks: 202
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-ruby-cli-apps - pgsync - Syncs data from one Postgres database to another. (Awesome Ruby CLIs / Database)
- awesome-starts - ankane/pgsync - Sync data from one Postgres database to another (postgresql)
- awesome-github-repos - ankane/pgsync - Sync data from one Postgres database to another (Ruby)
- starred-awesome - pgsync - Sync Postgres data between databases (Ruby)
- jimsghstars - ankane/pgsync - Sync data from one Postgres database to another (Ruby)
README
# pgsync
Sync data from one Postgres database to another (like `pg_dump`/`pg_restore`). Designed for:
- **speed** - tables are transferred in parallel
- **security** - built-in methods to prevent sensitive data from ever leaving the server
- **flexibility** - gracefully handles schema differences, like missing columns and extra columns
- **convenience** - sync partial tables, groups of tables, and related records:tangerine: Battle-tested at [Instacart](https://www.instacart.com/opensource)
[![Build Status](https://github.com/ankane/pgsync/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/pgsync/actions)
## Installation
pgsync is a command line tool. To install, run:
```sh
gem install pgsync
```This will give you the `pgsync` command. If installation fails, you may need to install [dependencies](#dependencies).
You can also install it with Homebrew:
```sh
brew install pgsync
```## Setup
In your project directory, run:
```sh
pgsync --init
```This creates `.pgsync.yml` for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). `pgsync` commands can be run from this directory or any subdirectory.
## How to Use
First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few [convenience methods](#schema). Once that’s done, you’re ready to sync data.
Sync tables
```sh
pgsync
```Sync specific tables
```sh
pgsync table1,table2
```Works with wildcards as well
```sh
pgsync "table*"
```Sync specific rows (existing rows are overwritten)
```sh
pgsync products "where store_id = 1"
```You can also preserve existing rows
```sh
pgsync products "where store_id = 1" --preserve
```Or truncate them
```sh
pgsync products "where store_id = 1" --truncate
```## Tables
Exclude specific tables
```sh
pgsync --exclude table1,table2
```Add to `.pgsync.yml` to exclude by default
```yml
exclude:
- table1
- table2
```Sync tables from all schemas or specific schemas (by default, only the search path is synced)
```sh
pgsync --all-schemas
# or
pgsync --schemas public,other
# or
pgsync public.table1,other.table2
```## Groups
Define groups in `.pgsync.yml`:
```yml
groups:
group1:
- table1
- table2
```And run:
```sh
pgsync group1
```## Variables
You can also use groups to sync a specific record and associated records in other tables.
To get product `123` with its reviews, last 10 coupons, and store, use:
```yml
groups:
product:
products: "where id = {1}"
reviews: "where product_id = {1}"
coupons: "where product_id = {1} order by created_at desc limit 10"
stores: "where id in (select store_id from products where id = {1})"
```And run:
```sh
pgsync product:123
```## Schema
Sync the schema before the data (this wipes out existing data)
```sh
pgsync --schema-first
```Specify tables
```sh
pgsync table1,table2 --schema-first
```Sync the schema without data (this wipes out existing data)
```sh
pgsync --schema-only
```pgsync does not try to sync Postgres extensions.
## Sensitive Data
Prevent sensitive data like email addresses from leaving the remote server.
Define rules in `.pgsync.yml`:
```yml
data_rules:
email: unique_email
last_name: random_letter
birthday: random_date
users.auth_token:
value: secret
visits_count:
statement: "(RANDOM() * 10)::int"
encrypted_*: null
````last_name` matches all columns named `last_name` and `users.last_name` matches only the users table. Wildcards are supported, and the first matching rule is applied.
Options for replacement are:
- `unique_email`
- `unique_phone`
- `unique_secret`
- `random_letter`
- `random_int`
- `random_date`
- `random_time`
- `random_ip`
- `value`
- `statement`
- `null`
- `untouched`Rules starting with `unique_` require the table to have a single column primary key. `unique_phone` requires a numeric primary key.
## Foreign Keys
Foreign keys can make it difficult to sync data. Three options are:
1. Defer constraints (recommended)
2. Manually specify the order of tables
3. Disable foreign key triggers, which can silently break referential integrity (not recommended)To defer constraints, use:
```sh
pgsync --defer-constraints
```To manually specify the order of tables, use `--jobs 1` so tables are synced one-at-a-time.
```sh
pgsync table1,table2,table3 --jobs 1
```To disable foreign key triggers and potentially break referential integrity, use:
```sh
pgsync --disable-integrity
```This requires superuser privileges on the `to` database. If syncing to (not from) Amazon RDS, use the `rds_superuser` role. If syncing to (not from) Heroku, there doesn’t appear to be a way to disable integrity.
## Triggers
Disable user triggers with:
```sh
pgsync --disable-user-triggers
```## Sequences
Skip syncing sequences with:
```sh
pgsync --no-sequences
```## Append-Only Tables
For extremely large, append-only tables, sync in batches.
```sh
pgsync large_table --in-batches
```Note: This requires the table to have a numeric, increasing primary key
The script will resume where it left off when run again, making it great for backfills.
## Connection Security
Always make sure your [connection is secure](https://ankane.org/postgres-sslmode-explained) when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use `sslmode=verify-full`. If you don’t do this, your database credentials can be compromised.
## Safety
To keep you from accidentally overwriting production, the destination is limited to `localhost` or `127.0.0.1` by default.
To use another host, add `to_safe: true` to your `.pgsync.yml`.
## Multiple Databases
To use with multiple databases, run:
```sh
pgsync --init db2
```This creates `.pgsync-db2.yml` for you to edit. Specify a database in commands with:
```sh
pgsync --db db2
```## Integrations
- [Django](#django)
- [Heroku](#heroku)
- [Laravel](#laravel)
- [Rails](#rails)### Django
If you run `pgsync --init` in a Django project, migrations will be excluded in `.pgsync.yml`.
```yml
exclude:
- django_migrations
```### Heroku
If you run `pgsync --init` in a Heroku project, the `from` database will be set in `.pgsync.yml`.
```yml
from: $(heroku config:get DATABASE_URL)?sslmode=require
```### Laravel
If you run `pgsync --init` in a Laravel project, migrations will be excluded in `.pgsync.yml`.
```yml
exclude:
- migrations
```### Rails
If you run `pgsync --init` in a Rails project, Active Record metadata and schema migrations will be excluded in `.pgsync.yml`.
```yml
exclude:
- ar_internal_metadata
- schema_migrations
```## Debugging
To view the SQL that’s run, use:
```sh
pgsync --debug
```## Other Commands
Help
```sh
pgsync --help
```Version
```sh
pgsync --version
```List tables
```sh
pgsync --list
```## Scripts
Use groups when possible to take advantage of parallelism.
For Ruby scripts, you may need to do:
```rb
Bundler.with_unbundled_env do
system "pgsync ..."
end
```## Docker
Get the [Docker image](https://hub.docker.com/r/ankane/pgsync) with:
```sh
docker pull ankane/pgsync
alias pgsync="docker run -ti ankane/pgsync"
```This will give you the `pgsync` command.
## Dependencies
If installation fails, your system may be missing Ruby or libpq.
On Mac, run:
```sh
brew install libpq
```On Ubuntu, run:
```sh
sudo apt-get install ruby-dev libpq-dev build-essential
```## Upgrading
Run:
```sh
gem install pgsync
```To use master, run:
```sh
gem install specific_install
gem specific_install https://github.com/ankane/pgsync.git
```With Homebrew, run:
```sh
brew upgrade pgsync
```With Docker, run:
```sh
docker pull ankane/pgsync
```## Related Projects
Also check out:
- [Dexter](https://github.com/ankane/dexter) - The automatic indexer for Postgres
- [PgHero](https://github.com/ankane/pghero) - A performance dashboard for Postgres
- [pgslice](https://github.com/ankane/pgslice) - Postgres partitioning as easy as pie## Thanks
Inspired by [heroku-pg-transfer](https://github.com/ddollar/heroku-pg-transfer).
## History
View the [changelog](https://github.com/ankane/pgsync/blob/master/CHANGELOG.md)
## Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- [Report bugs](https://github.com/ankane/pgsync/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/pgsync/pulls)
- Write, clarify, or fix documentation
- Suggest or add new featuresTo get started with development:
```sh
git clone https://github.com/ankane/pgsync.git
cd pgsync
bundle installcreatedb pgsync_test1
createdb pgsync_test2
createdb pgsync_test3bundle exec rake test
```