Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aiven/aiven-db-migrate
https://github.com/aiven/aiven-db-migrate
Last synced: 16 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/aiven/aiven-db-migrate
- Owner: aiven
- License: apache-2.0
- Created: 2020-04-03T07:06:59.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-08-22T10:15:31.000Z (6 months ago)
- Last Synced: 2024-08-22T20:08:10.417Z (5 months ago)
- Language: Python
- Size: 169 KB
- Stars: 20
- Watchers: 75
- Forks: 8
- Open Issues: 12
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
README
# aiven-db-migrate
Aiven database migration tool. This tool is meant for easy migration of databases from some database service
provider, such AWS RDS, or on premises data center, to [Aiven Database as a Service](https://aiven.io/).
However, it's not limited for Aiven services and it might be useful as a generic database migration tool.Usually database service providers, such as Aiven, AWS RDS, and alike, don't allow superuser/root access.
Instead the service's master/admin user is granted permissions for the most common DBA tasks, see e.g.
https://help.aiven.io/en/articles/489557-postgresql-superuser-access.
In addition, service provider's web console/API can be used for performing some DBA tasks requiring more privileges than
granted for the master/admin user. However, the missing superuser access makes some existing database migrating tools,
such as `pg_dumpall`, not useful when migrating database to/from service provider.Currently this tool supports only PostgreSQL but we aim to add support for other databases, such as MySQL.
Requires Python 3.9 or newer.
## Usage
Running library module:
```
$ python3 -m aiven_db_migrate.migrate -h
Available commands: pg
```Installing in virtualenv:
```
$ python3 -m venv venv
$ . venv/bin/activate
$ ## Run make to set the proper version
$ make
$ pip install .
```This installs console scripts which have the same interface as the library module:
* `pg_migrate`: PostgreSQL migration## PostgreSQL
Requirements:
* `pg_dump`: from any PostgreSQL version between the source and target versions
* `psql`: any modern version should workRun library module:
```
$ python3 -m aiven_db_migrate.migrate pg -h
```
or, if installed:
```
$ pg_migrate -h
```Migrating is supported to the same or newer PostgreSQL version starting from PostgreSQL 10 to PostgreSQL 14.
Migrating to older version is not supported.By default it searches `pg_dump` under `/usr/`, when using PostgreSQL installs on different directory such as on Mac, use `--pgbin` parameter to define PostgreSQL home directory. e,g,
```
--pgbin /Applications/Postgres.app/Contents/Versions/14/bin
```Supports regular data dump (`pg_dump`) and [logical replication](https://www.postgresql.org/docs/current/logical-replication.html) (PostgreSQL 10 or newer).
In case that logical replication is not available or privileges/requirements are missing migrating falls back to
data dump.### CLI example
Migrating from AWS RDS to Aiven for PostgreSQL. Logical replication is enabled in source AWS RDS PostgreSQL
server and `aiven-extras` extension is installed in target database.```
$ pg_migrate -s "postgres://postgres:@jappja-pg1.chfhzaircbpb.eu-west-1.rds.amazonaws.com:5432/defaultdb" -t "postgres://avnadmin:@pg1-test-jappja-test.avns.net:26192/defaultdb?sslmode=require"# Or:
$ SOURCE_SERVICE_URI="postgres://postgres:@jappja-pg1.chfhzaircbpb.eu-west-1.rds.amazonaws.com:5432/defaultdb" TARGET_SERVICE_URI="postgres://avnadmin:@pg1-test-jappja-test.avns.net:26192/defaultdb?sslmode=require" pg_migrate
...Roles:
rolname: 'rdsadmin', rolpassword: None, status: 'failed', message: 'must be superuser to create superusers'
rolname: 'rds_password', rolpassword: None, status: 'created', message: 'role created'
rolname: 'rds_superuser', rolpassword: None, status: 'created', message: 'role created'
rolname: 'test_user1', rolpassword: 'placeholder_kfbqrvmdhgrpgpvy', status: 'created', message: 'role created'
rolname: 'rds_ad', rolpassword: None, status: 'created', message: 'role created'
rolname: 'rds_iam', rolpassword: None, status: 'created', message: 'role created'
rolname: 'rds_replication', rolpassword: None, status: 'created', message: 'role created'
rolname: 'rdsrepladmin', rolpassword: None, status: 'failed', message: 'must be superuser to create replication users'
rolname: 'postgres', rolpassword: None, status: 'exists', message: 'role already exists'
rolname: 'test_user2', rolpassword: None, status: 'created', message: 'role created'Databases:
dbaname: 'rdsadmin', method: None, status: 'failed', message: 'FATAL: pg_hba.conf rejects connection for host "80.220.195.174", user "postgres", database "rdsadmin", SSL on\nFATAL: pg_hba.conf rejects connection for host "80.220.195.174", user "postgres", database "rdsadmin", SSL off\n'
dbaname: 'defaultdb', method: 'replication', status: 'running', message: 'migrated to existing database'
```By default logical replication is left running and the created pub/sub objects need to be cleaned up once workloads have been
moved to the new server. Objects created by this tool are named like `managed_db_migrate__`.Starting from the target (using `aiven-extras` extension), get first the subscription name:
```
defaultdb= > SELECT * FROM aiven_extras.pg_list_all_subscriptions();
```
and then drop it:
```
defaultdb= > SELECT * FROM aiven_extras.pg_drop_subscription('managed_db_migrate_defaultdb_sub');
```Note that with `aiven-extras` dropping subscription in target also drops replication slot in source (`dblink`).
In the source get first the publication name:
```
defaultdb=> SELECT * FROM pg_publication;
```
and then drop it:
```
defaultdb=> DROP PUBLICATION managed_db_migrate_defaultdb_pub;
```In case that `aiven-extras` is not used clean up replication slot too:
```
defaultdb=> SELECT * FROM pg_replication_slots;
defaultdb=> SELECT * FROM pg_drop_replication_slot('managed_db_migrate_defaultdb_slot');
```Using `--max-replication-lag` waits until replication lag in bytes is less than/equal to given max replication lag. This
can be used together with `--stop-replication` to clean up all created pub/sub objects when replication is done.With `--validate` only best effort validation is run. This checks e.g. PL/pgSQL languages, extensions etc. installed
in source are also installed/available in target.Use `--no-replicate-extension-tables` to skip extension tables. By default it attempts to replicate all extension tables during logical replication.
With `--force-method` you can specify if you wish to use either replication or dump method. Otherwise the most suitable method is chosen automatically.
Using `--dbs-max-total-size` together with `--validate` you can check if the size of the source database in below some threshold.
### API example
Migrating from AWS RDS to Aiven for PostgreSQL. Logical replication is enabled in source AWS RDS PostgreSQL
server but `aiven-extras` extension is not installed in target database so migrating falls back to data dump.```
>>> from aiven.migrate import PGMigrate, PGMigrateResult
>>> pg_mig = PGMigrate(source_conn_info="postgres://postgres:@jappja-pg1.chfhzaircbpb.eu-west-1.rds.amazonaws.com:5432/defaultdb", target_conn_info="postgres://avnadmin:@pg2-test-jappja-test.avns.net:26192/defaultdb?sslmode=require")
>>> result: PGMigrateResult = pg_mig.migrate()
...
Logical replication failed with error: 'must be superuser to create subscriptions', fallback to dump
>>> result
PGMigrateResult(pg_databases={'rdsadmin': {'dbname': 'rdsadmin', 'message': 'FATAL: pg_hba.conf rejects connection for host "80.220.195.174", user "postgres", database "rdsadmin", SSL on\nFATAL: pg_hba.conf rejects connection for host "80.220.195.174", user "postgres", database "rdsadmin", SSL off\n', 'method': None, 'status': 'failed'}, 'defaultdb': {'dbname': 'defaultdb', 'message': 'migrated to existing database', 'method': 'dump', 'status': 'done'}}, pg_roles={'rdsadmin': {'message': 'must be superuser to create superusers', 'rolname': 'rdsadmin', 'rolpassword': None, 'status': 'failed'}, 'rds_password': {'message': 'role created', 'rolname': 'rds_password', 'rolpassword': None, 'status': 'created'}, 'rds_superuser': {'message': 'role created', 'rolname': 'rds_superuser', 'rolpassword': None, 'status': 'created'}, 'test_user1': {'message': 'role created', 'rolname': 'test_user1', 'rolpassword': 'placeholder_qkdryldfsrdaocio', 'status': 'created'}, 'rds_ad': {'message': 'role created', 'rolname': 'rds_ad', 'rolpassword': None, 'status': 'created'}, 'rds_iam': {'message': 'role created', 'rolname': 'rds_iam', 'rolpassword': None, 'status': 'created'}, 'rds_replication': {'message': 'role created', 'rolname': 'rds_replication', 'rolpassword': None, 'status': 'created'}, 'rdsrepladmin': {'message': 'must be superuser to create replication users', 'rolname': 'rdsrepladmin', 'rolpassword': None, 'status': 'failed'}, 'postgres': {'message': 'role already exists', 'rolname': 'postgres', 'rolpassword': None, 'status': 'exists'}, 'test_user2': {'message': 'role created', 'rolname': 'test_user2', 'rolpassword': None, 'status': 'created'}})
```### Logical replication
* requires PostgreSQL 10 or newer
* `wal_level` needs to be `logical`
* currently supports only FOR ALL TABLES publication in source
* [aiven-extras](https://github.com/aiven/aiven-extras) extension installed in both source and target database, or
* superuser or superuser-like privileges, such as `rds_replication` role in AWS RDS, in both source and target
* [AWS RDS additional settings/info](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.FeatureSupport.LogicalReplication)#### Warning
⚠️ Large objects are [unable to be replicated using logical replication](https://www.postgresql.org/docs/15/logical-replication-restrictions.html), up to and including PostgreSQL 15.
### Schemas
* schemas are migrated without object ownership; the user used for migration is given all object ownership
* NOTE: schema changes break logical replication### Roles
* roles with `LOGIN` attribute are migrated using placeholder passwords: `placeholder_<16 char random string>`
* migrating superuser or replication roles requires superuser privileges### Extensions
* requires whitelisting the extension in target with [pgextwlist](https://github.com/dimitri/pgextwlist), or
* superuser or superuser-like privileges
* [Aiven for PostgreSQL supported extensions](https://help.aiven.io/en/articles/489561-supported-postgresql-extensions)## Development
Install build depends (Fedora):
```
$ make build-dep-fedora
```Style checks:
```
$ make validate-style
```Fix style errors with:
```
$ make isort
$ make yapf
```Static checks (`flake8`, `pylint` and `mypy`):
```
$ make static-checks
```Tests (`pytest`):
```
$ make test
```Running whole test set takes time since all supported migration paths are tested. During development it's usually enough
to run tests only for a certain PostgreSQL version, e.g.:
```
$ PG_VERSION="12" make test
```It's also possible to test migration from one PostgreSQL version to another, e.g.:
```
$ PG_SOURCE_VERSION="10" PG_TARGET_VERSION="12" make test
```Test set can be targeted even further by invoking `pytest`, e.g.:
```
$ PG_SOURCE_VERSION="10" PG_TARGET_VERSION="12" python3 -m pytest -s test/test_pg_migrate.py::Test_PGMigrate::test_migrate
```# TODO
* JSON output with CLI (for automation)
* Hard to make pg_dump silent for outputting JSON to stdout
* Output json to file instead?
* More options
* --dump-only, --repl-only
* --include-databases, --exclude-databases
* --include-tables, --exclude-tables
* --role-passwords (role/passwords file for creating roles with real passwords instead of placeholders)
* More tests
* Notably error/corner cases
* Schema changes break logical replication
* While logical replication is running dump schema periodically and check if has changed,
e.g. by calculating hash of the schema dump
* How to continue if schema has changed? Stop replication, dump schema and restart replication?
* Proper README + API doc
* RPM build recipe for aiven-core/prune integration
* Test automation: Jenkins/Github Actions