https://github.com/loafoe/iron-streaming-backup
IronIO schedulable PostgreSQL backup task
- Host: GitHub
- URL: https://github.com/loafoe/iron-streaming-backup
- Owner: loafoe
- License: mit
- Created: 2020-03-26T12:57:38.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-11-05T10:13:14.000Z (over 3 years ago)
- Last Synced: 2025-01-20T15:17:49.138Z (5 months ago)
- Topics: backup, docker, hsdp, ironio, postgresql
- Language: Shell
- Size: 32.2 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Codeowners: CODEOWNERS
README
# iron-streaming-backup
A Docker image that you can schedule on HSDP IronIO to perform PostgreSQL backups to an S3 bucket
# Features
- Streaming backups, so the job is not limited by the runner's disk storage
- Compresses and encrypts backups

# Usage
## Prerequisites
- IronIO CLI
- Siderite CLI
- Provisioned: one or more PostgreSQL RDS instances you want to back up
- Provisioned: HSDP Iron instance
- Provisioned: HSDP S3 bucket for storing backups
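Before continuing, a quick way to confirm the two CLIs are installed (an optional check, not part of the original setup):

```shell
# Optional: confirm the required CLIs are on your PATH
for tool in iron siderite; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
```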
## Preparing payload.json

As the image uses [siderite](https://github.com/philips-labs/siderite) for runtime orchestration, all required credentials are passed through a `payload.json` file, which is stored encrypted in the IronIO scheduled task definition. The payload should contain the following `cmd` and `env` (environment) variables:
```json
{
  "version": "1",
  "cmd": ["/app/backup.sh"],
  "env": {
    "PGPASS_FILE_BASE64": "cG9zdGdyZXMtZGIuZGVmc2ZzYS51cy1lYXN0LTEucmRzLmFtYXpvbmF3cy5jb206NTQzMjpoc2RwX3BnOnh4VXNlckF4eDp5eVBhc3N3ZEF5eTpkYjEKcG9zdGdyZXMtZGIuZGVmc2ZzYi51cy1lYXN0LTEucmRzLmFtYXpvbmF3cy5jb206NTQzMjpoc2RwX3BnOnh4VXNlckJ4eDp5eVBhc3N3ZEJ5eTpkYjIK",
    "PASS_FILE_BASE64": "TXlTZWNyZXRQYXNzd29yZAo=",
    "AWS_ACCESS_KEY_ID": "APIKeyHere",
    "AWS_SECRET_ACCESS_KEY": "SecretKeyHere",
    "S3_BUCKET": "cf-s3-some-random-uuid-here"
  }
}
```

### PGPASS_FILE_BASE64
The pgpass file contains the credentials for each PostgreSQL database you want to back up. The format is one database per line:

```
hostname:port:database:username:password:someprefix
```

Example:
```
postgres-db.defsfsa.us-east-1.rds.amazonaws.com:5432:hsdp_pg:xxUserAxx:yyPasswdAyy:db1
postgres-db.defsfsb.us-east-1.rds.amazonaws.com:5432:hsdp_pg:xxUserBxx:yyPasswdByy:db2
```

Once you've prepared the file, encode it using base64 to get the value to use:
```shell
cat pgpass|base64
cG9zdGdyZXMtZGIuZGVmc2ZzYS51cy1lYXN0LTEucmRzLmFtYXpvbmF3cy5jb206NTQzMjpoc2RwX3BnOnh4VXNlckF4eDp5eVBhc3N3ZEF5eTpkYjEKcG9zdGdyZXMtZGIuZGVmc2ZzYi51cy1lYXN0LTEucmRzLmFtYXpvbmF3cy5jb206NTQzMjpoc2RwX3BnOnh4VXNlckJ4eDp5eVBhc3N3ZEJ5eTpkYjIK
```
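Note that GNU coreutils `base64` wraps its output at 76 characters by default; a single-line value is easier to paste into `payload.json` (an optional tweak, assuming GNU coreutils):

```shell
# Optional: disable line wrapping so the encoded value stays on one line
base64 -w0 pgpass
```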
### PASS_FILE_BASE64

The pass file contains the key (password) that will be used to encrypt the database backups using AES-256:

```shell
echo -n 'MySecretPassword'|base64
TXlTZWNyZXRQYXNzd29yZA==
```
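Instead of a hand-picked password, you can generate a strong random key and encode it in one step (an optional suggestion; keep a copy of the key file somewhere safe, since you will need it to decrypt backups):

```shell
# Optional: generate a random 32-byte key, store it, and print its base64 value
openssl rand -base64 32 | tr -d '\n' > pass_file
base64 pass_file
```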
### AWS_ACCESS_KEY_ID

This should be the `api_key` of the HSDP S3 Bucket you provisioned.

### AWS_SECRET_ACCESS_KEY
This should be the `secret_key` of the HSDP S3 Bucket you provisioned.

### S3_BUCKET
This should be the `bucket` of the HSDP S3 Bucket you provisioned.

# Scheduling the task
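Before encrypting the payload it can help to check that the JSON is well formed (an optional step, assuming `jq` is installed):

```shell
# Optional sanity check: jq exits non-zero if payload.json is not valid JSON
jq . payload.json > /dev/null && echo "payload.json is valid"
```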
Once you've prepared the `payload.json` file, you can encrypt it using `siderite`:

```shell
cat payload.json|siderite encrypt > payload.enc
```

Now you need the IronIO cluster ID:
```shell
cat ~/.iron.json |jq -r .cluster_info[0].cluster_id
56someclusteridhere34554
```

Register the `iron-streaming-backup` Docker image in IronIO. You only need to do this once, or whenever the Docker image in this repository is updated or republished:
```shell
iron register philipslabs/iron-streaming-backup:latest
```

Finally, you can schedule the task. In the example below the backup task runs once every day (86400 seconds):
```shell
iron worker schedule \
  -cluster 56someclusteridhere34554 \
  -run-every 86400 \
  -payload-file payload.enc philipslabs/iron-streaming-backup
```
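Optionally, you can queue a single one-off run to verify everything works before relying on the schedule (a hedged sketch; the exact flags may differ between `iron` CLI versions, so check `iron worker queue -help` first):

```shell
# Hypothetical one-off test run; verify the flag names against your iron CLI version
iron worker queue \
  -cluster 56someclusteridhere34554 \
  -payload-file payload.enc philipslabs/iron-streaming-backup
```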
# Bucket lifecycle policy

It is advised to set an S3 Bucket lifecycle policy. A good practice is to move your database backups to the `GLACIER` storage class after a couple of days and to set an expiration date that automatically deletes older backups. The policy below moves dumps to `GLACIER` after 7 days and deletes them after 6 months (180 days):

```json
[
  {
    "Expiration": {
      "Days": 180
    },
    "ID": "Move to Glacier and expire after 6 months",
    "Prefix": "",
    "Status": "Enabled",
    "Transitions": [
      {
        "Days": 7,
        "StorageClass": "GLACIER"
      }
    ]
  }
]
```
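If your credentials allow direct access to the S3 API, one way to apply such a policy is with the AWS CLI (a sketch, not part of the original setup; note that the CLI expects the rules wrapped in a top-level `Rules` key):

```shell
# Hypothetical example: lifecycle.json contains {"Rules": [ ...the rules shown above... ]}
aws s3api put-bucket-lifecycle-configuration \
  --bucket cf-s3-some-random-uuid-here \
  --lifecycle-configuration file://lifecycle.json
```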
# Retrieving and decrypting a backup

- Copy the `.gz.aes` file from the bucket back to your restore system
- Decrypt the file, assuming your password is stored in the file `${password_file}`:
```shell
openssl enc -in backup_file.gz.aes -aes-256-cbc -d -pass file:${password_file} |gzip -d > pg_dump_file.sql
```
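To complete a restore, the decrypted dump can be loaded into a target database. A minimal end-to-end sketch, assuming the AWS CLI is configured with the bucket credentials and the dump is a plain-SQL `pg_dump` file (host, user and database names below are placeholders):

```shell
# Hypothetical restore flow: fetch the backup, decrypt it (as above), load it
aws s3 cp "s3://cf-s3-some-random-uuid-here/backup_file.gz.aes" .
openssl enc -in backup_file.gz.aes -aes-256-cbc -d -pass file:pass_file | gzip -d > pg_dump_file.sql
psql -h postgres-db.example.com -U restore_user -d target_db -f pg_dump_file.sql
```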
# License

License is MIT.