https://github.com/percona-lab/percona-binlog-server
Percona Binary Log Server
- Host: GitHub
- URL: https://github.com/percona-lab/percona-binlog-server
- Owner: Percona-Lab
- License: gpl-2.0
- Created: 2023-03-10T15:13:03.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-22T01:00:14.000Z (about 1 year ago)
- Last Synced: 2025-05-05T22:18:08.175Z (11 months ago)
- Language: C++
- Size: 304 KB
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Percona Binary Log Server
`binlog_server` is a command-line utility that can be considered an enhanced version of [mysqlbinlog](https://dev.mysql.com/doc/refman/8.0/en/mysqlbinlog.html) in [--read-from-remote-server](https://dev.mysql.com/doc/refman/8.0/en/mysqlbinlog.html#option_mysqlbinlog_read-from-remote-server) mode. It acts as a replication client and can stream binary log events from a remote [Oracle MySQL Server](https://www.mysql.com/) / [Percona Server for MySQL](https://www.percona.com/mysql/software/percona-server-for-mysql) both to a local filesystem and to cloud storage (currently `AWS S3` or an `S3`-compatible service such as `MinIO`). It is capable of automatically reconnecting to the remote server and resuming operation from the point at which it was previously stopped / terminated.
It is written in portable C++ following C++20 best practices.
## Installation
### Prebuilt Binaries
Currently prebuilt binaries are not available.
### Building from source
#### Dependencies
- [CMake](https://cmake.org/) 3.20.0+
- [Clang](https://clang.llvm.org/) (`clang-15` .. `clang-19`) or [GCC](https://gcc.gnu.org/) (`gcc-12` .. `gcc-14`)
- [Boost libraries](https://www.boost.org/) 1.88.0 (git version, not the source tarball)
- [MySQL client library](https://dev.mysql.com/doc/c-api/8.0/en/) 8.0.x (`libmysqlclient`)
- [CURL library](https://curl.se/libcurl/) (`libcurl`) 8.6.0+
- [AWS SDK for C++](https://aws.amazon.com/sdk-for-cpp/) 1.11.570
#### Instructions
##### Creating a build workspace
```bash
mkdir ws
cd ws
```
Every subsequent step assumes that we are currently inside the `ws` directory unless explicitly stated otherwise.
##### Getting the build scripts and the source code
```bash
git clone https://github.com/Percona-Lab/percona-binlog-server.git
```
##### Defining environment variables affecting build configurations
Define `BUILD_PRESET` depending on whether you want to build in the `Debug`, `Release`, or `Debug` with `Address Sanitizer` configuration, and on which toolset you would like to use.
```bash
export BUILD_PRESET=<configuration>_<toolset>
```
The supported values for `<configuration>` are `debug`, `release`, and `asan`.
The supported values for `<toolset>` are `gcc14` and `clang19`.
For instance, if you want to build in the `RelWithDebInfo` configuration using `GCC 14`, please specify
```bash
export BUILD_PRESET=release_gcc14
```
##### Boost Libraries
###### Getting Boost Libraries source
```bash
git clone --recurse-submodules -b boost-1.88.0 --jobs=8 https://github.com/boostorg/boost.git
cd boost
git switch -c required_release
cd .. # return to the `ws` directory
```
###### Copying CMake presets for Boost Libraries
```bash
cp ./percona-binlog-server/extra/cmake_presets/boost/CMakePresets.json ./boost
```
###### Configuring Boost Libraries
```bash
cmake ./boost --preset ${BUILD_PRESET}
```
###### Building Boost Libraries
```bash
cmake --build ./boost-build-${BUILD_PRESET} --parallel
```
###### Installing Boost Libraries
```bash
cmake --install ./boost-build-${BUILD_PRESET}
```
##### AWS SDK CPP Libraries
###### Getting AWS SDK CPP Libraries source
```bash
git clone --recurse-submodules -b 1.11.570 --jobs=8 https://github.com/aws/aws-sdk-cpp
cd aws-sdk-cpp
git switch -c required_release
cd .. # return to the `ws` directory
```
###### Copying CMake presets for AWS SDK CPP Libraries
```bash
cp ./percona-binlog-server/extra/cmake_presets/aws-sdk-cpp/CMakePresets.json ./aws-sdk-cpp
```
###### Configuring AWS SDK CPP Libraries
```bash
cmake ./aws-sdk-cpp --preset ${BUILD_PRESET}
```
###### Building AWS SDK CPP Libraries
```bash
cmake --build ./aws-sdk-cpp-build-${BUILD_PRESET} --parallel
```
###### Installing AWS SDK CPP Libraries
```bash
cmake --install ./aws-sdk-cpp-build-${BUILD_PRESET}
```
##### Main Application
###### Getting Main Application source
The main application source code should already have been cloned from the git repository during the "Getting the build scripts and the source code" step.
###### Configuring Main Application
```bash
cmake ./percona-binlog-server --preset ${BUILD_PRESET}
```
###### Building Main Application
```bash
cmake --build ./percona-binlog-server-build-${BUILD_PRESET} --parallel
```
###### Main Application binary
The resulting binary can be found at the following path: `ws/percona-binlog-server-build-${BUILD_PRESET}/binlog_server`.
## Usage
### Command line arguments
Please run one of the following:
```bash
./binlog_server version
./binlog_server fetch <config_file>
./binlog_server pull <config_file>
./binlog_server search_by_timestamp <config_file> <timestamp>
./binlog_server search_by_gtid_set <config_file> <gtid_set>
```
where
`<config_file>` is a path to a JSON configuration file (described below),
`<timestamp>` is a valid timestamp in ISO format (e.g. `2026-02-10T14:30:00`),
`<gtid_set>` is a valid GTID set (e.g. `11111111-aaaa-1111-aaaa-111111111111:1:3, 22222222-bbbb-2222-bbbb-222222222222:1-6`).
### Operation modes
The Percona Binary Log Server utility can operate in five modes:
- 'version'
- 'search_by_timestamp'
- 'search_by_gtid_set'
- 'fetch'
- 'pull'
#### 'version' operation mode
In this mode the utility simply prints its current [semantic version](https://en.wikipedia.org/wiki/Software_versioning) (embedded into the binary) to the standard output and exits with "success" (`0`) exit code.
For instance,
```bash
./binlog_server version
```
may print
```
0.1.0
```
#### 'search_by_timestamp' operation mode
In this mode the utility requires one additional command line parameter, `<timestamp>`, and will print to the standard output the list of binlog files stored in the Binary Log Server data directory that have at least one event whose timestamp is less than or equal to the provided `<timestamp>`.
Along with the file name, the output will also include each file's current size in bytes, its timestamps, its URI, and optional initial / added GTIDs (when replication is configured to use GTID mode).
For instance,
```bash
./binlog_server search_by_timestamp config.json 2026-02-10T14:30:00
```
may print
```json
{
  "status": "success",
  "result": [
    {
      "name": "binlog.000001",
      "size": 134217728,
      "uri": "s3://binsrv-bucket/storage/binlog.000001",
      "min_timestamp": "2026-02-09T17:22:01",
      "max_timestamp": "2026-02-09T17:22:08",
      "initial_gtids": "",
      "added_gtids": "11111111-aaaa-1111-aaaa-111111111111:1-123456"
    },
    {
      "name": "binlog.000002",
      "size": 134217728,
      "uri": "s3://binsrv-bucket/storage/binlog.000002",
      "min_timestamp": "2026-02-09T17:22:08",
      "max_timestamp": "2026-02-09T17:22:09",
      "initial_gtids": "11111111-aaaa-1111-aaaa-111111111111:1-123456",
      "added_gtids": "11111111-aaaa-1111-aaaa-111111111111:123457-246912"
    }
  ]
}
```
If an error occurs,
```json
{
  "status": "error",
  "message": "<error_message>"
}
```
The `<error_message>` may be one of the following (but is not limited to):
- `Invalid timestamp format`
- `Binlog storage is empty`
- `Timestamp is too old`
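In scripts, the JSON responses shown above can be post-processed with a tool such as `jq`. A minimal sketch, assuming `jq` is installed (the response value here is a hand-made, abbreviated example, not real utility output):

```bash
# Hand-made, abbreviated response; a real one also carries sizes,
# timestamps, URIs and GTID fields for every file.
RESPONSE='{"status":"success","result":[{"name":"binlog.000001"},{"name":"binlog.000002"}]}'

# Extract just the binlog file names from a successful response.
printf '%s' "$RESPONSE" | jq -r '.result[].name'
```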
#### 'search_by_gtid_set' operation mode
In this mode the utility requires one additional command line parameter, `<gtid_set>`, and will print to the standard output the minimal set of binlog files stored in the Binary Log Server data directory required to cover the specified GTID set `<gtid_set>`. This operation makes sense only when the storage being queried was created in GTID-based replication mode.
Along with the file name, the output will also include each file's current size in bytes, its timestamps, its URI, and optional initial / added GTIDs.
For instance,
```bash
./binlog_server search_by_gtid_set config.json 11111111-aaaa-1111-aaaa-111111111111:10-20
```
may print
```json
{
  "status": "success",
  "result": [
    {
      "name": "binlog.000001",
      "size": 134217728,
      "uri": "s3://binsrv-bucket/storage/binlog.000001",
      "min_timestamp": "2026-02-09T17:22:01",
      "max_timestamp": "2026-02-09T17:22:08",
      "initial_gtids": "",
      "added_gtids": "11111111-aaaa-1111-aaaa-111111111111:1-123456"
    }
  ]
}
```
whereas
```bash
./binlog_server search_by_gtid_set config.json 11111111-aaaa-1111-aaaa-111111111111:100000-100001:200000-200001
```
may print
```json
{
  "status": "success",
  "result": [
    {
      "name": "binlog.000001",
      "size": 134217728,
      "uri": "s3://binsrv-bucket/storage/binlog.000001",
      "min_timestamp": "2026-02-09T17:22:01",
      "max_timestamp": "2026-02-09T17:22:08",
      "initial_gtids": "",
      "added_gtids": "11111111-aaaa-1111-aaaa-111111111111:1-123456"
    },
    {
      "name": "binlog.000002",
      "size": 134217728,
      "uri": "s3://binsrv-bucket/storage/binlog.000002",
      "min_timestamp": "2026-02-09T17:22:08",
      "max_timestamp": "2026-02-09T17:22:09",
      "initial_gtids": "11111111-aaaa-1111-aaaa-111111111111:1-123456",
      "added_gtids": "11111111-aaaa-1111-aaaa-111111111111:123457-246912"
    }
  ]
}
```
If an error occurs,
```json
{
  "status": "error",
  "message": "<error_message>"
}
```
The `<error_message>` may be one of the following (but is not limited to):
- `cannot parse GTID set`
- `Binlog storage is empty`
- `The specified GTID set cannot be covered`
- `GTID set search is not supported in storages created in position-based replication mode`
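Since every response carries a `status` field, a script can branch on it before touching `result` or `message`. A hypothetical sketch, again assuming `jq` is installed and using a hand-made error response:

```bash
RESPONSE='{"status":"error","message":"Binlog storage is empty"}'

# Print file names on success; report the error message otherwise.
if [ "$(printf '%s' "$RESPONSE" | jq -r '.status')" = "success" ]; then
  printf '%s' "$RESPONSE" | jq -r '.result[].name'
else
  echo "search failed: $(printf '%s' "$RESPONSE" | jq -r '.message')" >&2
fi
```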
#### 'fetch' operation mode
In this mode the utility tries to connect to a remote MySQL server, switch the connection to replication mode, and read events from all binary logs already stored on the server. After reading the very last event, the utility gracefully disconnects and exits.
Any error (network issues, server down, out of space, etc.) encountered in this mode results in immediate termination of the program, making sure that the storage is left in a consistent state.
#### 'pull' operation mode
In this mode the utility continuously tries to connect to a remote MySQL server, switch to replication mode, and read binary log events. After reading the very last one, the utility does not close the connection immediately but instead waits up to `<read_timeout>` seconds for the server to generate more events. If this period of time elapses, the utility closes the MySQL connection and enters `idle` mode, in which it simply waits for `<idle_time>` seconds in a disconnected state. After that, another reconnection attempt is made and everything starts from the beginning.
Any network-related error (network issues, server down, etc.) encountered in this mode does not result in immediate termination of the program; instead, another reconnection attempt is made. More serious errors (out of space, etc.) cause program termination.
### JSON Configuration file
The Percona Binary Log Server configuration file has the following format.
```json
{
  "logger": {
    "level": "debug",
    "file": "binsrv.log"
  },
  "connection": {
    "host": "127.0.0.1",
    "port": 3306,
    "user": "rpl_user",
    "password": "rpl_password",
    "connect_timeout": 20,
    "read_timeout": 60,
    "write_timeout": 60,
    "ssl": {
      "mode": "verify_identity",
      "ca": "/etc/mysql/ca.pem",
      "capath": "/etc/mysql/cadir",
      "crl": "/etc/mysql/crl-client-revoked.crl",
      "crlpath": "/etc/mysql/crldir",
      "cert": "/etc/mysql/client-cert.pem",
      "key": "/etc/mysql/client-key.pem",
      "cipher": "ECDHE-RSA-AES128-GCM-SHA256"
    },
    "tls": {
      "ciphersuites": "TLS_AES_256_GCM_SHA384",
      "version": "TLSv1.3"
    }
  },
  "replication": {
    "server_id": 42,
    "idle_time": 10,
    "verify_checksum": true,
    "mode": "position"
  },
  "storage": {
    "backend": "s3",
    "uri": "https://key_id:secret@192.168.0.100:9000/binsrv-bucket/vault",
    "fs_buffer_directory": "/tmp/binsrv",
    "checkpoint_size": "128M",
    "checkpoint_interval": "30s"
  }
}
```
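Because the utility consumes this file as JSON, a quick syntax check before the first run can catch copy-paste mistakes. A hypothetical sketch using `jq` (the file name, its location, and the deliberately schema-incomplete content are assumptions for illustration; `jq empty` validates only JSON syntax, not the configuration schema):

```bash
CONFIG=/tmp/binsrv-config.json
cat > "$CONFIG" <<'EOF'
{
  "logger": { "level": "info", "file": "" }
}
EOF
# 'jq empty' exits non-zero on a JSON syntax error.
jq empty "$CONFIG" && echo "well-formed JSON"
```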
#### \<logger\> section
- `<level>` sets the minimum severity of the log messages that the user wants to appear in the log output; can be one of `trace` / `debug` / `info` / `warning` / `error` / `fatal` (explained below).
- `<file>` can be either a path to a file on a local filesystem to which all log messages will be written, or an empty string `""`, meaning that all the output will go to the console (`STDOUT`).
##### Logger message severity levels
Each message written to the log has the `severity` level associated with it.
Currently we use the following mapping:
- `fatal` - currently not used,
- `error` - used for printing messages coming from caught exceptions,
- `warning` - currently not used,
- `info` - primary log severity level used mostly to indicate progress (configuration file read, storage created, connection established, etc.),
- `debug` - used to print function names from caught exceptions and to print the data from parsed binary log events,
- `trace` - used to print source file name / line number / position from caught exceptions and to print raw data (hex dumps) of binary log events.
#### \<connection\> section
- `<host>` - MySQL server host name (e.g. `127.0.0.1`, `192.168.0.100`, `dbsrv.mydomain.com`, etc.). Please do not use `localhost` here, as it is interpreted specially by `libmysqlclient` and instructs the library to use a Unix socket file for the connection instead of the TCP protocol - use `127.0.0.1` instead (see [this page](https://dev.mysql.com/doc/c-api/8.0/en/mysql-real-connect.html) for more details).
- `<port>` - MySQL server port (e.g. `3306` - the default MySQL server port).
- `<dns_srv_name>` - the name of a DNS SRV record that determines the candidate hosts to use for establishing a connection to a MySQL server ([--dns-srv-name](https://dev.mysql.com/doc/refman/8.4/en/mysql-command-options.html#option_mysql_dns-srv-name) `mysql` utility command line option).
- `<user>` - the name of a MySQL user that has the [REPLICATION SLAVE](https://dev.mysql.com/doc/refman/8.0/en/replication-howto-repuser.html) privilege.
- `<password>` - the password for this MySQL user.
- `<connect_timeout>` - the number of seconds the MySQL client library will wait to establish a connection with a remote host.
- `<read_timeout>` - the number of seconds the MySQL client library will wait to read data from a remote server (this parameter may affect the responsiveness of the program to graceful termination - see below).
- `<write_timeout>` - the number of seconds the MySQL client library will wait to write data to a remote server.
Note: you should specify either the `<host>` / `<port>` pair or a single `<dns_srv_name>`.
#### \<ssl\> optional section
- `<mode>` - specifies the desired security state of the connection to the MySQL server; can be one of `disabled` / `preferred` / `required` / `verify_ca` / `verify_identity` ([--ssl-mode](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-mode) `mysql` utility command line option).
- `<ca>` (optional) - specifies the file that contains the list of trusted SSL Certificate Authorities ([--ssl-ca](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-ca) `mysql` utility command line option).
- `<capath>` (optional) - specifies the directory that contains trusted SSL Certificate Authority certificate files (an equivalent of the [--ssl-capath](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-capath) `mysql` utility command line option).
- `<crl>` (optional) - specifies the file that contains certificate revocation lists ([--ssl-crl](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-crl) `mysql` utility command line option).
- `<crlpath>` (optional) - specifies the directory that contains certificate revocation-list files (an equivalent of the [--ssl-crlpath](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-crlpath) `mysql` utility command line option).
- `<cert>` (optional) - specifies the file that contains an X.509 client certificate ([--ssl-cert](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-cert) `mysql` utility command line option).
- `<key>` (optional) - specifies the file that contains the private key associated with the `<cert>` ([--ssl-key](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-key) `mysql` utility command line option).
- `<cipher>` (optional) - specifies the list of permissible ciphers for connection encryption ([--ssl-cipher](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_ssl-cipher) `mysql` utility command line option).
#### \<tls\> optional section
- `<ciphersuites>` (optional) - specifies the list of permissible TLSv1.3 ciphersuites for encrypted connections ([--tls-ciphersuites](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_tls-ciphersuites) `mysql` utility command line option).
- `<version>` (optional) - specifies the list of permissible TLS protocols for encrypted connections ([--tls-version](https://dev.mysql.com/doc/refman/8.4/en/connection-options.html#option_general_tls-version) `mysql` utility command line option).
#### \<replication\> section
- `<server_id>` - specifies the server ID that the utility will use when connecting to a remote MySQL server (similar to the [--connection-server-id](https://dev.mysql.com/doc/refman/8.0/en/mysqlbinlog.html#option_mysqlbinlog_connection-server-id) `mysqlbinlog` command line option).
- `<idle_time>` - the number of seconds the utility will spend in disconnected mode between reconnection attempts.
- `<verify_checksum>` - a boolean value which specifies whether the utility should verify event checksums.
- `<mode>` - the replication mode; can be either `position` for position-based replication or `gtid` for GTID-based replication.
#### \<storage\> section
- `<backend>` - the type of the storage where the received binary logs should be stored:
  - `file` - local filesystem
  - `s3` - `AWS S3` or an `S3`-compatible server (MinIO, etc.)
- `<uri>` - specifies the location (either local or remote) where the received binary logs should be stored.
- `<fs_buffer_directory>` (optional) - specifies the location on the local filesystem where partially downloaded binlog files should be stored. If not specified, the default OS temporary directory will be used (e.g. '/tmp' on Linux). Currently, this parameter is meaningful only for non-`file` storage backends.
- `<checkpoint_size>` (optional) - specifies the data portion size after receiving which the backend storage should flush its internal buffers and write the received binlog data permanently. If not set or set to zero, checkpointing by size will be disabled. The value is expected to be a string containing an integer followed by an optional suffix 'K' / 'M' / 'G' / 'T' / 'P', i.e. /\d+\[KMGTP\]?/:
  - 'no suffix' (e.g. "42") means no multiplier, the size will be interpreted in bytes ('42 * 1' bytes)
  - 'K' (e.g. "42K") means '2^10' multiplier ('42 * 1024' bytes)
  - 'M' (e.g. "42M") means '2^20' multiplier ('42 * 1048576' bytes)
  - 'G' (e.g. "42G") means '2^30' multiplier ('42 * 2^30' bytes)
  - 'T' (e.g. "42T") means '2^40' multiplier ('42 * 2^40' bytes)
  - 'P' (e.g. "42P") means '2^50' multiplier ('42 * 2^50' bytes)
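The multiplier rules above can be mirrored in a small shell helper (a hypothetical illustration, not part of the utility):

```bash
# to_bytes SIZE - convert a checkpoint_size-style string ("42", "42K",
# "42M", ...) into a number of bytes using the binary multipliers above.
to_bytes() {
  n=${1%[KMGTP]}         # numeric part
  suffix=${1##*[0-9]}    # optional suffix; empty when absent
  exp=0
  case "$suffix" in
    K) exp=10 ;; M) exp=20 ;; G) exp=30 ;; T) exp=40 ;; P) exp=50 ;;
  esac
  echo $(( n << exp ))
}

to_bytes 42     # 42
to_bytes 42K    # 43008
to_bytes 42M    # 44040192
```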
- `<checkpoint_interval>` (optional) - specifies the time interval after which the backend storage should flush its internal buffers and write the received binlog data permanently. If not set or set to zero, checkpointing by time interval will be disabled. The value is expected to be a string containing an integer followed by an optional suffix 's' / 'm' / 'h' / 'd', i.e. /\d+\[smhd\]?/:
  - 'no suffix' (e.g. "42") or 's' (e.g. "42s") means seconds
  - 'm' (e.g. "42m") means minutes ('42 * 60' seconds)
  - 'h' (e.g. "42h") means hours ('42 * 60 * 60' seconds)
  - 'd' (e.g. "42d") means days ('42 * 60 * 60 * 24' seconds)
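The interval suffixes can be sketched the same way (again, a hypothetical illustration):

```bash
# to_seconds INTERVAL - convert a checkpoint_interval-style string
# ("42", "42s", "42m", "42h", "42d") into a number of seconds.
to_seconds() {
  n=${1%[smhd]}
  suffix=${1##*[0-9]}
  case "$suffix" in
    m) echo $(( n * 60 )) ;;
    h) echo $(( n * 3600 )) ;;
    d) echo $(( n * 86400 )) ;;
    *) echo "$n" ;;   # no suffix or 's': already seconds
  esac
}

to_seconds 42m   # 2520
to_seconds 42d   # 3628800
```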
##### Storage URI format
- When `<backend>` is set to `file`, `<uri>` must be `file://...`.
- When `<backend>` is set to `s3`, `<uri>` can be either:
  - `s3://...` for `AWS S3`,
  - `http://...` or `https://...` for `S3`-compatible services.
###### Local filesystem storage URIs
In the case of the local filesystem, the URIs must have the following format:
`file://<path>`, where `<path>` is an absolute path on the local filesystem to a directory where downloaded binary log files must be stored. Relative paths are not supported. For instance, `file:///home/user/vault`.
Please note the 3 forward slashes `/` (2 from the protocol part `file://` and 1 from the absolute path).
###### AWS S3 storage URIs
In the case of `AWS S3`, the URIs must have the following format:
`s3://[<access_key_id>:<secret_access_key>@]<bucket_name>[.<region>]/<path>`, where:
- `<access_key_id>` - the AWS access key ID (the `<access_key_id>` / `<secret_access_key>` pair is optional),
- `<secret_access_key>` - the AWS secret access key (the `<access_key_id>` / `<secret_access_key>` pair is optional),
- `<bucket_name>` - the name of the AWS S3 bucket in which the data must be stored,
- `<region>` - the name of the AWS region (e.g. `us-east-1`) where this bucket was created (optional; if omitted, it will be auto-detected),
- `<path>` - a virtual path (key prefix) inside the bucket under which all the binary log files will be stored.
Note: your `<secret_access_key>`, along with alphanumeric characters (`[a-zA-Z0-9]`), may also include `+` and `/`, and because it needs to be inserted into the `userinfo` part of the URI, it is necessary to URL-encode selected special characters: `/` needs to be transformed into `%2F`, while `+` can be left as is.
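For instance, the `/` characters of a secret key can be percent-encoded with a one-liner like this (the secret value is made up for illustration):

```bash
SECRET='abc/def+ghi'
# Replace every '/' with '%2F'; '+' may stay as-is in the userinfo part.
ENCODED=$(printf '%s' "$SECRET" | sed 's|/|%2F|g')
echo "$ENCODED"   # abc%2Fdef+ghi
```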
In the case of an `S3`-compatible service with a custom endpoint, the URIs must have the following format:
`http[s]://[<access_key_id>:<secret_access_key>@]<host>[:<port>]/<bucket_name>/<path>`, where:
- `<host>` - either a host name or an IP address of the `S3`-compatible server,
- `<port>` - the port of the `S3`-compatible server to connect to (optional; if omitted, it will be either 80 or 443, depending on the URI scheme: HTTP or HTTPS).
Please note that in this case `<bucket_name>` must be specified as the very first segment of the URI path.
For example:
- `s3://binsrv-bucket/vault` - no AWS credentials specified, `binsrv-bucket` bucket must be publicly write-accessible, the region will be auto-detected, `/vault` will be the virtual directory.
- `s3://binsrv-bucket.us-east-1/vault` - no AWS credentials specified, `binsrv-bucket` bucket must be publicly write-accessible, the bucket must be created in the `us-east-1` region, `/vault` will be the virtual directory.
- `s3://key_id:secret@binsrv-bucket.us-east-1/vault` - `key_id` will be used as `AWS_ACCESS_KEY_ID`, `secret` will be used as `AWS_SECRET_ACCESS_KEY`, `binsrv-bucket` will be the name of the bucket, the bucket must be created in the `us-east-1` region, `/vault` will be the virtual directory.
- `http://key_id:secret@localhost:9000/binsrv-bucket/vault` - `key_id` will be used as `AWS_ACCESS_KEY_ID`, `secret` will be used as `AWS_SECRET_ACCESS_KEY`, `binsrv-bucket` will be the name of the bucket, `/vault` will be the virtual directory, `localhost:9000` will be the custom endpoint of the `S3`-compatible server, the connection will be established via non-secure HTTP protocol.
- `https://key_id:secret@192.168.0.100:9000/binsrv-bucket/vault` - `key_id` will be used as `AWS_ACCESS_KEY_ID`, `secret` will be used as `AWS_SECRET_ACCESS_KEY`, `binsrv-bucket` will be the name of the bucket, `/vault` will be the virtual directory, `192.168.0.100:9000` will be the custom endpoint of the `S3`-compatible server, the connection will be established via secure HTTPS protocol.
##### Checkpointing on S3
Please note that the S3 API does not provide a way to append a portion of data to an existing object. Currently, in our S3 storage backend, "append" operations are implemented as complete object overwrites, meaning data re-uploads. Practically, if your typical binlog file size is '1G' and you set `<checkpoint_size>` to '256M', you will upload '256M + 512M + 768M + 1024M = 2560M' (about 2.5 times more than your binlog file size in this example). So, keep a balance between the value of this parameter and your typical binlog size. Similar concerns can be raised regarding enabling `<checkpoint_interval>`.
### Resuming previous operation
Running the utility again (in any mode) results in resuming streaming from the position at which the previous run finished.
### Graceful termination
The user can request that the utility, operating in either `fetch` or `pull` mode, be gracefully terminated, leaving the storage in a consistent state. For this, the utility sets custom handlers for the following POSIX signals:
- `SIGINT` - for processing `^C` in the console.
- `SIGTERM` - for processing `kill <pid>`.
Because of the synchronous nature of the binlog API in the MySQL client library, there may still be a delay between receiving the signal and reacting to it. In the worst-case scenario, the user will have to wait for `<read_timeout>` seconds (the value from the configuration) plus 1 second (the granularity of sleep intervals in the `idle` mode).
Please note that killing the program with `kill -9 <pid>` does not guarantee that all internal file buffers will be flushed or that temporary data will be uploaded to the cloud storage, and may result in losing some progress.
## Licensing
Percona is dedicated to **keeping open source open**. Whenever possible, we strive to include permissive licensing for both our software and documentation. For this project, we are using version 2 of the GNU General Public License (GPLv2).