https://github.com/bbepis/mysqlchump
mysqldump alternative
- Host: GitHub
- URL: https://github.com/bbepis/mysqlchump
- Owner: bbepis
- License: agpl-3.0
- Created: 2019-09-04T13:22:47.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-06-14T06:51:19.000Z (about 1 year ago)
- Last Synced: 2025-06-14T07:44:05.128Z (about 1 year ago)
- Language: C#
- Size: 364 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# mysqlchump
**mysqlchump** is a reimplementation of mysqldump focused on performance. Working around system & resource constraints was one of the requirements of this tool and as such plays a large part in how it is designed.
* * *
## Features
- Native Windows support, especially since almost every other database tool doesn't provide it
- Statically compiled binaries are available for exotic & constrained environments, only requiring glibc or musl
- Significantly improved performance compared to mysqldump; exports up to 2x faster, and imports potentially up to 16x faster
- Works with any MySQL-compatible database
- Sane settings & handling:
  - MySQL connections are always utf8mb4
  - Binary fields dumped as hex blobs or base64
  - Dumps are always read and written as UTF-8
  - Dumps are always done on a single transaction to ensure data consistency
  - Does not drop tables if they already exist, instead skipping their creation
  - Does not lock tables when dumping so that it can be used against a live database without issue
- Supports reading and writing to/from CSV, JSON, and SQL (mysqldump-compatible) formats
- Can directly import existing mysqldump backups at the same high speed (read limitations below)
- Supports both compliant CSV (generated by other tools and sources) and bespoke MySQL CSV (`INTO OUTFILE`) for reading and writing. (`-f csv` for standard, `-f mysqlcsv` for MySQL CSV)
- Supports piping data in/out in MySQL, JSON (and single-table CSV) formats. Does not require intermediate storage like mydumper, so data can be read from/written to compression tools or network locations
- Can perform partial restores by specifying which tables to restore from a dump. Can also rename tables on restore (when restoring a single table at a time)
- Also supports partial backups and/or data transformation applied to exported data; `-q` controls the query mysqlchump uses to retrieve data
- Unlike mydumper it doesn't need the import mechanism (SQL or `LOAD DATA INFILE`) to be specified in advance; it can generate the data on the fly from any input format
- It can also replace inserts with `INSERT IGNORE` without needing the data to be initially dumped as such
* * *
## Benchmark
All benchmarks were performed on a database that contains mostly text & is roughly 9.3GB in size. The host machine is running Windows 10 on a Ryzen 7 9700X, with the following my.ini config:
```ini
[mysqld]
skip-log-bin
local-infile=1
innodb-buffer-pool-size = 32G
innodb-buffer-pool-instances = 8
innodb-buffer-pool-chunk-size = 4G
innodb_flush_method=normal
innodb-write-io-threads = 64
innodb-read-io-threads = 64
```
### Export
| | Command | Time | Rate | Factor |
| ----------------- | ------------------------------------------------ | ---- | ----------- | --------- |
| mysqldump | `mysqldump database --hex-blob > NUL` | 1:32 | 103.62 MB/s | 1.00x |
| mysqlchump (SQL) | `mysqlchump export -d database -f mysql - > NUL` | 0:57 | 167.25 MB/s | **1.61x** |
| mysqlchump (JSON) | `mysqlchump export -d database -f json - > NUL` | 0:54 | 176.55 MB/s | **1.70x** |
### Import
| | Command | Time | Rate | Factor |
| ------------------- | ---------------------------------------------------------------------------------------------------------------- | ----- | ----------- | ---------- |
| mysql | `mysql -u root --default-character-set=utf8mb4 restore-test < dump.sql` | 15:12 | 10.45 MB/s | 1.00x |
| mysqlchump | `mysqlchump import -d restore-test -j 4 --defer-indexes -f mysql dump.sql` | 2:45 | 57.78 MB/s | **5.53x** |
| mysqlchump (unsafe) | `mysqlchump import -d restore-test -j 8 --defer-indexes -m LoadDataInfile --aggressive-unsafe -f mysql dump.sql` | 1:15 | 127.11 MB/s | **12.16x** |
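
The Rate and Factor columns follow directly from the timings. Below is a minimal Python sketch of that arithmetic. It assumes a dump size of about 9533 MB, a figure back-computed from the published rates rather than stated above, so treat it as illustrative only.

```python
# Assumption: size inferred from the table (rate x time), not an official figure.
ASSUMED_DUMP_SIZE_MB = 9533


def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' timing such as '15:12' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def rate_mb_per_s(size_mb: float, mmss: str) -> float:
    """Throughput: data size divided by elapsed time."""
    return size_mb / to_seconds(mmss)


def speedup(baseline: str, candidate: str) -> float:
    """Factor: baseline time divided by the candidate's time."""
    return to_seconds(baseline) / to_seconds(candidate)
```

For example, `speedup("15:12", "2:45")` is 912 s / 165 s, which rounds to the 5.53x shown in the import table.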
* * *
## JSON format
My recommendation is using the JSON format over MySQL for dumping for the following reasons:
- It is easily machine parsable by everything under the sun, meaning that other tools are able to read from the dumps if need be without needing to restore to database first
- Unlike Avro / Parquet / other more specialized table container formats, it fully supports sequential reading & writing meaning that it can be used in conditions that require the data being piped
- Unlike CSV, metadata about the database/tables are able to be stored and it supports storing more than one table
- It is significantly less complex to parse than SQL, meaning that:
  - It's much more performant in reading and writing
  - It's also much less vulnerable to parsing bugs and issues
Regardless, SQL dumping and importing are fully supported. JSON dumps suit my use case much better, so the JSON implementation has seen far more real-world use.
* * *
## CLI usage
#### Exporting
```
Description:
  Exports data from a database

Usage:
  mysqlchump export [] [options]

Arguments:
  Specify either a file or a folder to output to. '-' for stdout, otherwise defaults to creating
  files in the current directory

Options:
  -t, --table             The table to be dumped. Can be specified multiple times, or passed '*'
                          to dump all tables. Supports globbing with '*' and '?' characters
  --tables                A comma-separated list of tables to dump.
  --connection-string     A connection string to use to connect to the database. Not required if
                          -s -d -u -p have been specified
  -s, --server            The server to connect to. [default: localhost]
  -o, --port              The port of the server to connect to. [default: 3306]
  -d, --database          The database to use when dumping.
  -u, --username          The username to connect with. [default: root]
  -p, --password          The password to connect with.
  -f, --output-format     The output format to create when dumping. [default: mysql]
  -q, --select            The select query to use when filtering rows/columns. If not specified,
                          will dump the entire table.
                          Table being examined is specified with "{table}". [default: SELECT *
                          FROM `{table}`]
  --no-creation           Don't output table creation statements.
  --truncate              Prepend data insertions with a TRUNCATE command.
  --silent                Prevents all progress-related output
  -?, -h, --help          Show help and usage information
```
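Because `-` sends the dump to stdout, an export can be streamed straight into a compressor without intermediate storage. The following is a rough Python sketch of that idea, using only options listed in the help text above. The helper names, the `mysqlchump` executable being on `PATH`, and the example table and column names are assumptions made for illustration.

```python
import gzip
import shutil
import subprocess


def build_export_command(database, tables=(), output_format="json",
                         select=None, executable="mysqlchump"):
    """Assemble an export command line from options in the help text."""
    cmd = [executable, "export", "-d", database, "-f", output_format]
    for table in tables:
        cmd += ["-t", table]      # -t may be given multiple times
    if select is not None:
        cmd += ["-q", select]     # "{table}" is substituted by mysqlchump
    cmd.append("-")               # '-' writes the dump to stdout
    return cmd


def export_to_gzip(cmd, destination):
    """Run the export and gzip its stdout as it streams (hypothetical helper)."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(destination, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
```

A call such as `export_to_gzip(build_export_command("database", tables=["posts"], select="SELECT * FROM `{table}` WHERE id > 1000"), "dump.json.gz")` would then dump a filtered subset of a hypothetical `posts` table.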
#### Importing
```
Description:
  Imports data to a database

Usage:
  mysqlchump import [options]

Arguments:
  Specify a file to read from. Otherwise - for stdin

Options:
  --connection-string     A connection string to use to connect to the database. Not
                          required if -s -d -u -p have been specified
  -s, --server            The server to connect to. [default: localhost]
  -o, --port              The port of the server to connect to. [default: 3306]
  -d, --database          The database to use when dumping.
  -u, --username          The username to connect with. [default: root]
  -p, --password          The password to connect with.
  -f, --input-format      The input format to use when importing. [default: mysql]
  -m, --import-mechanism  The import mechanism to use when importing. [default:
                          SqlStatements]
  -t, --table             The destination table name to import to. Required for CSV
                          data, optional for others
  -j, --parallel          The amount of parallel insert threads to use. [default: 12]
  --insert-ignore         Changes INSERT to INSERT IGNORE. Useful for loading into
                          existing datasets, but can be slower
  --csv-columns           A comma-separated list of columns that the CSV corresponds to.
                          Ignored if --csv-use-headers is specified
  --csv-use-headers       Use the first row in the CSV as header data to determine
                          column names.
  --no-creation           (JSON only) Don't run CREATE TABLE statement.
  --aggressive-unsafe     Enables aggressive binary log options to drastically increase
                          import speed. Note that this requires root, or an account with
                          the SUPER privilege. If the database crashes during import,
                          ALL databases in the database could become corrupt.
  --set-innodb            (JSON only) Forces created tables to use InnoDB as the storage
                          engine, with ROW_FORMAT=DYNAMIC. Removes any COMPRESSION
                          option (e.g. from TokuDB)
  --set-compressed        (JSON only) Forces created tables to use ROW_FORMAT=COMPRESSED
                          (overrides --set-innodb)
  -T, --source-table      (JSON only) List of tables to import from, from a dump that
                          contains multiple tables
  --defer-indexes         (JSON only) Does not create indexes upfront, but instead after
                          data has been inserted for better performance. Does nothing
                          with --no-create
  --strip-indexes         (JSON only) Does not create indexes at all. Does nothing with
                          --no-creation or --defer-indexes
  --silent                Prevents all progress-related output
  -?, -h, --help          Show help and usage information
```
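CSV imports differ from the other formats in that `-t` is required and the column mapping has to come from either `--csv-use-headers` or `--csv-columns`. Below is a small Python sketch of assembling such a command, using only the options documented above. The function name and the example file, table and column names are hypothetical.

```python
def build_csv_import_command(database, table, csv_path, columns=None,
                             input_format="csv", executable="mysqlchump"):
    """Assemble a CSV import command line (illustrative helper, not part of the tool)."""
    cmd = [executable, "import", "-d", database, "-f", input_format, "-t", table]
    if columns:
        # Explicit column list for CSV files without a header row.
        cmd += ["--csv-columns", ",".join(columns)]
    else:
        # Otherwise take the column names from the first CSV row.
        cmd.append("--csv-use-headers")
    cmd.append(csv_path)  # use '-' here to read from stdin instead
    return cmd
```

Passing `input_format="mysqlcsv"` would select the `INTO OUTFILE` dialect mentioned in the feature list.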
* * *
## Limitations
- This tool currently does not handle triggers, stored procedures or non-table structures at all. You can use this mysqldump command to dump them separately, and import them after the data import has taken place:
  - `mysqldump [credentials] --no-create-info --no-data --routines --triggers [database_name] > procedures_and_triggers.sql`
- When importing dumps made by mysqldump, if it contains binary data it **must** have been exported with `--hex-blob`. The tool is unable to read the file otherwise, as without it mysqldump writes binary data completely unescaped which this tool can't read
- When importing CSV files, the destination table must already be created. Any columns handling binary data must be base64 encoded, and any date columns must be UTC in the format `yyyy-MM-dd HH:mm:ss`
- The SQL parser may struggle on exotic column/index definitions
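The CSV requirements above (base64 for binary columns, UTC dates as `yyyy-MM-dd HH:mm:ss`) can be met when generating the file. Here is a minimal Python sketch, assuming rows arrive as Python values. The helper names are made up for illustration, and NULL handling is deliberately left out since the expected representation is not described here.

```python
import base64
import csv
import io
from datetime import datetime, timezone


def format_utc(value: datetime) -> str:
    """Render a timezone-aware datetime as UTC in yyyy-MM-dd HH:mm:ss form."""
    if value.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def encode_cell(value):
    """Convert one value into the text form the import expects."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, datetime):
        return format_utc(value)
    return value


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text with binary and date cells converted."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([encode_cell(cell) for cell in row])
    return buffer.getvalue()
```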
* * *
## Tuning imports for performance & stability
There are two aspects to this: the tool's side and the target database's side.
#### Tool argument tuning
- The biggest improvement this tool provides is splitting up insert queries from a single source across many MySQL connections. In this era of SSD and NVMe drives, the bottleneck is not I/O anymore; it's instead the fact that MySQL handles each query in a single thread. Splitting up the inserts across multiple connections/queries means that you're able to leverage every CPU core to perform imports. `-j` controls how many connections should be used to perform inserts in parallel; you will likely not see any improvements setting this higher than your core count
- It's faster to create indexes after data has been inserted rather than having each new row update the index as the import happens. There are two arguments to control this:
  - `--defer-indexes` will remove any indexes from CREATE TABLE statements, and after importing data for the table it'll recreate them.
  - `--strip-indexes` does the same thing but does not recreate the indexes afterwards. Useful if the dump contains an obscene amount of indexes you don't want to use or waste time/space restoring
- There are two ways data can be inserted into tables: SQL `INSERT` queries and `LOAD DATA INFILE` (typically using CSVs as the source). Because of how data files are loaded from the client to the server, mysqlchump can generate this input CSV on the fly, even from data sources that aren't CSV. This is usually faster, as the server doesn't need to spend processing time parsing SQL; however, the added complexity can make it a bit less stable. mysqlchump defaults to SQL INSERTs; to use the other mechanism, pass `-m LoadDataInfile`.
  - This requires `local-infile=1` to be set, either at runtime or in `my.cnf`/`my.ini`. mysqlchump can temporarily set this flag for you if the user you provide is either `root` or has `SUPER` privileges.
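The tuning advice above can be combined into a single command line. The sketch below is one possible way to do that in Python: it caps `-j` at the local core count (on the assumption that the tool runs on the database host), defers index creation, and optionally switches to `LoadDataInfile`. The helper names are hypothetical.

```python
import os


def pick_parallelism(requested=12, cores=None):
    """Cap the -j value at the core count; 12 is the tool's documented default."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, cores))


def tuned_import_args(dump_path, database, load_data_infile=False, cores=None):
    """Build an import command that applies the tuning advice above."""
    args = ["mysqlchump", "import", "-d", database,
            "-j", str(pick_parallelism(cores=cores)),
            "--defer-indexes"]
    if load_data_infile:
        # Requires local-infile=1 on the server, as noted above.
        args += ["-m", "LoadDataInfile"]
    args += ["-f", "mysql", dump_path]
    return args
```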
There's one last flag that gives a massive boost which requires its own section: `--aggressive-unsafe`.
It will significantly improve import speeds by disabling a lot of safety features and extra I/O that is unnecessary for bulk insertion, however as the name implies it is **dangerous**. If the MySQL server crashes while an import with this flag is occurring, **every database on the instance will very likely become corrupt**. Even if the data doesn't become corrupt, MySQL will force you to use `innodb_force_recovery = 6` which will make SELECTs 100x slower.
Ironically as import speed goes up, so does the chance for crashes (mitigation is described below). Therefore you should only use this flag against completely fresh database instances, or instances with no other data that you care about. (And you should be prepared to wipe and reinitialize the data directory after crashes, because "potential corruption" is permanently flagged and cannot be removed)
This flag requires `root` or `SUPER` privileges. It does the following:
- Temporarily disables the redo log via `ALTER INSTANCE DISABLE INNODB REDO_LOG`
- Temporarily disables binary logging via `SET sql_log_bin = 'OFF'` and (permanently) purges all previous binary logs via `PURGE BINARY LOGS BEFORE NOW()`
- Temporarily hinders the double write buffer via `SET GLOBAL innodb_doublewrite = 'DETECT_ONLY'`
While I do set these server variables back to their previous values after the import completes, if the tool encounters an issue or is terminated by anything other than a regular SIGINT, these flags can remain at their unsafe values; after such an event, make sure to check that these variables aren't still set to them.
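One way to perform that check is to compare the server's current values against the unsafe ones. The Python sketch below assumes you have already collected the values yourself, for example with `SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite'` and `SHOW GLOBAL STATUS LIKE 'Innodb_redo_log_enabled'`, and passed them in as a dictionary. The function name is hypothetical, and `sql_log_bin` is left out because it is set without `GLOBAL` above.

```python
# Values that indicate an import left the server in its unsafe state.
UNSAFE_VALUES = {
    "innodb_doublewrite": "DETECT_ONLY",   # system variable
    "Innodb_redo_log_enabled": "OFF",      # status variable
}


def leftover_unsafe_settings(observed):
    """Return the names of settings still at their unsafe values.

    `observed` maps variable names to values as reported by the server.
    Names are compared case-insensitively.
    """
    normalized = {name.lower(): str(value).upper()
                  for name, value in observed.items()}
    return {name for name, unsafe in UNSAFE_VALUES.items()
            if normalized.get(name.lower()) == unsafe}
```

An empty result means neither setting was left behind. Otherwise, the redo log can be turned back on with `ALTER INSTANCE ENABLE INNODB REDO_LOG`.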
You have been warned.
#### MySQL configuration (`my.cnf` / `my.ini`)
There are a few flags you can set (or should be aware of) in the `[mysqld]` section of your MySQL config file:
- `skip-log-bin` will disable binary logging. If you're not using replication at all, this will disable a whole lot of disk I/O and space which can speed things up.
- Setting `innodb-read-io-threads = 64` and `innodb-write-io-threads = 64` (their maximums) can potentially improve speed in NVMe environments; your mileage may vary
- `innodb-flush-log-at-*` variables will likely not affect anything as mysqlchump will not commit transactions until all of the data has been inserted
- Changing `innodb_flush_method` can potentially improve things; on Windows, setting it to `normal` causes MySQL to use buffered I/O, which can help a lot if MySQL is rapidly changing the same disk pages. For Unix I'm not sure; you'll have to consult the [documentation](https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_flush_method)
InnoDB buffer pool size also requires its own section.
MySQL is a bit like a dumb pet; as long as you keep putting food (data to insert) in front of it, it will keep consuming and consuming, completely ignoring the fact that it's actually full, and will eventually throw it back up all over the carpet (crash). The only ways I've seen to mitigate this are to either put it on a diet (reduce the speed of imports) or increase its stomach size (increase the InnoDB buffer pool). Let's talk about the latter.
There are three variables which you can use to control this:
- `innodb-buffer-pool-size`
- `innodb-buffer-pool-instances`
- `innodb-buffer-pool-chunk-size`
The gist of it is that you have to make sure that `size` = `instances` \* `chunk-size`. For example, my config for setting the pool size to 32GB:
```ini
innodb-buffer-pool-size = 32G
innodb-buffer-pool-instances = 8
innodb-buffer-pool-chunk-size = 4G
```
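The `size` = `instances` \* `chunk-size` rule is easy to get wrong when editing the values by hand. Here is a small Python sketch that checks it for values written in the same `K`/`M`/`G` notation as the config. The helper names are assumptions, and the parser only handles those three suffixes.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a value such as '32G' or '4096' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def pool_is_consistent(size: str, instances: int, chunk_size: str) -> bool:
    """Check the rule stated above: size == instances * chunk-size."""
    return parse_size(size) == instances * parse_size(chunk_size)
```

With the config above, `pool_is_consistent("32G", 8, "4G")` holds, since 8 \* 4G = 32G.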
The only correlation I've seen for figuring out what size you need to prevent crashes for super-fast imports is how powerful the machine hosting the database is. The slower the CPU, the slower the imports and the less buffer pool it needs. You'll have to do some trial and error.
While the [documentation](https://dev.mysql.com/doc/refman/8.4/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size) says that the actual memory usage of a MySQL instance is `pool size * 1.1`, in reality I've seen it at around `pool size * 1.5`; if it allocates too much memory, it'll either slow down from page file swapping or crash outright. Be careful when tweaking this.
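To make the difference between the two multipliers concrete, here is a short Python sketch that estimates memory use for a given pool size and checks it against the host's RAM. The 1.5 default is the observed figure quoted above rather than a guarantee, and the function names are hypothetical.

```python
def estimated_memory_gib(pool_gib: float, factor: float = 1.5) -> float:
    """Estimate total MySQL memory use from the buffer pool size."""
    return pool_gib * factor


def fits_in_ram(pool_gib: float, ram_gib: float, factor: float = 1.5) -> bool:
    """True if the estimated usage stays within the host's physical RAM."""
    return estimated_memory_gib(pool_gib, factor) <= ram_gib
```

For the 32G pool used in this README, the documented factor gives roughly 35 GiB while the observed one gives 48 GiB, so a 40 GiB host would pass the first estimate but not the second.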
* * *
## Disclaimer
Due to how massive the scope of such a tool is, it's impossible for me to test every configuration and data combination possible; and as such bugs may exist.
You should make sure to test this tool and make sure it exports & imports the data correctly (it's good practice to verify your backup & restore processes anyway)