https://github.com/databendlabs/bend-archiver
Parallel archive tool for syncing MySQL/PostgreSQL/TiDB/SQL Server data into Databend, with key- or time-based splitting.
- Host: GitHub
- URL: https://github.com/databendlabs/bend-archiver
- Owner: databendlabs
- License: apache-2.0
- Created: 2024-06-25T03:24:03.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2026-01-31T07:18:38.000Z (4 months ago)
- Last Synced: 2026-04-13T07:49:22.178Z (2 months ago)
- Language: Go
- Homepage: https://databend.com
- Size: 11.7 MB
- Stars: 15
- Watchers: 6
- Forks: 6
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# bend-archiver
Archive data from common databases into Databend with parallel sync (by key or time range).
## Supported sources
| Data source | Supported |
|:-----------|:---------:|
| MySQL | Yes |
| PostgreSQL | Yes |
| TiDB | Yes |
| SQL Server | Yes |
| Oracle | Coming soon |
| CSV | Coming soon |
| NDJSON | Coming soon |
## Install
Download the binary from the [release page](https://github.com/databendcloud/bend-archiver/releases).
## Configure
Create `config/conf.json`.
Parameters (defaults are from code):
| Key | Required | Default | Notes |
|:----|:--------:|:--------|:------|
| `databaseType` | No | `mysql` | `mysql`, `tidb`, `pg`, `mssql`, `oracle` |
| `sourceHost` | Yes | - | Source host |
| `sourcePort` | Yes | - | Source port |
| `sourceUser` | Yes | - | Source user |
| `sourcePass` | Yes | - | Source password |
| `sourceDB` | If no `sourceDbTables` | - | Source database |
| `sourceTable` | If no `sourceDbTables` | - | Source table |
| `sourceDbTables` | No | `[]` | Multi-table: `["dbRegex@tableRegex"]` |
| `sourceQuery` | No | - | Currently ignored |
| `sourceWhereCondition` | Yes | - | WHERE clause without `WHERE` |
| `sourceSplitKey` | If key split | - | Integer primary key |
| `sourceSplitTimeKey` | If time split | - | Time column |
| `timeSplitUnit` | If time split | `hour` | `minute`, `quarter`, `hour`, `day` |
| `sslMode` | No | `disable` | Postgres only |
| `databendDSN` | Yes | `localhost:8000` | Databend DSN |
| `databendTable` | Yes | - | Target table |
| `batchSize` | Yes | `1000` | Rows per batch |
| `batchMaxInterval` | No | `3` | Seconds between batches |
| `copyPurge` | No | `true` | Databend COPY option |
| `copyForce` | No | `false` | Databend COPY option |
| `disableVariantCheck` | No | `true` | Databend COPY option |
| `userStage` | No | `~` | Databend stage |
| `deleteAfterSync` | No | `false` | Deletes source rows |
| `maxThread` | No | `1` | Max concurrency |
| `oracleSID` | No | - | Oracle SID |
Rules:
- `sourceWhereCondition` is always required. For time split, use `t >= '...' and t < '...'` with timestamps formatted as `YYYY-MM-DD HH:MM:SS`.
- `sourceSplitKey` and `sourceSplitTimeKey` are mutually exclusive.
- For time split, `timeSplitUnit` is required.
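The rules above can be checked before a run. The following is a minimal sketch, not the tool's own validation code: `SplitConfig` and `Validate` are hypothetical names, and the struct covers only the keys the rules mention. It also rejects an empty `timeSplitUnit`, whereas the table lists `hour` as the code default, so the real tool may be more lenient.

```go
package main

import (
	"errors"
	"fmt"
)

// SplitConfig is a hypothetical subset of conf.json. The JSON tags mirror
// the documented keys, but this is not bend-archiver's actual struct.
type SplitConfig struct {
	SourceWhereCondition string `json:"sourceWhereCondition"`
	SourceSplitKey       string `json:"sourceSplitKey"`
	SourceSplitTimeKey   string `json:"sourceSplitTimeKey"`
	TimeSplitUnit        string `json:"timeSplitUnit"`
}

// Validate applies the three documented rules and returns the first violation.
func (c SplitConfig) Validate() error {
	if c.SourceWhereCondition == "" {
		return errors.New("sourceWhereCondition is required")
	}
	if c.SourceSplitKey != "" && c.SourceSplitTimeKey != "" {
		return errors.New("sourceSplitKey and sourceSplitTimeKey are mutually exclusive")
	}
	if c.SourceSplitTimeKey != "" {
		// Assumption: an empty unit is treated as an error here, even though
		// the parameter table lists "hour" as the default.
		switch c.TimeSplitUnit {
		case "minute", "quarter", "hour", "day":
		default:
			return fmt.Errorf("timeSplitUnit %q is not one of minute, quarter, hour, day", c.TimeSplitUnit)
		}
	}
	return nil
}

func main() {
	cfg := SplitConfig{
		SourceWhereCondition: "id > 0",
		SourceSplitKey:       "id",
		SourceSplitTimeKey:   "t1", // conflicts with SourceSplitKey
	}
	fmt.Println(cfg.Validate())
}
```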
Example (key split):
```json
{
  "databaseType": "mysql",
  "sourceHost": "127.0.0.1",
  "sourcePort": 3306,
  "sourceUser": "root",
  "sourcePass": "123456",
  "sourceDB": "mydb",
  "sourceTable": "test_table",
  "sourceWhereCondition": "id > 0",
  "sourceSplitKey": "id",
  "databendDSN": "databend://username:password@host:port?sslmode=disable",
  "databendTable": "mydb.test_table",
  "batchSize": 40000,
  "maxThread": 5
}
```
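To illustrate what key-based splitting means, here is a minimal sketch of one plausible way to partition an integer key range into chunks of `batchSize` keys. It is an assumption for illustration only: `KeyRange` and `splitKeyRange` are hypothetical names, and the README does not state how bend-archiver actually computes its ranges.

```go
package main

import "fmt"

// KeyRange is a half-open interval [Start, End) over the split key.
type KeyRange struct{ Start, End int64 }

// splitKeyRange cuts the inclusive range [minKey, maxKey] into consecutive
// half-open ranges that each cover at most batchSize key values.
// Integer overflow near the int64 limits is not handled in this sketch.
func splitKeyRange(minKey, maxKey, batchSize int64) []KeyRange {
	if batchSize <= 0 || maxKey < minKey {
		return nil
	}
	var ranges []KeyRange
	for start := minKey; start <= maxKey; start += batchSize {
		end := start + batchSize
		if end > maxKey+1 {
			end = maxKey + 1 // clamp the last range to the upper bound
		}
		ranges = append(ranges, KeyRange{start, end})
	}
	return ranges
}

func main() {
	// Each range could become one WHERE fragment handled by a worker.
	for _, r := range splitKeyRange(1, 100000, 40000) {
		fmt.Printf("id >= %d AND id < %d\n", r.Start, r.End)
	}
}
```

Half-open ranges keep adjacent chunks from overlapping, so no row is read twice when the chunks are processed in parallel.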
Example (time split; only the time-split keys are shown, and `sourceSplitKey` must not be set):
```json
{
  "sourceWhereCondition": "t1 >= '2024-06-01 00:00:00' and t1 < '2024-07-01 00:00:00'",
  "sourceSplitTimeKey": "t1",
  "timeSplitUnit": "hour"
}
```
## Run
```bash
./bend-archiver -f config/conf.json
```
If `-f` is omitted, the tool loads `config/conf.json`.
## Development
### Build
```bash
go build -o bend-archiver ./cmd
```
### Tests
```bash
go test ./...
```
Tests in `cmd` and `source` expect locally running databases: Databend plus the source databases those tests use.
### Run from source
```bash
go run ./cmd -f config/conf.json
```
## Notes
- Multi-table sync uses regex in `sourceDbTables` (example: `["^mydb$@^test_table_.*$"]`).
- The MySQL driver reports BOOL as `TINYINT(1)`, so use `TINYINT` in Databend for boolean columns.
- COPY options reference: https://docs.databend.com/sql/sql-commands/dml/dml-copy-into-table#copy-options
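A `sourceDbTables` pattern such as the one in the first note can be tried out before a run. This minimal sketch uses Go's standard `regexp` package and assumes the entry is split at the first `@`; `matchDbTable` is a hypothetical helper, not a function from bend-archiver.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchDbTable reports whether db and table match a "dbRegex@tableRegex"
// pattern. Assumption: the pattern is split at the first '@'.
func matchDbTable(pattern, db, table string) (bool, error) {
	dbExpr, tableExpr, found := strings.Cut(pattern, "@")
	if !found {
		return false, fmt.Errorf("pattern %q has no '@' separator", pattern)
	}
	dbRe, err := regexp.Compile(dbExpr)
	if err != nil {
		return false, fmt.Errorf("invalid database regex: %w", err)
	}
	tableRe, err := regexp.Compile(tableExpr)
	if err != nil {
		return false, fmt.Errorf("invalid table regex: %w", err)
	}
	return dbRe.MatchString(db) && tableRe.MatchString(table), nil
}

func main() {
	pattern := "^mydb$@^test_table_.*$"
	candidates := [][2]string{
		{"mydb", "test_table_1"},
		{"mydb", "test_table"},
		{"otherdb", "test_table_1"},
	}
	for _, c := range candidates {
		ok, err := matchDbTable(pattern, c[0], c[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s.%s matches: %v\n", c[0], c[1], ok)
	}
}
```

Anchoring both expressions with `^` and `$`, as the example pattern does, avoids accidental substring matches such as `mydb_backup`.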