{"id":48765456,"url":"https://github.com/databendlabs/bend-archiver","last_synced_at":"2026-04-13T07:49:33.645Z","repository":{"id":245969059,"uuid":"819701719","full_name":"databendlabs/bend-archiver","owner":"databendlabs","description":"Parallel archive tool for syncing MySQL/PostgreSQL/TiDB/SQL Server data into Databend, with key- or time-based splitting.","archived":false,"fork":false,"pushed_at":"2026-01-31T07:18:38.000Z","size":12263,"stargazers_count":15,"open_issues_count":5,"forks_count":6,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-04-13T07:49:22.178Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://databend.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databendlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-25T03:24:03.000Z","updated_at":"2026-01-28T08:30:40.000Z","dependencies_parsed_at":"2024-06-25T04:43:36.560Z","dependency_job_id":"916623d2-db79-4d2a-a0fa-6906939321f8","html_url":"https://github.com/databendlabs/bend-archiver","commit_stats":null,"previous_names":["databendcloud/db-archiver","databendlabs/bend-archiver"],"tags_count":38,"template":false,"template_full_name":null,"purl":"pkg:github/databendlabs/bend-archiver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databendlabs%2Fbend-archiver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databendlabs%2Fbend-archiver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databendlabs%2Fbend-archiver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databendlabs%2Fbend-archiver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databendlabs","download_url":"https://codeload.github.com/databendlabs/bend-archiver/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databendlabs%2Fbend-archiver/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31744404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T06:26:45.479Z","status":"ssl_error","status_checked_at":"2026-04-13T06:26:44.645Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-13T07:49:30.025Z","updated_at":"2026-04-13T07:49:33.636Z","avatar_url":"https://github.com/databendlabs.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bend-archiver\nArchive data from common databases into Databend with parallel sync (by key or time range).\n\n## Supported sources\n| Data source | Supported |\n|:-----------|:---------:|\n| MySQL      |    Yes    |\n| PostgreSQL |    Yes    |\n| TiDB       |    Yes    |\n| SQL Server |    Yes    |\n| Oracle     | Coming soon |\n| CSV        | Coming soon |\n| NDJSON     | Coming soon |\n\n## Install\nDownload the binary from the [release page](https://github.com/databendcloud/bend-archiver/releases).\n\n## Configure\nCreate `config/conf.json`.\n\nParameters (defaults are from code):\n| Key | Required | Default | Notes |\n|:----|:--------:|:--------|:------|\n| `databaseType` | No | `mysql` | `mysql`, `tidb`, `pg`, `mssql`, `oracle` |\n| `sourceHost` | Yes | - | Source host |\n| `sourcePort` | Yes | - | Source port |\n| `sourceUser` | Yes | - | Source user |\n| `sourcePass` | Yes | - | Source password |\n| `sourceDB` | If no `sourceDbTables` | - | Source database |\n| `sourceTable` | If no `sourceDbTables` | - | Source table |\n| `sourceDbTables` | No | `[]` | Multi-table: `[\"dbRegex@tableRegex\"]` |\n| `sourceQuery` | No | - | Currently ignored |\n| `sourceWhereCondition` | Yes | - | WHERE clause without `WHERE` |\n| `sourceSplitKey` | If key split | - | Integer primary key |\n| `sourceSplitTimeKey` | If time split | - | Time column |\n| `timeSplitUnit` | If time split | `hour` | `minute`, `quarter`, `hour`, `day` |\n| `sslMode` | No | `disable` | Postgres only |\n| `databendDSN` | Yes | `localhost:8000` | Databend DSN |\n| `databendTable` | Yes | - | Target table |\n| `batchSize` | Yes | `1000` | Rows per batch |\n| `batchMaxInterval` | No | `3` | Seconds between batches |\n| `copyPurge` | No | `true` | Databend COPY option |\n| `copyForce` | No | `false` | Databend COPY option |\n| `disableVariantCheck` | No | `true` | Databend COPY option |\n| `userStage` | No | `~` | Databend stage |\n| `deleteAfterSync` | No | `false` | Deletes source rows |\n| `maxThread` | No | `1` | Max concurrency |\n| `oracleSID` | No | - | Oracle SID |\n\nRules:\n- `sourceWhereCondition` is always required; for time split use `t \u003e= '...' and t \u003c '...'` with `YYYY-MM-DD HH:MM:SS`.\n- `sourceSplitKey` and `sourceSplitTimeKey` are mutually exclusive.\n- For time split, `timeSplitUnit` is required.\n\nExample (key split):\n```json\n{\n  \"databaseType\": \"mysql\",\n  \"sourceHost\": \"127.0.0.1\",\n  \"sourcePort\": 3306,\n  \"sourceUser\": \"root\",\n  \"sourcePass\": \"123456\",\n  \"sourceDB\": \"mydb\",\n  \"sourceTable\": \"test_table\",\n  \"sourceWhereCondition\": \"id \u003e 0\",\n  \"sourceSplitKey\": \"id\",\n  \"databendDSN\": \"databend://username:password@host:port?sslmode=disable\",\n  \"databendTable\": \"mydb.test_table\",\n  \"batchSize\": 40000,\n  \"maxThread\": 5\n}\n```\n\nExample (time split keys):\n```json\n{\n  \"sourceWhereCondition\": \"t1 \u003e= '2024-06-01 00:00:00' and t1 \u003c '2024-07-01 00:00:00'\",\n  \"sourceSplitTimeKey\": \"t1\",\n  \"timeSplitUnit\": \"hour\"\n}\n```\n\n## Run\n```bash\n./bend-archiver -f config/conf.json\n```\nIf `-f` is omitted, it loads `config/conf.json`.\n\n## Development\n### Build\n```bash\ngo build -o bend-archiver ./cmd\n```\n\n### Tests\n```bash\ngo test ./...\n```\nTests in `cmd` and `source` expect local databases (Databend plus the source DBs in the tests).\n\n### Run from source\n```bash\ngo run ./cmd -f config/conf.json\n```\n\n## Notes\n- Multi-table sync uses regex in `sourceDbTables` (example: `[\"^mydb$@^test_table_.*$\"]`).\n- The MySQL driver reports BOOL as `TINYINT(1)`, so use `TINYINT` in Databend for boolean columns.\n- COPY options reference: https://docs.databend.com/sql/sql-commands/dml/dml-copy-into-table#copy-options\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabendlabs%2Fbend-archiver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabendlabs%2Fbend-archiver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabendlabs%2Fbend-archiver/lists"}