{"id":13696357,"url":"https://github.com/long2ice/synch","last_synced_at":"2025-04-12T23:40:47.858Z","repository":{"id":45014644,"uuid":"240656296","full_name":"long2ice/synch","owner":"long2ice","description":"Sync data from the other DB to ClickHouse(cluster)","archived":false,"fork":false,"pushed_at":"2024-05-21T19:25:41.000Z","size":2100,"stargazers_count":350,"open_issues_count":13,"forks_count":98,"subscribers_count":7,"default_branch":"dev","last_synced_at":"2025-04-04T03:09:08.382Z","etag":null,"topics":["clickhouse","data-etl","increment-etl","kafka","mysql","postgresql","replication"],"latest_commit_sha":null,"homepage":"https://github.com/long2ice/synch","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/long2ice.png","metadata":{"files":{"readme":"README-zh.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"custom":["https://sponsor.long2ice.io"]}},"created_at":"2020-02-15T06:25:16.000Z","updated_at":"2025-04-03T08:49:24.000Z","dependencies_parsed_at":"2024-10-28T19:42:49.263Z","dependency_job_id":"04310d6f-8257-4a05-a80f-d94459870eea","html_url":"https://github.com/long2ice/synch","commit_stats":{"total_commits":300,"total_committers":4,"mean_commits":75.0,"dds":0.3533333333333334,"last_synced_commit":"d26f30a478fa4e36991008b0410c215b0ef6538f"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/long2ice%2Fsynch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/long2ice%2Fsynch/tags","releases
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/long2ice%2Fsynch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/long2ice%2Fsynch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/long2ice","download_url":"https://codeload.github.com/long2ice/synch/tar.gz/refs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248647254,"owners_count":21139081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","data-etl","increment-etl","kafka","mysql","postgresql","replication"],"created_at":"2024-08-02T18:00:38.803Z","updated_at":"2025-04-12T23:40:47.835Z","avatar_url":"https://github.com/long2ice.png","language":"Python","funding_links":["https://sponsor.long2ice.io"],"categories":["Integrations"],"sub_categories":["Data Transfer and Synchronization"],"readme":"# Synch\n\n![pypi](https://img.shields.io/pypi/v/synch.svg?style=flat)\n![docker](https://img.shields.io/docker/cloud/build/long2ice/synch)\n![license](https://img.shields.io/github/license/long2ice/synch)\n![workflows](https://github.com/long2ice/synch/workflows/pypi/badge.svg)\n![workflows](https://github.com/long2ice/synch/workflows/ci/badge.svg)\n\n[English](https://github.com/long2ice/synch/blob/dev/README.md)\n\n## Introduction\n\nSync data from other databases to ClickHouse. MySQL and PostgreSQL are currently supported, with both full and incremental replication.\n\n![synch](https://github.com/long2ice/synch/raw/dev/images/synch.png)\n\n## Features\n\n- Full replication and real-time incremental replication.\n- DML and DDL sync: adding, dropping, and modifying columns is supported, as are all DML statements.\n- Email notification on errors.\n- Redis or Kafka as the message queue.\n- Multiple source databases can be synced simultaneously to 
ClickHouse.\n- Supports the ClickHouse `MergeTree`, `CollapsingMergeTree`, `VersionedCollapsingMergeTree`, and `ReplacingMergeTree` engines.\n- Supports ClickHouse clusters.\n\n## Requirements\n\n- Python \u003e= 3.7\n- [redis](https://redis.io), used to cache the binlog and as a message queue; redis cluster is supported.\n- [kafka](https://kafka.apache.org), required only when kafka is used as the message queue.\n- [clickhouse-jdbc-bridge](https://github.com/long2ice/clickhouse-jdbc-bridge), required when running the `etl` command against postgres.\n\n## Installation\n\n```shell\n\u003e pip install synch\n```\n\n## Usage\n\n### Config file `synch.yaml`\n\nBy default synch reads its configuration from `./synch.yaml`; use `synch -c` to specify a different config file.\n\nSee the sample config file [`synch.yaml`](https://github.com/long2ice/synch/blob/dev/synch.yaml).\n\n### Full replication\n\nA full replication is usually needed once before incremental replication starts; alternatively, use `--renew` to drop the target tables and rebuild them in full.\n\n```shell\n\u003e synch --alias mysql_db etl -h\n\nUsage: synch etl [OPTIONS]\n\n  Make etl from source table to ClickHouse.\n\nOptions:\n  --schema TEXT     Schema to full etl.\n  --renew           Etl after try to drop the target tables.\n  -t, --table TEXT  Tables to full etl.\n  -h, --help        Show this message and exit.\n```\n\nRun a full replication of table `test.test`:\n\n```shell\n\u003e synch --alias mysql_db etl --schema test --table test\n```\n\n### Produce\n\nListen to the source database and write changed data to the message queue.\n\n```shell\n\u003e synch --alias mysql_db produce\n```\n\n### Consume\n\nConsume data from the message queue and insert it into ClickHouse; use `--skip-error` to skip messages that fail. When `auto_full_etl = True` is configured, a full replication is attempted first.\n\n```shell\n\u003e synch --alias mysql_db consume -h\n\nUsage: synch consume [OPTIONS]\n\n  Consume from broker and insert into ClickHouse.\n\nOptions:\n  --schema TEXT       Schema to consume.  
[required]\n  --skip-error        Skip error rows.\n  --last-msg-id TEXT  Redis stream last msg id or kafka msg offset, depend on\n                      broker_type in config.\n\n  -h, --help          Show this message and exit.\n```\n\nConsume database `test` and insert into `ClickHouse`:\n\n```shell\n\u003e synch --alias mysql_db consume --schema test\n```\n\n### Monitoring\n\nWhen `core.monitoring` is set to `true`, a `synch` database is created automatically in `ClickHouse` to store monitoring data.\n\nTable structure:\n\n```sql\ncreate table if not exists synch.log\n(\n    alias String,\n    schema String,\n    table String,\n    num        int,\n    type       int, -- 1: producer, 2: consumer\n    created_at DateTime\n)\n    engine = MergeTree\n        partition by toYYYYMM(created_at)\n        order by created_at;\n```\n\n### ClickHouse table engines\n\nsynch currently supports the `MergeTree`, `CollapsingMergeTree`, `VersionedCollapsingMergeTree`, and `ReplacingMergeTree` engines.\n\n- `MergeTree`: the default engine and the usual choice.\n- `CollapsingMergeTree`: see [CollapsingMergeTree](https://clickhouse.tech/docs/zh/engines/table-engines/mergetree-family/collapsingmergetree/) for details.\n- `VersionedCollapsingMergeTree`: see [VersionedCollapsingMergeTree](https://clickhouse.tech/docs/zh/engines/table-engines/mergetree-family/versionedcollapsingmergetree/) for details.\n- `ReplacingMergeTree`: see [ReplacingMergeTree](https://clickhouse.tech/docs/zh/engines/table-engines/mergetree-family/replacingmergetree/) for details.\n\n## Use docker-compose (recommended)\n\n\u003cdetails\u003e\n\u003csummary\u003eRedis as the message queue: lightweight, few dependencies\u003c/summary\u003e\n\n```yaml\nversion: \"3\"\nservices:\n  producer:\n    depends_on:\n      - redis\n    image: long2ice/synch\n    command: synch --alias mysql_db produce\n    volumes:\n      - ./synch.yaml:/synch/synch.yaml\n  # one consumer per database\n  consumer.test:\n    depends_on:\n      - redis\n    image: long2ice/synch\n    command: synch --alias mysql_db consume --schema test\n    volumes:\n      - ./synch.yaml:/synch/synch.yaml\n  redis:\n    hostname: redis\n    image: redis:latest\n    volumes:\n      - redis:/data\nvolumes:\n  
redis:\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eKafka as the message queue: heavyweight, high throughput\u003c/summary\u003e\n\n```yaml\nversion: \"3\"\nservices:\n  zookeeper:\n    image: bitnami/zookeeper:3\n    hostname: zookeeper\n    environment:\n      - ALLOW_ANONYMOUS_LOGIN=yes\n    volumes:\n      - zookeeper:/bitnami\n  kafka:\n    image: bitnami/kafka:2\n    hostname: kafka\n    environment:\n      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181\n      - ALLOW_PLAINTEXT_LISTENER=yes\n      - JMX_PORT=23456\n      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true\n      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092\n    depends_on:\n      - zookeeper\n    volumes:\n      - kafka:/bitnami\n  kafka-manager:\n    image: hlebalbau/kafka-manager\n    ports:\n      - \"9000:9000\"\n    environment:\n      ZK_HOSTS: \"zookeeper:2181\"\n      KAFKA_MANAGER_AUTH_ENABLED: \"false\"\n    command: -Dpidfile.path=/dev/null\n  producer:\n    depends_on:\n      - redis\n      - kafka\n      - zookeeper\n    image: long2ice/synch\n    command: synch --alias mysql_db produce\n    volumes:\n      - ./synch.yaml:/synch/synch.yaml\n  # one consumer per database\n  consumer.test:\n    depends_on:\n      - redis\n      - kafka\n      - zookeeper\n    image: long2ice/synch\n    command: synch --alias mysql_db consume --schema test\n    volumes:\n      - ./synch.yaml:/synch/synch.yaml\n  redis:\n    hostname: redis\n    image: redis:latest\n    volumes:\n      - redis:/data\nvolumes:\n  redis:\n  kafka:\n  zookeeper:\n```\n\n\u003c/details\u003e\n\n## Important notes\n\n- Tables to be synced must have a primary key, a non-null unique key, or a composite primary key.\n- DDL sync is not supported for postgres.\n- Postgres sync has not been extensively tested; use it with caution in production.\n\n## Thanks\n\nThe powerful Python IDE [PyCharm](https://www.jetbrains.com/pycharm/?from=synch) from [JetBrains](https://www.jetbrains.com/?from=synch).\n\n![jetbrains](https://github.com/long2ice/synch/raw/dev/images/jetbrains.svg)\n\n## License\n\nThis project is licensed under the [Apache-2.0](https://github.com/long2ice/synch/blob/master/LICENSE) 
license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flong2ice%2Fsynch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flong2ice%2Fsynch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flong2ice%2Fsynch/lists"}