{"id":13424874,"url":"https://github.com/mkabilov/pg2ch","last_synced_at":"2025-03-15T18:36:05.608Z","repository":{"id":48144667,"uuid":"170719346","full_name":"mkabilov/pg2ch","owner":"mkabilov","description":"Data streaming from postgresql to clickhouse via logical replication mechanism","archived":true,"fork":false,"pushed_at":"2021-02-02T15:19:35.000Z","size":20051,"stargazers_count":194,"open_issues_count":12,"forks_count":33,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-15T08:03:57.524Z","etag":null,"topics":["clickhouse","go","golang","logical-replication","postgresql"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkabilov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-14T16:08:45.000Z","updated_at":"2024-12-27T18:17:31.000Z","dependencies_parsed_at":"2022-09-02T20:50:16.655Z","dependency_job_id":null,"html_url":"https://github.com/mkabilov/pg2ch","commit_stats":null,"previous_names":["ikitiki/pg2ch"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Fpg2ch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Fpg2ch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Fpg2ch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkabilov%2Fpg2ch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkabilov","download_url":"https://codeload.github.com/mkabilov/pg2ch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243775946,"owners_count":20346294,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","go","golang","logical-replication","postgresql"],"created_at":"2024-07-31T00:01:00.411Z","updated_at":"2025-03-15T18:36:00.591Z","avatar_url":"https://github.com/mkabilov.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# PostgreSQL to ClickHouse\n\nContinuous data transfer from PostgreSQL to ClickHouse using logical replication mechanism.\n\n### Status of the project\nCurrently pg2ch tool is in active testing stage,\nas for now it is not for production use\n\n\n### Getting and running\n\nGet:\n```\n    go get -u github.com/mkabilov/pg2ch\n```\n\nRun:\n```\n    pg2ch --config {path to the config file (default config.yaml)}\n```\n\n\n### Config file\n```yaml\ntables:\n    {postgresql table name}:\n        main_table: {clickhouse table name}\n        buffer_table: {clickhouse buffer table name} # optional, if not specified, insert directly to the main table\n        buffer_row_id: {clickhouse buffer table column name for row id} \n        init_sync_skip: {skip initial copy of the data}\n        init_sync_skip_buffer_table: {if true bypass buffer_table and write directly to the main_table on initial sync copy}\n                                     # makes sense in case of huge tables        \n        init_sync_skip_truncate: {skip truncate of the main_table during init sync}                                 \n        engine: {clickhouse table engine: MergeTree, ReplacingMergeTree or CollapsingMergeTree}\n        max_buffer_length: {number of DML(insert/update/delete) commands to store in the memory before flushing to the buffer/main table } \n        merge_threshold: {if buffer table specified, number of buffer flushed before moving data from buffer to the main table}\n        columns: # postgres - clickhouse column name mapping, \n                 # if not present, all the columns are expected to be on the clickhouse side with the exact same names \n            {postgresql column name}: {clickhouse column name}\n        is_deleted_column: # in case of ReplacingMergeTree 1 will be stored in the {is_deleted_column} in order to mark deleted rows\n        sign_column: {clickhouse sign column name for CollapsingMergeTree engines only, default \"sign\"}\n        ver_column: {clickhouse version column name for the ReplacingMergeTree engine, default \"ver\"}\n\ninactivity_merge_timeout: {interval, default 1 min} # merge buffered data after that timeout\n\nclickhouse: # clickhouse tcp protocol connection params\n    host: {clickhouse host, default 127.0.0.1}\n    port: {tcp port, default 9000}\n    database: {database name}\n    username: {username}\n    password: {password}\n    params:\n        {extra param name}:{extra param value}\n        ...\n\npostgres: # postgresql connection params\n    host: {host name, default 127.0.0.1}\n    port: {port, default 5432}\n    database: {database name}\n    user: {user}\n    replication_slot_name: {logical replication slot name}\n    publication_name: {postgresql publication name}\n    \ndb_path: {path to the persistent storage dir where table lsn positions will be stored}\n```\n\n### Sample setup:\n\n- make sure you have PostgreSQL server running on `localhost:5432`\n    - set `wal_level` in the postgresql config file to `logical`\n    - set `max_replication_slots` to at least `2`\n- make sure you have ClickHouse server running on `localhost:9000` e.g. in the [docker](https://hub.docker.com/r/yandex/clickhouse-server/)\n- create database `pg2ch_test` in PostgreSQL: `CREATE DATABASE pg2ch_test;`\n- create a set of tables using pgbench command: `pgbench -U postgres -d pg2ch_test -i`\n- change [replica identity](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY)\nfor the `pgbench_accounts` table to FULL, so that we'll receive old values of the updated rows: `ALTER TABLE pgbench_accounts REPLICA IDENTITY FULL;`\n- create PostgreSQL publication for the `pgbench_accounts` table: `CREATE PUBLICATION pg2ch_pub FOR TABLE pgbench_accounts;`\n- create PostgreSQL logical replication slot: `SELECT * FROM pg_create_logical_replication_slot('pg2ch_slot', 'pgoutput');`\n- create tables on the ClickHouse side:\n```sql\nCREATE TABLE pgbench_accounts (aid Int32, abalance Int32, sign Int8) ENGINE = CollapsingMergeTree(sign) ORDER BY aid\n-- our target table\n\nCREATE TABLE pgbench_accounts_buf (aid Int32, abalance Int32, sign Int8, row_id UInt64) ENGINE = Memory()\n-- will be used as a buffer table\n```\n- create `config.yaml` file with the following content:\n```yaml\ntables:\n    pgbench_accounts:\n        main_table: pgbench_accounts\n        buffer_table: pgbench_accounts_buf\n        buffer_row_id: row_id\n        engine: CollapsingMergeTree\n        max_buffer_length: 1000\n        merge_threshold: 4\n        columns:\n            aid: aid\n            abalance: abalance\n        sign_column: sign\n\ninactivity_merge_timeout: '10s'\n\nclickhouse:\n    host: localhost\n    port: 9000\n    database: default\n    username: default\npostgres:\n    host: localhost\n    port: 5432\n    database: pg2ch_test\n    user: postgres\n    replication_slot_name: pg2ch_slot\n    publication_name: pg2ch_pub\n    \ndb_path: db\n```\n\n- run pg2ch to start replication:\n```bash\n    pg2ch --config config.yaml\n```\n\n- run `pgbench` to have some test load:\n```bash\n    pgbench -U postgres -d pg2ch_test --time 30 --client 10 \n```\n- wait for `inactivity_merge_timeout` period (in our case 10 seconds) so that data in the memory gets flushed to the table in ClickHouse\n- check the sums of the `abalance` column both on ClickHouse and PostgreSQL:\n    - ClickHouse: `SELECT SUM(abalance * sign), SUM(sign) FROM pgbench_accounts` ([why multiply by `sign` column?](https://clickhouse.yandex/docs/en/operations/table_engines/collapsingmergetree/#example-of-use)) \n    - PostgreSQL: `SELECT SUM(abalance), COUNT(*) FROM pgbench_accounts`\n- numbers must match; if not, please open an issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkabilov%2Fpg2ch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkabilov%2Fpg2ch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkabilov%2Fpg2ch/lists"}