{"id":20826766,"url":"https://github.com/cedrickchee/clickhouse-cluster","last_synced_at":"2025-09-10T10:40:52.146Z","repository":{"id":138118129,"uuid":"414075635","full_name":"cedrickchee/clickhouse-cluster","owner":"cedrickchee","description":"All the essential stuffs to set up ClickHouse cluster with sharding and replication enabled (or sharding only), suitable for local dev and testing.","archived":false,"fork":false,"pushed_at":"2021-10-06T05:11:41.000Z","size":17,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-12T07:27:36.323Z","etag":null,"topics":["clickhouse","clickhouse-cluster","educational-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedrickchee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-06T05:04:22.000Z","updated_at":"2025-02-21T15:54:16.000Z","dependencies_parsed_at":null,"dependency_job_id":"1ab3214f-521f-4d64-bae3-cd4f202a0ca3","html_url":"https://github.com/cedrickchee/clickhouse-cluster","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cedrickchee/clickhouse-cluster","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fclickhouse-cluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fclickhouse-cluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fclickhouse-clust
er/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fclickhouse-cluster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedrickchee","download_url":"https://codeload.github.com/cedrickchee/clickhouse-cluster/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fclickhouse-cluster/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274368104,"owners_count":25272352,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clickhouse","clickhouse-cluster","educational-project"],"created_at":"2024-11-17T23:09:57.618Z","updated_at":"2025-09-10T10:40:52.123Z","avatar_url":"https://github.com/cedrickchee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tutorial: Creating a ClickHouse Cluster\n\nBased on this tutorial: [\"Creating a ClickHouse cluster - Part I: Sharding\"](https://dev.to/zergon321/creating-a-clickhouse-cluster-part-i-sharding-4j20)\n\nThe final cluster has:\n- 1 cluster, with 2 shards\n- Each shard has 2 replica servers\n- clickhouse-servers:\n    - Master node runs at 127.0.0.1, port 9000\n    - Subordinate/worker nodes run at 127.0.0.1, ports 9001-9004\n\n## Cluster Deployment\n\nNow we are ready to launch the system. I will do it using `docker-compose`:\n\nFirst, SSH to my Multipass VM (instance name is \"clickhouse\"). Then run:\n\n```sh\n$ cd ~/dev/tutorial/clickhouse-cluster\n\n$ docker-compose up\nCreating network \"clickhouse-cluster_default\" with the default driver\nCreating volume \"clickhouse-cluster_ch-master-data\" with default driver\nCreating volume \"clickhouse-cluster_ch-master-logs\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-1-data\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-1-logs\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-2-data\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-2-logs\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-3-data\" with default driver\nCreating volume \"clickhouse-cluster_ch-sub-3-logs\" with default driver\nPulling ch-sub-1 (yandex/clickhouse-server:19.14.13.4)...\n19.14.13.4: Pulling from yandex/clickhouse-server\na1125296b23d: Pull complete\n3c742a4a0f38: Pull complete\n4c5ea3b32996: Pull complete\n1b4be91ead68: Pull complete\n8e89ff3b8b56: Pull complete\nb54bb3d8e5ac: Pull complete\na955f5266cb6: Pull complete\nd200f6bc678a: Pull complete\n1250dc772f64: Pull complete\nfad28a14cd72: Pull complete\ndfe82dbaecba: Pull complete\nDigest: sha256:ccf9c2b5e3f22dfda4d00b85d44fb37f90e49d119b9724160981508c31220070\nStatus: Downloaded newer image for yandex/clickhouse-server:19.14.13.4\nCreating ch_sub_2 ... done\nCreating ch_sub_1 ... done\nCreating ch_sub_3 ... done\nCreating ch_master ... done\nAttaching to ch_sub_2, ch_sub_1, ch_sub_3, ch_master\nch_sub_1     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: user_files_path (version 19.14.13.4 (official build)\nch_sub_1     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: format_schema_path (version 19.14.13.4 (official build)\nch_sub_1     | Logging trace to /var/log/clickhouse-server/clickhouse-server.log\nch_sub_1     | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log\nch_sub_1     | Include not found: networks\nch_sub_2     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: user_files_path (version 19.14.13.4 (official build)\nch_sub_2     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: format_schema_path (version 19.14.13.4 (official build)\nch_sub_2     | Logging trace to /var/log/clickhouse-server/clickhouse-server.log\nch_sub_2     | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log\nch_sub_2     | Include not found: networks\nch_sub_3     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: user_files_path (version 19.14.13.4 (official build)\nch_sub_3     | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: format_schema_path (version 19.14.13.4 (official build)\nch_sub_3     | Logging trace to /var/log/clickhouse-server/clickhouse-server.log\nch_sub_3     | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log\nch_sub_3     | Include not found: networks\nch_master    | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: user_files_path (version 19.14.13.4 (official build)\nch_master    | Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Not found: format_schema_path (version 19.14.13.4 (official build)\nch_master    | Logging trace to /var/log/clickhouse-server/clickhouse-server.log\nch_master    | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log\nch_master    | Include not found: networks\n```\n\nNotice the `Not found: user_files_path` and `Not found: format_schema_path` errors. To fix them, we need to modify `master-config.xml` and `sub-config.xml`, which we got from the article's [project source](https://github.com/zergon321/clickhouse-clustering). Add these lines to both config files:\n\n```xml\n\u003cuser_files_path\u003e/var/lib/clickhouse/user_files/\u003c/user_files_path\u003e\n\n...\n\n\u003cformat_schema_path\u003e/var/lib/clickhouse/format_schemas/\u003c/format_schema_path\u003e\n```\n\n---\n\n## Usage\n\nOnce we're done with the cluster deployment, the next step is interacting with the cluster.\n\nInstall the Python packages and run the app:\n\n```sh\n$ make\n```\n\nInstall or update the Python packages:\n\n```sh\n$ make install\n```\n\nRun the app only:\n\n```sh\n$ make run\n```\n\n## Cluster Tables\n\nAfter everything is up and running, it's time to create data tables.\nFor this task I will use the Python programming language and the [clickhouse-driver](https://pypi.org/project/clickhouse-driver/) library.\nNow onto the first script, [create-cluster.py](./create-cluster.py):\n\n```sh\n$ python create-cluster.py\n```\n\n### Distributed Table on the Master Node\n\n**Sharding key**\n\nThe sharding key is an expression whose result, computed from the row's column values, decides which shard stores the row.\nIf you specify `rand()`, each row goes to a random shard. The sharding key only applies to INSERT operations on the master table (note that the master table itself doesn't store any data; it only aggregates the data from the shards during queries). But we can also perform INSERT operations directly on the subordinate nodes:\n\n```sh\n$ python sub-1.py\n```\n\nYou can insert any data you want into any node.\n\n```sh\n$ python sub-2.py\n$ python sub-3.py\n```\n\n## Cluster Operations\n\nNow try to connect to the master node via the ClickHouse client:\n\n```sh\n$ docker run --network=\"clickhouse-cluster_default\" -it --rm --link ch_master:clickhouse-server yandex/clickhouse-client:19.14.12.2 --host clickhouse-server\nClickHouse client version 19.14.12.2 (official build).\nConnecting to clickhouse-server:9000 as user default.\nConnected to ClickHouse server version 19.14.13 revision 54425.\n\n39b6272b0804 :) \n```\n\nOnce connected, try executing the following SQL statements:\n\n```sh\n39b6272b0804 :) USE db\n...\n\nOk.\n\n0 rows in set. Elapsed: 0.003 sec.\n\n39b6272b0804 :) SELECT * FROM entries\n...\n┌───────────timestamp─┬─parameter──┬─value─┐\n│ 2021-10-05 06:55:09 │ elasticity │  38.9 │\n└─────────────────────┴────────────┴───────┘\n┌───────────timestamp─┬─parameter───┬─value─┐\n│ 2021-10-05 06:40:00 │ temperature │  38.9 │\n└─────────────────────┴─────────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:55:09 │ density   │  19.8 │\n└─────────────────────┴───────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:40:00 │ density   │  12.3 │\n└─────────────────────┴───────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:55:09 │ gravity   │  27.2 │\n└─────────────────────┴───────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:40:00 │ humidity  │  27.2 │\n└─────────────────────┴───────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:46:12 │ humidity  │  39.8 │\n└─────────────────────┴───────────┴───────┘\n┌───────────timestamp─┬─parameter───┬─value─┐\n│ 2021-10-05 06:46:12 │ temperature │ 88.13 │\n└─────────────────────┴─────────────┴───────┘\n┌───────────timestamp─┬─parameter─┬─value─┐\n│ 2021-10-05 06:46:12 │ voltage   │  72.8 │\n└─────────────────────┴───────────┴───────┘\n\n9 rows in set. Elapsed: 0.021 sec.\n```\n\nIf everything has been set up properly, you'll see all the data you sent to each shard.\n\nTear down the cluster:\n\n```sh\n# also remove volumes\n$ docker-compose down -v\n```\n\n## Part 2: Enable Replication\n\nThis part is based on this article: [\"Creating a ClickHouse cluster - Part II: Replication\"](https://dev.to/zergon321/creating-a-clickhouse-cluster-part-ii-replication-23mc)\n\nIn the previous setup, we ran ClickHouse in cluster mode using only sharding.\nThat's enough for load distribution, but we also need to ensure fault tolerance via replication.\n\n### ZooKeeper\n\nTo enable native replication, ZooKeeper is required.\n\n(... see the article ...)\n\n### Cluster Configuration\n\nI will use 1 master node and 2 shards, with 2 replicas for each shard.\n\nSo, we are going to build 2 (shards) x 2 (replicas) = 4 replica nodes, plus the master node, for a 5-node ClickHouse cluster.\n\nHere's the deployment configuration:\n\n(... see the article ...)\n\n_The above configuration creates a 6-node cluster: 5 ClickHouse nodes plus 1 ZooKeeper node._\n\n### Cluster Deployment\n\nAfter all the config files are set up, we can finally use scripts to create a cluster and run it.\n\n```sh\n$ docker-compose up\n```\n\nWhen all the database nodes are up and running, we should first execute our Python scripts for the subordinate nodes.\nAll of them look like this:\n\n```python\n# sub-1.py\n\nfrom clickhouse_driver import Client\nfrom datetime import datetime\n\nif __name__ == \"__main__\":\n    # Connect to the first subordinate node (native protocol port)\n    client = Client(\"127.0.0.1\", port=9001)\n\n    client.execute(\"CREATE DATABASE IF NOT EXISTS billing\")\n\n    client.execute(r'''CREATE TABLE IF NOT EXISTS billing.transactions(\n                      timestamp DateTime,\n                      currency String,\n                      value Float64)\n                      ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/billing.transactions', '{replica}')\n                      PARTITION BY currency\n                      ORDER BY timestamp''')\n```\n\nAs you can see, the subordinate table now uses the `ReplicatedMergeTree` engine.\nIts constructor takes the path to the table records in ZooKeeper as the first parameter\nand the replica name as the second parameter.\nThe ZooKeeper path should be unique for each replicated table and shard; replicas of the same shard share one path, which is why it contains `{shard}`.\nAll the parameters in `{}` are taken from the macros section of each replica's config file (see the article).\n\n```sh\n$ python sub-1.py\n$ python sub-2.py\n$ python sub-3.py\n$ python sub-4.py\n```\n\nWhen all the subordinate tables are created, it's time to create the master table.\nThere's no difference from the previous case, when only sharding was used:\n\n```python\nfrom clickhouse_driver import Client\nfrom datetime import datetime\n\nif __name__ == \"__main__\":\n    # Connect to the master node (native protocol port)\n    client = Client(\"127.0.0.1\", port=9000)\n\n    client.execute(\"CREATE DATABASE IF NOT EXISTS billing\")\n\n    client.execute('''CREATE TABLE IF NOT EXISTS billing.transactions(\n                      timestamp DateTime,\n                      currency String,\n                      value Float64)\n                      ENGINE = Distributed(example_cluster, billing, transactions, rand())''')\n```\n\n```sh\n$ python master.py\n```\n\nQuery the distributed tables (the subordinate tables and the master table):\n\n```sh\n$ python query-cluster.py\n```\n\nIf you set everything up properly, you will get a working ClickHouse cluster with replication enabled.\nA shard stays available as long as at least one of its replicas is up.\nTable replication strengthens the fault tolerance of the cluster.\n\n# References\n\n- [How to Create Python 3 Virtual Environment on Ubuntu 20.04](https://linoxide.com/how-to-create-python-virtual-environment-on-ubuntu-20-04/)\n- Another [tutorial on setting up a ClickHouse server](https://github.com/vejed/clickhouse-cluster)\n\n---\n\n## TODO\n\n- Improve `docker-compose.yml`:\n    - `ch-zookeeper`: add one more port\n    - `ch-master` and `ch-sub-{1-4}`:\n        - add `hostname`\n        - add `ulimits`\n- Improve node configs\n    - Move all config files to a new directory named `config`\n    - Break the current single large config file into multiple configs. Example of container `volumes`:\n        - `./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml`\n        - `./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml`\n        - `./config/macros/macros-01.xml:/etc/clickhouse-server/config.d/macros.xml`\n        - `./config/users.xml:/etc/clickhouse-server/users.xml`\n        - `./data/server-01:/var/lib/clickhouse`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedrickchee%2Fclickhouse-cluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedrickchee%2Fclickhouse-cluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedrickchee%2Fclickhouse-cluster/lists"}