https://github.com/clickhouse/keeper-extend-cluster

Experiment on how to upgrade single-node clickhouse-keeper to a cluster
https://github.com/clickhouse/keeper-extend-cluster
Last synced: over 1 year ago
JSON representation
Experiment on how to upgrade single-node clickhouse-keeper to a cluster
Host: GitHub
URL: https://github.com/clickhouse/keeper-extend-cluster
Owner: ClickHouse
License: mit
Created: 2024-05-13T10:51:18.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-11-26T09:39:40.000Z (over 1 year ago)
Last Synced: 2025-01-13T04:29:26.861Z (over 1 year ago)
Language: Makefile
Size: 15.6 KB
Stars: 9
Watchers: 13
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # Sandbox to upgrade a single node keeper to a cluster

## Preparation

The docker-compose starts the containers with the same user ID as the current user. To do it, `UID` and `GID` environment variables should be added to `.env`:

```

$ make prepare

# Or, to clean and create it

$ make reset

```

### Reset the progress, clean up everything

If at any stage you need to clean up the state, just run the following from the repository's root:

```

$ make reset

```

## Stage 1: single node keeper and its client

The default set of docker-compose services has two services: `zoo1` and `clickhouse`:

```

# Run clickhouse and zoo1 containers

$ docker compose up -d

[+] Running 3/3

 ✔ Network keeper-cluster_keeper-cluster  Created   0.1s

 ✔ Container keeper-cluster-zoo1-1        Started   0.5s

 ✔ Container keeper-cluster-clickhouse-1  Started   0.7s

```

At this stage, the `clickhouse-server` nodes are connected to `zoo1` and has a ReplicatedMergeTree table `default.test_repliacation`

```

$ docker compose exec clickhouse1 clickhouse-client -q 'SELECT * FROM test_repliacation'

1

2

$ docker compose exec clickhouse2 clickhouse-client -q 'SELECT * FROM test_repliacation'

1

2

```

And `zoo1` node has `clickhouse` "directory" in it's root:

```

$ docker compose exec zoo1 clickhouse-keeper-client -q 'ls "/"'

clickhouse keeper

```

## Stage 2: adding a second node to the keeper cluster:

> Attention!  

> One should never add more nodes than existing currently in the cluster. In this case, they will decide to make a new cluster.

Now, when everything works, let's add a second node:

```

$ docker compose --profile keeper-extend up -d

[+] Running 3/3

 ✔ Container keeper-cluster-zoo2-1        Started   0.4s

 ✔ Container keeper-cluster-zoo1-1        Running   0.0s

 ✔ Container keeper-cluster-clickhouse-1  Running   0.0s

```

`zoo2` is unknown to `zoo1`. It can't join the cluster and does not work:

```

$ docker compose exec zoo2 clickhouse-keeper-client -q 'ls "/"'

Coordination::Exception: All connection tries failed while connecting to ZooKeeper. nodes: [::1]:9181

Poco::Exception. Code: 1000, e.code() = 111, Connection refused (version 24.4.1.2088 (official build)), [::1]:9181

Poco::Exception. Code: 1000, e.code() = 111, Connection refused (version 24.4.1.2088 (official build)), [::1]:9181

Poco::Exception. Code: 1000, e.code() = 111, Connection refused (version 24.4.1.2088 (official build)), [::1]:9181

```

And we have the next lines in a log:

```

$ grep deny -C10 data/zoo1/clickhouse-keeper.log

........

2024.05.14 13:23:44.983985 [ 35 ] {}  RaftInstance: receive a incoming rpc connection

2024.05.14 13:23:44.984019 [ 35 ] {}  RaftInstance: session 1 got connection from ::ffff:172.24.0.4:48888 (as a server)

2024.05.14 13:23:44.984058 [ 35 ] {}  RaftInstance: asio rpc session created: 0x7eb9daab4018

2024.05.14 13:23:44.984109 [ 36 ] {}  RaftInstance: Receive a pre_vote_request message from 2 with LastLogIndex=0, LastLogTerm 0, EntriesLength=0, CommitIndex=0 and Term=0

2024.05.14 13:23:44.984136 [ 36 ] {}  RaftInstance: [PRE-VOTE REQ] my role leader, from peer 2, log term: req 0 / mine 3

last idx: req 0 / mine 67, term: req 0 / mine 3

HB alive

2024.05.14 13:23:44.984145 [ 36 ] {}  RaftInstance: pre-vote decision: XX (strong deny, non-existing node)

2024.05.14 13:23:44.984154 [ 36 ] {}  RaftInstance: Response back a pre_vote_response message to 2 with Accepted=0, Term=0, NextIndex=18446744073709551615

........

```

We need to add the new server to a known one. To make it possible, the config parameter `clickhouse.keeper_server.enable_reconfiguration=true` should be set, see [keeper_single.xml](configs/keeper_single.xml) and [keeper_cluster.xml](configs/keeper_cluster.xml).

Now, to add it, run the next command on the `zoo1` node:

```

$ docker compose exec zoo1 clickhouse-keeper-client -q 'reconfig ADD "server.2=zoo2:9234"'

server.2=zoo2:9234;participant;1

server.1=zoo1:9234;participant;1

```

The next lines will be in the `zoo1` log:

```

$ grep 'Add server' -C10 data/zoo1/clickhouse-keeper.log

2024.05.14 13:24:07.224522 [ 24 ] {}  RaftInstance: append at log_idx 74, timestamp 1715693047224499

2024.05.14 13:24:07.242767 [ 46 ] {}  RaftInstance: commit upto 74, current idx 73

2024.05.14 13:24:07.242838 [ 46 ] {}  RaftInstance: commit upto 74, current idx 74

2024.05.14 13:24:07.242890 [ 46 ] {}  RaftInstance: DONE: commit upto 74, current idx 74

2024.05.14 13:24:08.778083 [ 36 ] {}  RaftInstance: Receive a pre_vote_request message from 2 with LastLogIndex=0, LastLogTerm 0, EntriesLength=0, CommitIndex=0 and Term=0

2024.05.14 13:24:08.778122 [ 36 ] {}  RaftInstance: [PRE-VOTE REQ] my role leader, from peer 2, log term: req 0 / mine 3

last idx: req 0 / mine 74, term: req 0 / mine 3

HB alive

2024.05.14 13:24:08.778128 [ 36 ] {}  RaftInstance: pre-vote decision: XX (strong deny, non-existing node)

2024.05.14 13:24:08.778134 [ 36 ] {}  RaftInstance: Response back a pre_vote_response message to 2 with Accepted=0, Term=0, NextIndex=18446744073709551615

2024.05.14 13:24:09.240087 [ 24 ] {}  KeeperDispatcher: Processing config update (Add server 2): pushed

2024.05.14 13:24:09.240159 [ 48 ] {}  RaftInstance: Receive a add_server_request message from 0 with LastLogIndex=0, LastLogTerm 0, EntriesLength=1, CommitIndex=0 and Term=0

2024.05.14 13:24:09.240336 [ 48 ] {}  RaftInstance: sent join request to peer 2, zoo2:9234

2024.05.14 13:24:09.240347 [ 48 ] {}  RaftInstance: Response back a add_server_response message to 1 with Accepted=1, Term=3, NextIndex=75

2024.05.14 13:24:09.240356 [ 48 ] {}  KeeperDispatcher: Processing config update (Add server 2): accepted

2024.05.14 13:24:09.241586 [ 32 ] {}  RaftInstance: 0x7eb9d9676018 connected to zoo2:9234 (as a client)

2024.05.14 13:24:09.258657 [ 33 ] {}  RaftInstance: type: 13, err 0

2024.05.14 13:24:09.258703 [ 33 ] {}  RaftInstance: Receive an extended join_cluster_response message from peer 2 with Result=1, Term=0, NextIndex=1

2024.05.14 13:24:09.258719 [ 33 ] {}  RaftInstance: new server (2) confirms it will join, start syncing logs to it

2024.05.14 13:24:09.258732 [ 33 ] {}  RaftInstance: [SYNC LOG] peer 2 start idx 1, my log start idx 1

2024.05.14 13:24:09.258745 [ 33 ] {}  RaftInstance: [SYNC LOG] LogSync is done for server 2 with log gap 73 (74 - 1, limit 99999), now put the server into cluster

2024.05.14 13:24:09.266283 [ 46 ] {}  RaftInstance: commit upto 75, current idx 74

2024.05.14 13:24:09.266348 [ 46 ] {}  RaftInstance: commit upto 75, current idx 75

2024.05.14 13:24:09.266361 [ 46 ] {}  RaftInstance: config at index 75 is committed, prev config log idx 64

2024.05.14 13:24:09.266377 [ 46 ] {}  RaftInstance: new config log idx 75, prev log idx 64, cur config log idx 64, prev log idx 63

```

And `zoo2` works now:

```

$ docker compose exec zoo2 clickhouse-keeper-client -q 'ls "/"'

clickhouse keeper

```

## Stage 3: add a third node

Now, we need to do the same with the `zoo3` node

```

$ docker compose --profile keeper-cluster up -d

[+] Running 4/4

 ✔ Container keeper-cluster-zoo2-1        Running   0.0s

 ✔ Container keeper-cluster-zoo3-1        Started   0.4s

 ✔ Container keeper-cluster-zoo1-1        Running   0.0s

 ✔ Container keeper-cluster-clickhouse-1  Running   0.0s

$ docker compose exec zoo1 clickhouse-keeper-client -q 'reconfig ADD "server.3=zoo3:9234"'

server.3=zoo3:9234;participant;1

server.1=zoo1:9234;participant;1

server.2=zoo2:9234;participant;1

$ docker compose exec zoo3 clickhouse-keeper-client -q 'ls "/"'

clickhouse keeper

```

## Check the cluster works after restart

To restart the cluster, just run the next command:

```

$ docker compose --profile keeper-cluster restart

```

And let's check the status of keeper nodes:

```

$ for id in {1..3}; do echo zoo${id}; docker compose exec zoo${id} clickhouse-keeper-client -q 'stat' | grep Mode; done

zoo1

Mode: follower

zoo2

Mode: follower

zoo3

Mode: leader

```

## Remaining steps

In real world, there are few more things to do:

- Update the [zookeeper.xml](configs/zookeeper.xml) configuration by adding new nodes there. The `clickhouse-server` must restart to apply zookeeper configuration.

- Update the [keeper_single.xml](configs/keeper_single.xml) config by adding there two new nodes.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/clickhouse/keeper-extend-cluster

Awesome Lists containing this project

README