https://github.com/clickhouse/copier

clickhouse-copier (obsolete)

> [!NOTE]
> This tool is no longer supported, but you can use the latest available version as is.

# clickhouse-copier

Copies data from the tables in one cluster to tables in another (or the same) cluster.

To get a consistent copy, the data in the source tables and partitions should not change during the entire process.

You can run multiple `clickhouse-copier` instances on different servers to perform the same job. ClickHouse Keeper, or ZooKeeper, is used for syncing the processes.

After starting, `clickhouse-copier`:

- Connects to ClickHouse Keeper and receives:
    - Copying jobs.
    - The state of the copying jobs.
- Performs the jobs.

Each running process chooses the “closest” shard of the source cluster and copies the data into the destination cluster, resharding the data if necessary.

`clickhouse-copier` tracks the changes in ClickHouse Keeper and applies them on the fly.

To reduce network traffic, we recommend running `clickhouse-copier` on the same server where the source data is located.

## Download and Install

Download the binaries from the [final release](releases/tag/final).

## Running clickhouse-copier

The utility should be run manually:

``` bash
$ clickhouse-copier --daemon --config keeper.xml --task-path /task/path --base-dir /path/to/dir
```

Parameters:

- `daemon`: Starts `clickhouse-copier` in daemon mode.
- `config`: The path to the `keeper.xml` file with the parameters for connecting to ClickHouse Keeper.
- `task-path`: The path to the ClickHouse Keeper node. This node is used for syncing `clickhouse-copier` processes and storing tasks. Tasks are stored in `$task-path/description`.
- `task-file`: An optional path to a file with the task configuration, used for the initial upload to ClickHouse Keeper.
- `task-upload-force`: Forces the upload of `task-file` even if the node already exists. The default is false.
- `base-dir`: The path to logs and auxiliary files. When it starts, `clickhouse-copier` creates `clickhouse-copier_YYYYMMHHSS_<PID>` subdirectories in `$base-dir`. If this parameter is omitted, the directories are created in the directory where `clickhouse-copier` was launched.
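
As a rough illustration of how these parameters combine when several instances work on the same job, the sketch below only prints the commands it would run. The helper name `copier_cmd`, the host names `ch-src-1` to `ch-src-3`, and the file name `task.xml` are placeholders, not part of the tool. It assumes that instances cooperate by pointing at the same `--task-path`, and that only one of them needs to pass `--task-file` for the initial upload.

```bash
# Sketch only: prints command lines for review instead of executing them.
# copier_cmd CONFIG TASK_PATH BASE_DIR [TASK_FILE]
copier_cmd() {
    config=$1
    task_path=$2
    base_dir=$3
    task_file=${4:-}
    set -- clickhouse-copier --daemon --config "$config" \
        --task-path "$task_path" --base-dir "$base_dir"
    # Pass --task-file only when the task description should be uploaded first.
    if [ -n "$task_file" ]; then
        set -- "$@" --task-file "$task_file"
    fi
    echo "$*"
}

# Hypothetical hosts: the first instance uploads the task description,
# the others join the same job through the same --task-path.
echo "ssh ch-src-1 '$(copier_cmd keeper.xml /task/path /path/to/dir task.xml)'"
for host in ch-src-2 ch-src-3; do
    echo "ssh $host '$(copier_cmd keeper.xml /task/path /path/to/dir)'"
done
```

Printing the commands first makes it easy to confirm that every instance shares one task path before anything is started.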

## Format of keeper.xml

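``` xml
<clickhouse>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <count>3</count>
    </logger>

    <zookeeper>
        <node index="1">
            <host>127.0.0.1</host>
            <port>2181</port>
        </node>
    </zookeeper>
</clickhouse>
```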

## Configuration of Copying Tasks

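``` xml
<clickhouse>
    <!-- Configuration of clusters, as in an ordinary server config -->
    <remote_servers>
        <source_cluster>
            <shard>
                <internal_replication>false</internal_replication>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
            ...
        </source_cluster>

        <destination_cluster>
        ...
        </destination_cluster>
    </remote_servers>

    <!-- Maximum number of simultaneously active workers -->
    <max_workers>2</max_workers>

    <!-- Settings used to fetch (pull) data from the source cluster tables -->
    <settings_pull>
        <readonly>1</readonly>
    </settings_pull>

    <!-- Settings used to insert (push) data into the destination cluster tables -->
    <settings_push>
        <readonly>0</readonly>
    </settings_push>

    <!-- Common settings for pull and push operations -->
    <settings>
        <connect_timeout>3</connect_timeout>
        <insert_distributed_sync>1</insert_distributed_sync>
    </settings>

    <!-- Copying tasks; several table tasks can be listed here -->
    <tables>
        <table_hits>
            <!-- Source cluster name (from remote_servers) and the table to copy -->
            <cluster_pull>source_cluster</cluster_pull>
            <database_pull>test</database_pull>
            <table_pull>hits</table_pull>

            <!-- Destination cluster name and the table to insert into -->
            <cluster_push>destination_cluster</cluster_push>
            <database_push>test</database_push>
            <table_push>hits2</table_push>

            <!-- Engine of the destination tables -->
            <engine>
            ENGINE=ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/hits2', '{replica}')
            PARTITION BY toMonday(date)
            ORDER BY (CounterID, EventDate)
            </engine>

            <!-- Sharding key used to insert data into the destination cluster -->
            <sharding_key>jumpConsistentHash(intHash64(UserID), 2)</sharding_key>

            <!-- Optional filter applied while pulling data from the source servers -->
            <where_condition>CounterID != 0</where_condition>

            <!-- Partitions to copy; other partitions are ignored -->
            <enabled_partitions>
                <partition>'2018-02-26'</partition>
                <partition>'2018-03-05'</partition>
                ...
            </enabled_partitions>
        </table_hits>

        <table_visits>
        ...
        </table_visits>
        ...
    </tables>
</clickhouse>
```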

`clickhouse-copier` tracks the changes in `/task/path/description` and applies them on the fly. For instance, if you change the value of `max_workers`, the number of processes running tasks will also change.
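
A minimal sketch of preparing such a change, assuming the task description keeps the worker limit in a `<max_workers>` element on a single line. The helper name `set_max_workers` is hypothetical. Writing the edited description back to the `description` node depends on the Keeper client you use and is not shown here.

```bash
# Sketch only: rewrites the worker limit in a task description read from stdin.
# set_max_workers NEW_COUNT < task.xml > task.updated.xml
set_max_workers() {
    # Replace the numeric value inside <max_workers>...</max_workers>;
    # every other line passes through unchanged.
    sed "s|<max_workers>[0-9]*</max_workers>|<max_workers>$1</max_workers>|"
}

# Example (file names are placeholders):
# set_max_workers 4 < task.xml > task.updated.xml
```

Keeping the edit as a filter over a local copy lets you diff the old and new descriptions before changing the live node.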

## Build from sources

You don't have to. Download the binaries from the [final release](releases/tag/final).

If you want to build it anyway, check out the repository snapshot https://github.com/ClickHouse/ClickHouse/tree/1179a70c21eeca88410a012a73a49180cc5e5e2e and follow the normal ClickHouse build procedure. The resulting `clickhouse` binary contains the copier tool.
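
The steps could look roughly like the sketch below. It assumes `git`, CMake, and the compiler toolchain required by the ClickHouse build instructions for that snapshot are already installed. The function name `build_copier_snapshot` and the `build` directory name are arbitrary choices, and the plain CMake invocation may need extra options on your system.

```bash
# Sketch only: defines the steps as a function; nothing runs until it is called.
# The body is a subshell, so the caller's working directory is not changed.
build_copier_snapshot() (
    set -e
    snapshot=1179a70c21eeca88410a012a73a49180cc5e5e2e
    git clone https://github.com/ClickHouse/ClickHouse.git
    cd ClickHouse
    # Pin the tree to the snapshot that still contains the copier tool.
    git checkout "$snapshot"
    git submodule update --init --recursive
    cmake -S . -B build
    cmake --build build
)

# To start the (long) build:
# build_copier_snapshot
```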