{"id":13481347,"url":"https://github.com/crobox/clickhouse-scala-client","last_synced_at":"2025-10-08T05:53:41.149Z","repository":{"id":19423532,"uuid":"86823191","full_name":"crobox/clickhouse-scala-client","owner":"crobox","description":"Clickhouse Scala Client with Reactive Streams support","archived":false,"fork":false,"pushed_at":"2025-09-26T08:55:04.000Z","size":1554,"stargazers_count":118,"open_issues_count":16,"forks_count":28,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-09-26T10:32:28.921Z","etag":null,"topics":["akka","clickhouse","reactive","reactive-streams","scala"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crobox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-03-31T13:40:12.000Z","updated_at":"2025-09-26T08:55:09.000Z","dependencies_parsed_at":"2024-05-10T15:53:31.664Z","dependency_job_id":"ea08ad13-3e92-4aac-80cf-44d4493cb6ee","html_url":"https://github.com/crobox/clickhouse-scala-client","commit_stats":null,"previous_names":[],"tags_count":118,"template":false,"template_full_name":null,"purl":"pkg:github/crobox/clickhouse-scala-client","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crobox%2Fclickhouse-scala-client","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crobox%2Fclickhouse-scala-client/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/crobox%2Fclickhouse-scala-client/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crobox%2Fclickhouse-scala-client/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crobox","download_url":"https://codeload.github.com/crobox/clickhouse-scala-client/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crobox%2Fclickhouse-scala-client/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278897285,"owners_count":26064780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["akka","clickhouse","reactive","reactive-streams","scala"],"created_at":"2024-07-31T17:00:51.081Z","updated_at":"2025-10-08T05:53:41.143Z","avatar_url":"https://github.com/crobox.png","language":"Scala","funding_links":[],"categories":["Database","Language bindings","Table of Contents"],"sub_categories":["Scala","Database"],"readme":"# Clickhouse Scala Client\n\n[![Build 
Status](https://github.com/crobox/clickhouse-scala-client/actions/workflows/ci.yml/badge.svg)](https://github.com/crobox/clickhouse-scala-client/actions/workflows/)\n[![Gitter](https://img.shields.io/gitter/room/clickhouse-scala-client/lobby.svg)](https://gitter.im/clickhouse-scala-client/lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n![Maven Central Version](https://img.shields.io/maven-central/v/com.crobox.clickhouse/client_2.13)\n\n\nClickhouse Scala Client that uses Pekko Http to create a reactive streams implementation to access the [Clickhouse](https://clickhouse.yandex) database in a reactive way.\n\nFeatures:\n* read/write query execution\n* pekko streaming source for result parsing\n* pekko streaming sink for data insertion\n* streaming query progress (experimental)\n* all the http interface settings \n* load balancing with internal health checks (multi host and cluster aware host balancer)\n* ability to retry queries\n\n*We do not guarantee api-backwards compatibility, although the API has been very stable over the last years.*  \n\nScala version: \n- 2.13\n- 2.12\n\nArtifacts:\nhttps://mvnrepository.com/artifact/com.crobox.clickhouse/client_2.12\nhttps://oss.sonatype.org/content/repositories/snapshots/com/crobox/clickhouse\n\nfor sbt you can use\n\n```\n// https://mvnrepository.com/artifact/com.crobox/clickhouse-scala-client_2.12 \nlibraryDependencies += \"com.crobox.clickhouse\" %% \"client\" % \"0.9.0\"\n```\n\n## Documentation\n- [Quick Setup](#quick-setup)\n    - [Client](#client)\n    - [Indexer (Pekko Streams Sink)](#indexer)\n- [Configuration](#configuration)\n    - [Client](#client-configuration)\n        - [Health checks](#health-checks)\n        - [Single host connection](#single-host-connection)\n        - [Multi host balancing connection](#multi-host-balancing-connection)\n        - [Cluster aware balancing connection](#cluster-aware-balancing-connection)\n    - [Indexer 
configuration](#indexer-configuration)\n    - [Query settings](#query-settings)\n- [Client API](#client-api)\n    - [Query execution](#query-execution)\n    - [Query progress](#query-progress)\n    - [Query settings](#query-settings)\n    - [Query retrying](#query-retrying)\n- [DSL](#dsl)\n- [Test Kit](#test-kit)\n\nWhen in doubt about the documentation please read the tests to find the truth. \n    \n## Quick Setup\n\n### Client\n\n```scala\n\nval config: Config\nval queryDatabase: String = \"default\"\nimplicit val system:ActorSystem\n\nval client = new ClickhouseClient(config, queryDatabase)\nclient.query(\"SELECT 1 + 1\").map(result =\u003e {\n    println(s\"Got query result $result\")\n})\n```\n\n### Indexer\n\n```scala\nval config: Config\nval client: ClickhouseClient\n\nval sink = ClickhouseSink.insertSink(config, client)\nsink.runWith(Source.single(Insert(\"clicks\", \"{some_column: 3 }\")))\n```\n\n## Configuration\n\n - Client: All the configuration keys are under the prefix `crobox.clickhouse.client`\n - Indexer: All the configuration keys are under the prefix `crobox.clickhouse.indexer`. You can also provide specific overrides based on the indexer name by using the same configs under the prefix `crobox.clickhouse.indexer.{indexer-name}`\n\n### Client configuration\n\nYou can find all the configuration options in the [reference file](https://github.com/crobox/clickhouse-scala-client/blob/master/client/src/main/resources/reference.conf), with explanatory comments about their usage.\n\n### Connection configuration\nThree different connection modes are supported.\n\n* single-host\n* balancing-hosts\n* cluster-aware\n\n#### Health checks\n\nThe `balancing-hosts` and `cluster-aware` connections are setting up health checks for each host, by running a simple http request on clickhouse host as specified in the clickhouse [docs](https://clickhouse.yandex/docs/en/interfaces/http_interface/). 
For the health checks we use separate `Cached Host Connection Pools` with a maximum of one connection, to ensure we never run more than one health check at the same time for the same host. When a host fails its health checks, we no longer use it to run queries. If all the health checks are failing, the queries will fail fast.\n\n```\ncrobox.clickhouse.client.connection {\n      health-check {\n        interval = 5 seconds # minimum interval between two health checks\n        timeout = 1 second # health check will fail if it exceeds the timeout\n      }\n}\n```\n\n#### Single host connection\n\n```\ncrobox.clickhouse.client {\n    connection: {\n        type = \"single-host\",\n        host = \"localhost\",\n        port = 8123\n    }\n}\n\n```\nThis will not set up a health check and will dispatch all queries to the configured host.\n\n\n\n#### Multi host balancing connection\n\nQueries are distributed round-robin across the configured hosts.\n\n```\ncrobox.clickhouse.client {\n    connection: {\n        type = \"balancing-hosts\"\n        hosts: [\n          {\n            host = \"localhost\",\n            port = 7415\n          }\n        ]\n        \n    }\n}\n\n```\n\n#### Cluster aware balancing connection\n\nThe host and the port will be used to continually update the list of clickhouse nodes by querying and using the `host-name` from the `system.clusters` clickhouse table. 
(check `scanning-interval`)\nYou can name a specific clickhouse cluster so that queries run only on that cluster.\nNote that this connection type defaults to port 8123 for all nodes.\n\n```\ncrobox.clickhouse.client {\n    connection: {\n        type = \"cluster-aware\"\n        host = \"localhost\"\n        port = 8123\n        cluster = \"cluster\" # use only hosts which belong to the \"cluster\" cluster\n        health-check {\n              interval = 5 seconds\n              timeout = 1 second\n        }\n        scanning-interval = 10 seconds # min interval between running a new query to update the list of hosts from the system.clusters table \n    }\n}\n\n```\n\n\n### Indexer configuration\n\nInserting into clickhouse is done using a pekko stream. All the settings are applied on a per-table basis.\nOne insert is performed when the maximum number of items (`batch-size`) is reached or the maximum time (`flush-interval`) has elapsed. Depending on the number of `concurrent-requests`, multiple inserts can run in parallel for the same table.\n\n```\ncrobox.clickhouse {\n  indexer {\n    batch-size = 10000\n    concurrent-requests = 1\n    flush-interval = 5 seconds\n    fast-indexer {\n        flush-interval = 1 second\n        batch-size = 1000\n    }\n  }\n}\n```\n\n### Query settings\n\nTo set authentication or a settings profile for the client, update the following configs.\nYou can also set custom settings as presented in the [clickhouse documentation](https://clickhouse.yandex/docs/en/operations/settings/settings/)\n\n```\ncrobox.clickhouse.client{\n    settings {\n      authentication {\n        user = \"default\"\n        password = \"\"\n      }\n      profile = \"default\"\n      http-compression = false\n//      https://clickhouse.yandex/docs/en/operations/settings/settings/\n      custom {\n           distributed_product_mode = \"local\"\n      }\n    }\n}\n```\n\n## Client API\n\n### Query execution\n\nRead only 
queries\n\n```scala\nval client: ClickhouseClient\nclient.query(\"SELECT 1\").map(result =\u003e println(result))\n```\n\nWrite queries\n\n```scala\nval client: ClickhouseClient\nclient.execute(\"ALTER TABLE my_table DELETE WHERE id = 'deleted'\").map(result =\u003e println(result))\n```\n\nStreaming delimited result (by new line)\n\n```scala\nval client: ClickhouseClient\nclient.source(\"SELECT * FROM my_table\").runWith(Sink.foreach(line =\u003e println(line)))\n```\n\nStreaming raw result (ByteString)\n\n```scala\nval client: ClickhouseClient\nclient.sourceByteString(\"SELECT * FROM my_table\").runWith(Sink.foreach(byteString =\u003e println(byteString)))\n```\n\nSink streaming body\n\n```scala\nval client: ClickhouseClient\nclient.sink(\"INSERT INTO my_table\", Source.single(ByteString(\"el1\"))).map(result =\u003e println(result))\n```\n\n### Query progress\n\n@Experimental - might not be complete\n\nWe only expose progress when running read only queries. The current implementation is recommended to be used only for long running queries which return a result relatively small in size (fits easily in memory).\nThe returned source is materialized with the query result.\n\nWhen running queries with progress we set a custom client transport for the super pool used by client to run the queries. 
Due to a limitation in the pekko implementation, which does not allow the headers to be streamed, we parse the raw http output and intercept the http headers to receive the progress.\n\nWe expose multiple events for the progress:\n * QueryAccepted - clickhouse returned the http response with code 200 (query might still fail)\n * QueryRejected - clickhouse returned the http response with a code other than 200 (it has not started execution)\n * QueryFailed - clickhouse returned an exception in the body, after the query was accepted and had started execution\n * Progress - contains the number of rows read and the total number of rows \n * QueryRetry - the same query is being retried by the client\n\n```scala\nval client: ClickhouseClient\nclient.queryWithProgress(\"SELECT uniq(timestamps), uniq(mosquito_name) FROM mosquito_bites\")\n      .toMat(Sink.foreach(progress =\u003e println(progress)))(Keep.left)\n      .run()\n      .map(result =\u003e println(result))\n```\n\n### Query settings\n\nEvery call to the client accepts an implicit `QuerySettings` object which can override settings for that specific query.\n\n - You can set the query id so that you can track/kill/replace running queries.\n - You can mark the query as idempotent, and it will be retried for all exceptions when running the `ClickhouseSink` (Indexer) or when running queries using `client.query/client.execute`.\n - You can set specific clickhouse query settings to override the default ones.\n - You can use a different clickhouse profile.\n - You can run the query as a different user.\n\n```scala\nval client: ClickhouseClient\nimplicit val settings = QuerySettings(queryId = Some(\"expensive_query\"), settings = Map(\"replace_running_query\" -\u003e \"1\"))\nclient.query(\"SELECT uniq(expensive) FROM huge_table\") // start query\nclient.query(\"SELECT uniq(expensive) FROM huge_table\") // replaces existing query\n```\n\n### Query retrying\n\nQuery retrying takes advantage of host balancing and will request 
another host for each retry.\n\nQueries that use the client api methods `source` and `sink` are not retried.\n\nAll read-only queries are considered idempotent and are retried up to a configurable maximum number of times (3 by default, so 4 executions in total: the initial execution plus 3 retries).\n```\ncrobox.clickhouse.retries = 3\n```\n\nBy using the `ClickhouseSink` you can also retry inserts by setting the `idempotent` setting to true on the query settings.\n\n# DSL\n\nTyped/composable DSL that is interpreted and parsed into queries, with full, seamless integration into the driver.\n\nFor more information see [the wiki](https://github.com/crobox/clickhouse-scala-client/wiki)\n\n# Test Kit\n\nWe also expose a utility test kit which provides a helpful spec with testing utilities. It automatically creates a single-use database before all tests and drops it afterwards.\n\n```\n// https://mvnrepository.com/artifact/com.crobox/clickhouse-scala-client_2.12 \nlibraryDependencies += \"com.crobox.clickhouse\" %% \"testkit\" % \u003clatest_version\u003e\n```\n\nCheck [the spec](https://github.com/crobox/clickhouse-scala-client/blob/master/testkit/src/main/scala/com/crobox/clickhouse/testkit/ClickhouseSpec.scala) for more details.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrobox%2Fclickhouse-scala-client","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrobox%2Fclickhouse-scala-client","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrobox%2Fclickhouse-scala-client/lists"}