Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shlima/click_house
Modern Ruby database driver for ClickHouse
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/shlima/click_house
- Owner: shlima
- License: mit
- Created: 2019-11-08T15:11:37.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-18T10:12:56.000Z (6 months ago)
- Last Synced: 2024-09-19T16:18:05.999Z (3 months ago)
- Topics: clickhouse, gem, ruby
- Language: Ruby
- Homepage: https://clickhouse.yandex/docs/en/
- Size: 306 KB
- Stars: 180
- Watchers: 4
- Forks: 26
- Open Issues: 12
- Metadata Files:
  - Readme: README.md
  - Changelog: CHANGELOG.md
Awesome Lists containing this project
- awesome-clickhouse - shlima/click_house - A modern Ruby database driver for ClickHouse. (Language bindings / Ruby)
README
![](./doc/logo.svg?sanitize=true)
# ClickHouse Ruby driver
![CI](https://github.com/shlima/click_house/workflows/CI/badge.svg)
[![Code Climate](https://codeclimate.com/github/shlima/click_house/badges/gpa.svg)](https://codeclimate.com/github/shlima/click_house)
[![Gem Version](https://badge.fury.io/rb/click_house.svg)](https://badge.fury.io/rb/click_house)

```bash
gem install click_house
```

A modern Ruby database driver for ClickHouse. [ClickHouse](https://clickhouse.yandex)
is a high-performance column-oriented database management system developed by
[Yandex](https://yandex.com/company), which operates Russia's most popular search engine.

> This development was inspired by the currently [unmaintained alternative](https://github.com/archan937/clickhouse),
> but rewritten and well tested.

### Why use the HTTP interface and not the TCP interface?

Well, the developers of ClickHouse themselves [discourage](https://github.com/yandex/ClickHouse/issues/45#issuecomment-231194134) using the TCP interface:

> TCP transport is more specific, we don't want to expose details.
> Despite we have full compatibility of protocol of different versions of client and server, we want to keep the ability to "break" it for very old clients. And that protocol is not too clean to make a specification.

Yandex uses the HTTP interface for working from Java, Perl, Python, and Go, as well as shell scripts.
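For illustration only, here is roughly what the HTTP interface looks like without any driver at all (a sketch; it assumes a local ClickHouse server on the default HTTP port 8123 with no authentication):

```ruby
require 'net/http'

# ClickHouse's HTTP endpoint accepts plain SQL in the request body
# and returns the result as plain text.
response = Net::HTTP.post(URI('http://localhost:8123/'), 'SELECT 1')
response.body #=> "1\n"
```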
# TOC
* [Configuration](#configuration)
* [Usage](#usage)
* [Queries](#queries)
* [Insert](#insert)
* [Create a table](#create-a-table)
* [Alter table](#alter-table)
* [Type casting](#type-casting)
* [Using with a connection pool](#using-with-a-connection-pool)
* [Using with Rails](#using-with-rails)
* [Using with ActiveRecord](#using-with-activerecord)
* [Using with RSpec](#using-with-rspec)
* [Development](#development)

## Configuration
```ruby
ClickHouse.config do |config|
  config.logger = Logger.new(STDOUT)
  config.adapter = :net_http
  config.database = 'metrics'
  config.url = 'http://localhost:8123'
  config.timeout = 60
  config.open_timeout = 3
  config.ssl_verify = false
  # set to true to symbolize keys for SELECT and INSERT statements (type casting)
  config.symbolize_keys = false
  config.headers = {}

  # or provide connection options separately
  config.scheme = 'http'
  config.host = 'localhost'
  config.port = 'port'

  # if you use HTTP basic Auth
  config.username = 'user'
  config.password = 'password'

  # if you want to add settings to all queries
  config.global_params = { mutations_sync: 1 }

  # choose a ruby JSON parser (default one)
  config.json_parser = ClickHouse::Middleware::ParseJson
  # or Oj parser
  config.json_parser = ClickHouse::Middleware::ParseJsonOj

  # JSON.dump (default one)
  config.json_serializer = ClickHouse::Serializer::JsonSerializer
  # or Oj.dump
  config.json_serializer = ClickHouse::Serializer::JsonOjSerializer
end
```

Alternatively, you can assign configuration parameters via a hash:
```ruby
ClickHouse.config.assign(logger: Logger.new(STDOUT))
```

Now you are able to communicate with ClickHouse:
```ruby
ClickHouse.connection.ping #=> true
```
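As a quick smoke test, you can also run a query through the configured connection (a sketch; the version string below is illustrative):

```ruby
ClickHouse.connection.select_value('SELECT version()') #=> "24.3.2.23"
```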
You can easily build a new raw connection and override any configuration parameter
(such as the database name or connection address):

```ruby
@connection = ClickHouse::Connection.new(ClickHouse::Config.new(logger: Rails.logger))
@connection.ping
```

## Usage
```ruby
ClickHouse.connection.ping #=> true
ClickHouse.connection.replicas_status #=> true

ClickHouse.connection.databases #=> ["default", "system"]
ClickHouse.connection.create_database('metrics', if_not_exists: true, engine: nil, cluster: nil)
ClickHouse.connection.drop_database('metrics', if_exists: true, cluster: nil)

ClickHouse.connection.tables #=> ["visits"]
ClickHouse.connection.describe_table('visits') #=> [{"name"=>"id", "type"=>"FixedString(16)", "default_type"=>""}]
ClickHouse.connection.table_exists?('visits', temporary: nil) #=> true
ClickHouse.connection.drop_table('visits', if_exists: true, temporary: nil, cluster: nil)
ClickHouse.connection.create_table(*) # see section
ClickHouse.connection.truncate_table('name', if_exists: true, cluster: nil)
ClickHouse.connection.truncate_tables(['table_1', 'table_2'], if_exists: true, cluster: nil)
ClickHouse.connection.truncate_tables # will truncate all tables in database
ClickHouse.connection.rename_table('old_name', 'new_name', cluster: nil)
ClickHouse.connection.rename_table(%w[table_1 table_2], %w[new_1 new_2], cluster: nil)
ClickHouse.connection.alter_table('table', 'DROP COLUMN user_id', cluster: nil)
ClickHouse.connection.add_index('table', 'ix', 'has(b, a)', type: 'minmax', granularity: 2, cluster: nil)
ClickHouse.connection.drop_index('table', 'ix', cluster: nil)

ClickHouse.connection.select_all('SELECT * FROM visits')
ClickHouse.connection.select_one('SELECT * FROM visits LIMIT 1')
ClickHouse.connection.select_value('SELECT ip FROM visits LIMIT 1')
ClickHouse.connection.explain('SELECT * FROM visits CROSS JOIN visits')
```

## Queries
### Select All

Select All returns a type-casted result set:
```ruby
@result = ClickHouse.connection.select_all('SELECT * FROM visits')

# all enumerable methods are delegated like #each, #map, #select etc
# results of #to_a is TYPE CASTED
@result.to_a #=> [{"date"=>#<Date: 2000-01-01 ...>, "id"=>1}]

# raw results (WITHOUT type casting)
# much faster if selecting a large amount of data
@result.data #=> [{"date"=>"2000-01-01", "id"=>1}, {"date"=>"2000-01-02", "id"=>2}]

# you can access raw data
@result.meta #=> [{"name"=>"date", "type"=>"Date"}, {"name"=>"id", "type"=>"UInt32"}]
@result.statistics #=> {"elapsed"=>0.0002271, "rows_read"=>2, "bytes_read"=>12}
@result.summary #=> ClickHouse::Response::Summary
@result.headers #=> {"x-clickhouse-query-id"=>"9bf5f604-31fc-4eff-a4b5-277f2c71d199"}
@result.types #=> [Hash]
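
# Enumerable methods are delegated, so you can iterate the type-casted rows
# directly (illustrative values, matching the raw data above):
@result.map { |row| row["id"] }        #=> [1, 2]
@result.select { |row| row["id"] > 1 } #=> [{"date"=>#<Date: 2000-01-02 ...>, "id"=>2}]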
```

### Select Value
Select value returns exactly one type-casted value
```ruby
ClickHouse.connection.select_value('SELECT COUNT(*) from visits') #=> 0
ClickHouse.connection.select_value("SELECT toDate('2019-01-01')") #=> #
ClickHouse.connection.select_value("SELECT toDateOrZero(NULL)") #=> nil
```

### Select One
Returns a record hash with the column names as keys and column values as values.
```ruby
ClickHouse.connection.select_one('SELECT date, SUM(id) AS sum FROM visits GROUP BY date')
#=> {"date"=>#, "sum"=>1}
```

### Execute Raw SQL
By default, the gem provides parsers for the `JSON` and `CSV` response formats. Type conversion
is only available for `JSON`.

```ruby
# format not specified
response = ClickHouse.connection.execute <<~SQL
SELECT count(*) AS counter FROM rspec
SQLresponse.body #=> "2\n"
# JSON
response = ClickHouse.connection.execute <<~SQL
SELECT count(*) AS counter FROM rspec FORMAT JSON
SQLresponse.body #=> {"meta"=>[{"name"=>"counter", "type"=>"UInt64"}], "data"=>[{"counter"=>"2"}], "rows"=>1, "statistics"=>{"elapsed"=>0.0002412, "rows_read"=>2, "bytes_read"=>4}}
# CSV
response = ClickHouse.connection.execute <<~SQL
SELECT count(*) AS counter FROM rspec FORMAT CSV
SQLresponse.body #=> [["2"]]
# You may use any format supported by ClickHouse
response = ClickHouse.connection.execute <<~SQL
SELECT count(*) AS counter FROM rspec FORMAT RowBinary
SQLresponse.body #=> "\u0002\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
```

## Insert
When column names and values are transferred separately, data is sent to the server
using the `JSONCompactEachRow` format by default.

```ruby
ClickHouse.connection.insert('table', columns: %i[id name]) do |buffer|
  buffer << [1, 'Mercury']
  buffer << [2, 'Venus']
end

# or
ClickHouse.connection.insert('table', columns: %i[id name], values: [[1, 'Mercury'], [2, 'Venus']])
```

When rows are passed as an Array or a Hash, data is sent to the server
using the `JSONEachRow` format by default.

```ruby
ClickHouse.connection.insert('table', [{ name: 'Sun', id: 1 }, { name: 'Moon', id: 2 }])

# or
ClickHouse.connection.insert('table', { name: 'Sun', id: 1 })

# for ruby < 3.0 provide an extra argument
ClickHouse.connection.insert('table', { name: 'Sun', id: 1 }, {})

# or
ClickHouse.connection.insert('table') do |buffer|
  buffer << { name: 'Sun', id: 1 }
  buffer << { name: 'Moon', id: 2 }
end
```

Sometimes you need a format other than `JSONEachRow`. For example, if you want to send `BigDecimal`s,
you can use the `JSONStringsEachRow` format so the string representation of a `BigDecimal` will be parsed:

```ruby
ClickHouse.connection.insert('table', { name: 'Sun', id: '1' }, format: 'JSONStringsEachRow')
# or
ClickHouse.connection.insert_rows('table', { name: 'Sun', id: '1' }, format: 'JSONStringsEachRow')
# or
ClickHouse.connection.insert_compact('table', columns: %w[name id], values: %w[Sun 1], format: 'JSONCompactStringsEachRow')
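
# For example (a sketch): to send an actual BigDecimal, serialize it to its string
# form yourself; the `prices` table and its Decimal `amount` column are hypothetical
require 'bigdecimal'
ClickHouse.connection.insert('prices', { id: '1', amount: BigDecimal('9.99').to_s('F') }, format: 'JSONStringsEachRow')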
```

See the [type casting](#type-casting) section to insert the data in a proper way.
## Create a table
### Create table using DSL

```ruby
ClickHouse.connection.create_table('visits', if_not_exists: true, engine: 'MergeTree(date, (year, date), 8192)') do |t|
  t.FixedString :id, 16
  t.UInt16 :year, low_cardinality: true
  t.Date :date
  t.DateTime :time, 'UTC'
  t.Decimal :money, 5, 4
  t.String :event
  t.UInt32 :user_id
  t.IPv4 :ipv4
  t.IPv6 :ipv6
end
```

### Create nullable columns
```ruby
ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t.UInt16 :id, 16, nullable: true
end
```

### Set column options
```ruby
ClickHouse.connection.create_table('visits', engine: 'MergeTree(date, (year, date), 8192)') do |t|
  t.UInt16 :year
  t.Date :date
  t.UInt16 :id, 16, default: 0, ttl: 'date + INTERVAL 1 DAY'
end
```

### Define column with custom SQL
```ruby
ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t << "vendor Enum('microsoft' = 1, 'apple' = 2)"
  t << "tags Array(String)"
end
```

### Define nested structures
```ruby
ClickHouse.connection.create_table('visits', engine: 'TinyLog') do |t|
  t.UInt8 :id
  t.Nested :json do |n|
    n.UInt8 :cid
    n.Date :created_at
    n.Date :updated_at
  end
end
```

### Set table options
```ruby
ClickHouse.connection.create_table('visits',
  order: 'year',
  ttl: 'date + INTERVAL 1 DAY',
  sample: 'year',
  settings: 'index_granularity=8192',
  primary_key: 'year',
  engine: 'MergeTree') do |t|
  t.UInt16 :year
  t.Date :date
end
```

### Create table with raw SQL
```ruby
ClickHouse.connection.execute <<~SQL
CREATE TABLE visits(int Nullable(Int8), date Nullable(Date)) ENGINE TinyLog
SQL
```

## Alter table
### Alter table with DSL
```ruby
ClickHouse.connection.add_column('table', 'column_name', :UInt64, default: nil, if_not_exists: nil, after: nil, cluster: nil)
ClickHouse.connection.drop_column('table', 'column_name', if_exists: nil, cluster: nil)
ClickHouse.connection.clear_column('table', 'column_name', partition: 'partition_name', if_exists: nil, cluster: nil)
ClickHouse.connection.modify_column('table', 'column_name', type: :UInt64, default: nil, if_exists: false, cluster: nil)
```

### Alter table with SQL
```ruby
# By SQL in argument
ClickHouse.connection.alter_table('table', 'DROP COLUMN user_id', cluster: nil)

# By SQL in a block
ClickHouse.connection.alter_table('table', cluster: nil) do
  <<~SQL
    MOVE PART '20190301_14343_16206_438' TO VOLUME 'slow'
  SQL
end
```

## Type casting
By default, the gem provides all necessary type casting, but you may overwrite or define
your own logic. If you need to redefine all built-in types with your own implementation,
just clear the default type system:

```ruby
ClickHouse.types.clear
ClickHouse.types # => {}
ClickHouse.types.default #=> #
```

Type casting works automatically when fetching data; when inserting data, you must serialize the types yourself:
```sql
CREATE TABLE assets(visible Boolean, tags Array(Nullable(String))) ENGINE Memory
```

```ruby
# cache table schema in a class variable
@schema = ClickHouse.connection.table_schema('assets')

# Json each row
ClickHouse.connection.insert('assets', @schema.serialize({'visible' => true, 'tags' => ['ruby']}))

# Json compact
ClickHouse.connection.insert('assets', columns: %w[visible tags]) do |buffer|
  buffer << [
    @schema.serialize_column("visible", true),
    @schema.serialize_column("tags", ['ruby']),
  ]
end
```

## Using with a connection pool
```ruby
require 'connection_pool'

ClickHouse.connection = ConnectionPool.new(size: 2) do
  ClickHouse::Connection.new(ClickHouse::Config.new(url: 'http://replica.example.com'))
end

ClickHouse.connection.with do |conn|
  conn.tables
end
```

## Using with Rails
```yml
# config/click_house.yml

default: &default
  url: http://localhost:8123
  timeout: 60
  open_timeout: 3

development:
  database: ecliptic_development
  <<: *default

test:
  database: ecliptic_test
  <<: *default

production:
  <<: *default
  database: ecliptic_production
```

```ruby
# config/initializers/click_house.rb

ClickHouse.config do |config|
  config.logger = Rails.logger
  config.assign(Rails.application.config_for('click_house'))
end
```

```ruby
# lib/tasks/click_house.rake
namespace :click_house do
  task prepare: :environment do
    @environments = Rails.env.development? ? %w[development test] : [Rails.env]
  end

  task drop: :prepare do
    @environments.each do |env|
      config = ClickHouse.config.clone.assign(Rails.application.config_for('click_house', env: env))
      connection = ClickHouse::Connection.new(config)
      connection.drop_database(config.database, if_exists: true)
    end
  end

  task create: :prepare do
    @environments.each do |env|
      config = ClickHouse.config.clone.assign(Rails.application.config_for('click_house', env: env))
      connection = ClickHouse::Connection.new(config)
      connection.create_database(config.database, if_not_exists: true)
    end
  end
end
```

Prepare the ClickHouse database:
```bash
rake click_house:drop click_house:create
```

If you are using an SQL database in Rails, you can manage ClickHouse migrations
using the `ActiveRecord::Migration` mechanism:

```ruby
class CreateAdvertVisits < ActiveRecord::Migration[6.0]
  def up
    ClickHouse.connection.create_table('visits', engine: 'MergeTree(date, (account_id, advert_id), 512)') do |t|
      t.UInt16 :account_id
      t.UInt16 :user_id
      t.Date :date
    end
  end

  def down
    ClickHouse.connection.drop_table('visits')
  end
end
```

## Using with ActiveRecord
If you use `ActiveRecord`, you can use the ORM query builder by using fake models
(empty tables must be present in the SQL database, e.g. `create_table :visits`):

```ruby
class ClickHouseRecord < ActiveRecord::Base
  self.abstract_class = true

  class << self
    def agent
      ClickHouse.connection
    end

    def insert(*argv, &block)
      agent.insert(table_name, *argv, &block)
    end

    def select_one
      agent.select_one(current_scope.to_sql)
    end

    def select_value
      agent.select_value(current_scope.to_sql)
    end

    def select_all
      agent.select_all(current_scope.to_sql)
    end

    def explain
      agent.explain(current_scope.to_sql)
    end
  end
end
```

```ruby
# FAKE MODEL FOR ClickHouse
class Visit < ClickHouseRecord
  scope :with_os, -> { where.not(os_family_id: nil) }
end

Visit.with_os.select('COUNT(*) as counter').group(:ipv4).select_all
#=> [{ 'ipv4' => 1455869, 'counter' => 104 }]

Visit.with_os.select('COUNT(*)').select_value
#=> 20_345_678

Visit.where(user_id: 1).select_one
#=> { 'ipv4' => 1455869, 'user_id' => 1 }
```

## Using with RSpec
You can clear the data tables before each test with RSpec:
```ruby
RSpec.configure do |config|
  config.before(:each, truncate_click_house: true) do
    ClickHouse.connection.truncate_tables
  end
end
```

```ruby
RSpec.describe Api::MetricsController, truncate_click_house: true do
  it { }
  it { }
end
```

## Development
```bash
make dockerize
rspec
rubocop
```