Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/IMSMWU/RClickhouse
A 'DBI' Interface to the Clickhouse Database Providing Basic 'dplyr' Support
clickhouse clickhouse-database dbi-interface dplyr dplyr-sql-backends r
- Host: GitHub
- URL: https://github.com/IMSMWU/RClickhouse
- Owner: IMSMWU
- License: gpl-2.0
- Created: 2017-03-10T23:08:47.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2024-01-19T12:07:08.000Z (12 months ago)
- Last Synced: 2024-11-04T01:50:08.882Z (2 months ago)
- Topics: clickhouse, clickhouse-database, dbi-interface, dplyr, dplyr-sql-backends, r
- Language: C++
- Homepage:
- Size: 36.5 MB
- Stars: 92
- Watchers: 10
- Forks: 26
- Open Issues: 25
Metadata Files:
- Readme: Readme.md
- License: LICENSE
Awesome Lists containing this project
- awesome-clickhouse - IMSMWU/RClickhouse - RClickhouse is an R package that provides a DBI interface to the Clickhouse database with basic dplyr support. (Language bindings / R)
- jimsghstars - IMSMWU/RClickhouse - A 'DBI' Interface to the Clickhouse Database Providing Basic 'dplyr' Support (C++)
README
# RClickhouse
![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg) ![GitHub release](https://img.shields.io/github/release/IMSMWU/RClickhouse.svg) [![Build Status](https://travis-ci.org/IMSMWU/RClickhouse.svg?branch=master)](https://travis-ci.org/IMSMWU/RClickhouse)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/RClickhouse)](https://cran.r-project.org/package=RClickhouse)
[![CRAN RStudio mirror downloads](http://cranlogs.r-pkg.org/badges/RClickhouse)](https://cran.r-project.org/package=RClickhouse)

## Overview
[ClickHouse ©](https://clickhouse.com/) is a high-performance relational column-store database that enables big data exploration and analytics, scaling to petabytes of data. This package provides methods for working with 'Yandex Clickhouse' databases via 'DBI' methods and 'dplyr'/'dbplyr' idioms.
This R package is a DBI interface for the Yandex Clickhouse database. It provides basic dplyr support by auto-generating SQL-commands using dbplyr and is based on the official [C++ Clickhouse Client](https://github.com/ClickHouse/clickhouse-cpp).
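To get a feel for the SQL generation mentioned above without a running server, the following sketch uses a simulated generic DBI connection from dbplyr. It is only an illustration: `dbplyr::simulate_dbi()` stands in for a real connection, and the SQL that RClickhouse itself produces may differ in detail from what this generic backend prints.

```R
library(dplyr)

# A lazy table backed by a simulated, generic DBI connection (no server needed).
# Assumption: this is NOT an RClickhouse connection, so function translations
# may differ from what the package generates.
cars <- dbplyr::tbl_lazy(mtcars, con = dbplyr::simulate_dbi())

# Build the pipeline; nothing is computed at this point
query <- cars %>%
  group_by(cyl) %>%
  summarise(smpg = sum(mpg, na.rm = TRUE))

# Render the SQL that would be sent to the database instead of executing it
sql <- dbplyr::sql_render(query)
print(sql)
```

With a live connection, ending a `tbl(con, "mtcars")` pipeline with `show_query()` prints the generated statement in the same way.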
To cite this library, please use the BibTeX entry provided in **inst/CITATION**.
## Installation
This package is available on CRAN, and thus installable by running:

```R
install.packages("RClickhouse")
```

You can also install the latest development version directly from GitHub using devtools:

```R
devtools::install_github("IMSMWU/RClickhouse")
```

## Usage
#### Create a DBI Connection:
> *Note:* {RClickhouse} communicates with Clickhouse over the native protocol, not the HTTP interface. You must therefore connect to the native interface port (9000 by default) rather than the HTTP interface port (8123).
``` r
con <- DBI::dbConnect(RClickhouse::clickhouse(), host="example-db.com")
```

#### Write data to the database:
``` r
DBI::dbWriteTable(con, "mtcars", mtcars)

DBI::dbListTables(con)
DBI::dbListFields(con, "mtcars")
```

#### Query a database using [dplyr](https://dplyr.tidyverse.org/):
``` r
library(dplyr)
tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  summarise(smpg=sum(mpg))

tbl(con, "mtcars") %>%
  filter(cyl == 8, vs == 0) %>%
  group_by(am) %>%
  summarise(mean(qsec))

# Close the connection
DBI::dbDisconnect(con)
```

#### Query a database using [SQL-style commands](https://www.codecademy.com/articles/sql-commands) with `DBI::dbGetQuery`:
``` r
DBI::dbGetQuery(con, "SELECT
    vs
   ,COUNT(*) AS 'number of cases'
   ,AVG(qsec) AS 'average qsec'
FROM mtcars
GROUP BY vs")

# Save the results of a query:
res <- DBI::dbGetQuery(con, "SELECT *
FROM mtcars
WHERE am = 1")

# Or read the whole table into R (only advisable for small tables; for larger datasets, keep the computation on the database server):
mtcars <- DBI::dbReadTable(con, "mtcars")

# Close the connection
DBI::dbDisconnect(con)
```

#### Query a database using [ClickHouse functions](https://clickhouse.yandex/docs/en/query_language/functions/)
``` r
# Get the names of all the available databases
DBI::dbGetQuery(con, "SHOW DATABASES")

# Get information about the variable names and types
DBI::dbGetQuery(con, "DESCRIBE TABLE mtcars")

# Compact CASE - WHEN - THEN conditionals
# (in mtcars, am = 1 is a manual transmission and vs = 1 is a straight engine)
DBI::dbGetQuery(con, "SELECT multiIf(am='1', 'manual', 'automatic') AS 'transmission'
,multiIf(vs='1', 'straight', 'V-shaped') AS 'engine'
FROM mtcars")

# Close the connection
DBI::dbDisconnect(con)
```

### Config File
You may use a config file to initialize the ```dbConnect``` parameters automatically. To do so, create a YAML file (default name ```RClickhouse.yaml```) in at least one directory, e.g. ```~/.R/configs/RClickhouse.yaml```, and pass a vector of the corresponding file paths to ```dbConnect``` as the ```config_paths``` parameter. The default lookup paths of ```config_paths``` are ```./RClickhouse.yaml, ~/.R/RClickhouse.yaml, /etc/RClickhouse.yaml```.
In ```RClickhouse.yaml```, you may specify a variable number of parameters (```host, port, db, user, password, compression```) to be initialized using the following format (example):
```YAML
host: example-db.com
port: 1111
```
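As a rough illustration of how a file like this combines with the package defaults and with explicitly passed arguments (the priority order is spelled out in this section), the sketch below resolves the parameters by hand in plain R. It is not the package's internal implementation: the temporary file location, the `analytics` database name, and the use of the `yaml` package to parse the file are assumptions made for the example.

```R
library(yaml)  # assumed to be installed; used here only to parse the file

# Write the example config to a temporary file (hypothetical location)
cfg_path <- tempfile(fileext = ".yaml")
writeLines(c("host: example-db.com", "port: 1111"), cfg_path)

# Defaults as documented in this README
defaults <- list(host = "localhost", port = 9000, db = "default",
                 user = "default", password = "", compression = "lz4")

# Values read from the config file override the defaults
from_file <- yaml::read_yaml(cfg_path)

# Explicitly supplied arguments override both (hypothetical database name)
explicit <- list(db = "analytics")

# Later lists win: defaults < config file < explicit arguments
resolved <- modifyList(modifyList(defaults, from_file), explicit)
str(resolved)
```

The corresponding call would look something like `DBI::dbConnect(RClickhouse::clickhouse(), db = "analytics", config_paths = cfg_path)`, with `host` and `port` taken from the file and the remaining parameters left at their defaults.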
The actual initialization of the parameters of ```dbConnect``` follows a hierarchical structure with varying priorities (1 to 3, where 1 is highest):
1. Input parameters specified when calling ```dbConnect```. If parameters are unspecified, fall back to (2).
2. Parameters specified in ```RClickhouse.yaml```, where the level of priority depends on the position of the path in the ```config_paths``` input vector (first position, highest priority etc.). If parameters are unspecified, fall back to (3).
3. Default parameters (```host="localhost", port = 9000, db = "default", user = "default", password = "", compression = "lz4"```).

## Acknowledgements
Big thanks to Kirill Müller, Maxwell Peterson, Artemkin Pavel and Hannes Mühleisen.