Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/DmitryBe/spark-clickhouse
spark to yandex clickhouse connector
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/DmitryBe/spark-clickhouse
- Owner: DmitryBe
- License: other
- Created: 2017-01-20T02:42:06.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2019-09-04T07:57:44.000Z (over 5 years ago)
- Last Synced: 2024-08-03T18:20:22.469Z (6 months ago)
- Language: Scala
- Homepage:
- Size: 1020 KB
- Stars: 69
- Watchers: 7
- Forks: 41
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-clickhouse - DmitryBe/spark-clickhouse - This project is a connector for integrating Apache Spark with Yandex ClickHouse. (Language bindings / JavaScript/Typescript)
README
clickhouse spark connector
==========================

> connector #spark DataFrame -> Yandex #ClickHouse table
Example
```scala
import io.clickhouse.ext.ClickhouseConnectionFactory
import io.clickhouse.ext.spark.ClickhouseSparkExt._
import org.apache.spark.sql.SparkSession

// spark config
val sparkSession = SparkSession.builder
  .master("local")
  .appName("local spark")
  .getOrCreate()

val sc = sparkSession.sparkContext
val sqlContext = sparkSession.sqlContext

// create test DF
case class Row1(name: String, v: Int, v2: Int)
val df = sqlContext.createDataFrame(1 to 1000 map(i => Row1(s"$i", i, i + 10)) )

// clickhouse params
// any node
val anyHost = "localhost"
val db = "tmp1"
val tableName = "t1"
// cluster configuration must be defined in config.xml (clickhouse config)
val clusterName = Some("perftest_1shards_1replicas"): Option[String]

// define clickhouse datasource
implicit val clickhouseDataSource = ClickhouseConnectionFactory.get(anyHost)

// create db / table
//df.dropClickhouseDb(db, clusterName)
df.createClickhouseDb(db, clusterName)
df.createClickhouseTable(db, tableName, "mock_date", Seq("name"), clusterNameO = clusterName)

// save DF to clickhouse table
val res = df.saveToClickhouse("tmp1", "t1", (row) => java.sql.Date.valueOf("2000-12-01"), "mock_date", clusterNameO = clusterName)
assert(res.size == 1)
assert(res.get("localhost") == Some(df.count()))
```
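The example's comment says the cluster must be defined in the ClickHouse server config. As a rough sketch of what such a definition might look like for the cluster name used above, assuming a single local node listening on the default native port 9000 (the host and port here are assumptions, not taken from this project):

```xml
<!-- Hypothetical fragment of the ClickHouse server config.xml.
     The cluster name matches the clusterName value in the example;
     host and port are assumed values for a single local node. -->
<yandex>
    <remote_servers>
        <perftest_1shards_1replicas>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_1shards_1replicas>
    </remote_servers>
</yandex>
```

Older ClickHouse releases use `<yandex>` as the root element, while newer ones use `<clickhouse>`; which one applies depends on the server version in use.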
Docker image
[Docker](https://hub.docker.com/r/dmitryb/clickhouse-spark-connector/)