# chronicler-spark

[![Scala CI](https://github.com/fsanaulla/spark-http-rdd/actions/workflows/scala.yml/badge.svg)](https://github.com/fsanaulla/chronicler-spark/actions/workflows/scala.yml)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.fsanaulla/chronicler-spark-core_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.fsanaulla/chronicler-spark-core_2.12)
[![Scala Steward badge](https://img.shields.io/badge/Scala_Steward-helping-blue.svg?style=flat&logo=)](https://scala-steward.org)

Open-source [InfluxDB](https://www.influxdata.com/) connector for [Apache Spark](https://spark.apache.org/index.html) on top of [Chronicler](https://github.com/fsanaulla/chronicler).

## Get Started

First, add the required module to your `build.sbt`:

```
// For RDD
libraryDependencies += "com.github.fsanaulla" %% "chronicler-spark-rdd" % "<version>"

// For Dataset
libraryDependencies += "com.github.fsanaulla" %% "chronicler-spark-ds" % "<version>"

// For Structured Streaming
libraryDependencies += "com.github.fsanaulla" %% "chronicler-spark-structured-streaming" % "<version>"

// For DStream
libraryDependencies += "com.github.fsanaulla" %% "chronicler-spark-streaming" % "<version>"
```

## Usage

Default configuration:
```
final case class InfluxConfig(
    host: String,
    port: Int = 8086,
    credentials: Option[InfluxCredentials] = None,
    compress: Boolean = false,
    ssl: Boolean = false
)
```
Compression is off by default; setting `compress = true` is recommended to reduce network traffic.
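
As a minimal sketch of overriding the defaults, the snippet below builds a configuration with compression turned on. To stay self-contained it declares local stand-ins that mirror the case class shown above; in a real project you would import the library's own `InfluxConfig` instead. The fields of `InfluxCredentials` and the `ConfigExample` object are assumptions made for illustration, not taken from this README.

```scala
// Local stand-in: the field names here are hypothetical, for illustration only.
final case class InfluxCredentials(username: String, password: String)

// Local stand-in mirroring the configuration shape shown above.
final case class InfluxConfig(
    host: String,
    port: Int = 8086,
    credentials: Option[InfluxCredentials] = None,
    compress: Boolean = false,
    ssl: Boolean = false
)

object ConfigExample {
  // Only `host` is mandatory; every other field falls back to its default.
  val base: InfluxConfig = InfluxConfig("localhost")

  // Same settings, but with compression enabled, as recommended above.
  val compressed: InfluxConfig = base.copy(compress = true)

  def main(args: Array[String]): Unit = {
    println(base)
    println(compressed)
  }
}
```

Using `copy` keeps a single base configuration and derives variants from it, so the host and port are stated only once.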

For `RDD[T]`:

```
import org.apache.spark.rdd.RDD
import com.github.fsanaulla.chronicler.spark.rdd._

val rdd: RDD[T] = ??? // your RDD
rdd.saveToInfluxDBMeas("dbName", "measurementName")

// to save with a dynamically generated measurement
rdd.saveToInfluxDB("dbName")
```
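
To illustrate the difference between a fixed and a dynamically generated measurement, here is a rough sketch in terms of InfluxDB line protocol (`measurement,tag_set field_set`). It assumes that with a fixed measurement the name is supplied once by the caller, while in the dynamic case each record's serialized form carries its own measurement name. That reading is an assumption, not something this README confirms, and `Reading` and `LineProtocolSketch` are hypothetical names that are not part of the library.

```scala
// Hypothetical record type, used only for illustration.
final case class Reading(sensor: String, city: String, temperature: Double)

object LineProtocolSketch {
  // Tag set and field set only; no measurement name yet.
  // Note: real tag values containing spaces or commas would need escaping.
  def body(r: Reading): String =
    s"city=${r.city} temperature=${r.temperature}"

  // Fixed measurement: the caller provides one name for every record.
  def withFixedMeasurement(measurement: String, r: Reading): String =
    s"$measurement,${body(r)}"

  // Dynamic measurement (assumed behaviour): the name is derived per record,
  // here from the hypothetical `sensor` field.
  def withDynamicMeasurement(r: Reading): String =
    s"${r.sensor},${body(r)}"
}
```

With this sketch, records from different sensors would land in different measurements in the dynamic case, whereas the fixed variant writes them all to the same one.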
For `Dataset[T]`:
```
import org.apache.spark.sql.Dataset
import com.github.fsanaulla.chronicler.spark.ds._

val ds: Dataset[T] = ??? // your Dataset
ds.saveToInfluxDBMeas("dbName", "measurementName")

// to save with a dynamically generated measurement
ds.saveToInfluxDB("dbName")
```
For `DataStreamWriter[T]`:
```
import org.apache.spark.sql.streaming.DataStreamWriter
import com.github.fsanaulla.chronicler.spark.structured.streaming._

val structStream: DataStreamWriter[T] = ??? // your stream writer
val saved = structStream.saveToInfluxDBMeas("dbName", "measurementName")

// or, to save with a dynamically generated measurement:
// val saved = structStream.saveToInfluxDB("dbName")

saved.start().awaitTermination()
```

For `DStream[T]`:
```
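import org.apache.spark.streaming.dstream.DStream
import com.github.fsanaulla.chronicler.spark.streaming._

val stream: DStream[T] = ??? // your DStream
stream.saveToInfluxDBMeas("dbName", "measurementName")

// to save with a dynamically generated measurement
stream.saveToInfluxDB("dbName")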
```