https://github.com/harbby/spark-sql-kafka-0-8
spark structured-streaming kafka0.8 source
- Host: GitHub
- URL: https://github.com/harbby/spark-sql-kafka-0-8
- Owner: harbby
- License: apache-2.0
- Created: 2019-04-28T08:25:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-10-13T13:04:52.000Z (about 4 years ago)
- Last Synced: 2023-07-28T14:32:55.524Z (over 1 year ago)
- Language: Java
- Size: 78.1 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
## spark-sql-kafka-0-8
Spark Structured Streaming source for Kafka.
Supports Kafka 0.8.2.1+ and Kafka 0.9.
## License
Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

## Maven
```xml
<dependency>
    <groupId>com.github.harbby</groupId>
    <artifactId>spark-sql-kafka-0-8</artifactId>
    <version>1.0.0</version>
</dependency>
```
## Limitations
* Requires Spark 2.3+.
* Requires continuous processing: the query must be started with `writeStream.trigger(Trigger.Continuous(...))`.

### Usage
+ create
```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession: SparkSession = ...
val kafka: DataFrame = sparkSession.readStream
  .format("kafka08")
  .option("topics", "topic1,topic2")
  .option("bootstrap.servers", "broker1:9092,broker2:9092")
  .option("group.id", "test1")
  .option("auto.offset.reset", "largest") // largest or smallest
  .option("zookeeper.connect", "zk1:2181,zk2:2181")
  .option("auto.commit.enable", "true")
  .option("auto.commit.interval.ms", "5000")
  .load()
```
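The same reader options can also be collected into one map and passed to Spark's `DataStreamReader.options`. The sketch below is a hypothetical helper, not part of this library: the name `Kafka08SourceOptions`, its defaults, and its validation are assumptions that only mirror the example above (comma-separated topic, broker, and ZooKeeper lists, and `largest`/`smallest` for `auto.offset.reset`).

```scala
// Hypothetical helper (NOT part of spark-sql-kafka-0-8): builds the reader
// options from the example above as a single Map[String, String].
object Kafka08SourceOptions {
  // The README only lists these two values for auto.offset.reset.
  private val ValidOffsetResets = Set("largest", "smallest")

  def build(
      topics: Seq[String],
      brokers: Seq[String],
      groupId: String,
      zookeeper: Seq[String],
      offsetReset: String = "largest",
      autoCommit: Boolean = true,
      autoCommitIntervalMs: Long = 5000L
  ): Map[String, String] = {
    require(topics.nonEmpty, "at least one topic is required")
    require(brokers.nonEmpty, "at least one broker is required")
    require(
      ValidOffsetResets.contains(offsetReset),
      s"auto.offset.reset must be largest or smallest, got: $offsetReset"
    )
    // Lists are joined with commas, matching the option values shown above.
    Map(
      "topics" -> topics.mkString(","),
      "bootstrap.servers" -> brokers.mkString(","),
      "group.id" -> groupId,
      "auto.offset.reset" -> offsetReset,
      "zookeeper.connect" -> zookeeper.mkString(","),
      "auto.commit.enable" -> autoCommit.toString,
      "auto.commit.interval.ms" -> autoCommitIntervalMs.toString
    )
  }
}
```

The resulting map could then be used as `sparkSession.readStream.format("kafka08").options(opts).load()`, which keeps the connection settings in one validated place instead of a chain of `.option` calls.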
+ schema
```
kafka.printSchema()

root
 |-- _key: binary (nullable = true)
 |-- _message: binary (nullable = true)
 |-- _topic: string (nullable = false)
 |-- _partition: integer (nullable = false)
 |-- _offset: long (nullable = false)
```

+ sink
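```scala
import java.util.concurrent.TimeUnit
import org.apache.spark.sql.streaming.Trigger
import scala.concurrent.duration.Duration

dataFrame.writeStream
  .trigger(Trigger.Continuous(Duration(90, TimeUnit.SECONDS))) // required: this source only supports continuous processing
  ...
```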
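The `_key` and `_message` columns arrive as raw bytes, so rows usually need decoding before they reach a sink. A minimal sketch, assuming the payloads are UTF-8 text: the case class `Kafka08Record` is hypothetical and simply mirrors the schema printed above, with field names matching the column names so that Spark's `Dataset.as[...]` can map them by name.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical record type mirroring the source schema shown above.
final case class Kafka08Record(
    _key: Array[Byte],     // binary, nullable
    _message: Array[Byte], // binary, nullable
    _topic: String,
    _partition: Int,
    _offset: Long
) {
  // Assumes UTF-8 payloads; a null column becomes None.
  def keyString: Option[String] =
    Option(_key).map(bytes => new String(bytes, StandardCharsets.UTF_8))

  def messageString: Option[String] =
    Option(_message).map(bytes => new String(bytes, StandardCharsets.UTF_8))
}
```

With `import sparkSession.implicits._` in scope, `kafka.as[Kafka08Record].map(_.messageString.orNull)` would give a `Dataset[String]`. An untyped alternative is `kafka.selectExpr("CAST(_message AS STRING) AS message")`, which decodes the bytes as UTF-8 without a case class. Both are projections, which Spark's continuous processing mode lists among its supported operations.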