https://github.com/epishova/structured-streaming-cassandra-sink
An example of how to create and use Cassandra sink in Spark Structured Streaming application
- Host: GitHub
- URL: https://github.com/epishova/structured-streaming-cassandra-sink
- Owner: epishova
- Created: 2018-07-16T19:00:20.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-06T18:37:06.000Z (over 7 years ago)
- Last Synced: 2025-06-25T13:07:33.386Z (12 months ago)
- Topics: cassandra, scala, sink, spark, structured-streaming
- Language: Scala
- Size: 52.7 KB
- Stars: 8
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Structured-Streaming-Cassandra-Sink
### An example of how to create and use a Cassandra sink in a Spark Structured Streaming application
This code was developed as part of the Insight Data Engineering [project](https://github.com/epishova/FXTrue-Structured-Streaming-Insight-Project). It is a simple example of how to create and use a Cassandra sink in Spark Structured Streaming. I hope it will be useful for those who have just begun to work with the Structured Streaming API. I am new to it too, so comments and suggestions on how to improve the application are very welcome.
The idea of this application is simple: it reads messages from Kafka, parses them, and saves them into Cassandra. This example was run on an AWS cluster, so to test it, replace the addresses of my AWS instances with your own (everything that looks like `ec2-xx-xxx-xx-xx.compute-1.amazonaws.com`).
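The parse-and-save step described above can be sketched in plain Scala, independent of Spark, to show what happens to each message. The message format (`pair,timestamp,bid,ask`), the `FxQuote` record, the `KafkaMessageToCql` object, and the `fx.quotes` table are all hypothetical; the real project's schema may differ. In a Structured Streaming job, logic like this would typically run per row inside a sink such as a `ForeachWriter`'s `process` method, which is one common way to write to Cassandra, not necessarily exactly how this repo does it.

```scala
// Hypothetical record type: the real project's schema may differ.
final case class FxQuote(pair: String, timestamp: Long, bid: Double, ask: Double)

object KafkaMessageToCql {
  // Parse one Kafka message value of the ASSUMED form "pair,timestamp,bid,ask".
  // Returns None for malformed input instead of throwing, so one bad
  // message does not fail the whole batch.
  def parse(value: String): Option[FxQuote] =
    value.split(",").map(_.trim) match {
      case Array(pair, ts, bid, ask) =>
        try Some(FxQuote(pair, ts.toLong, bid.toDouble, ask.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  // Build the CQL INSERT a foreach-style sink could execute for one record.
  // Keyspace, table, and column names are placeholders.
  def toInsert(keyspace: String, table: String, q: FxQuote): String = {
    val pair = q.pair.replace("'", "''") // escape single quotes inside a CQL string literal
    s"INSERT INTO $keyspace.$table (pair, ts, bid, ask) VALUES ('$pair', ${q.timestamp}, ${q.bid}, ${q.ask})"
  }

  def main(args: Array[String]): Unit = {
    // The second message is malformed and is silently dropped by parse.
    val messages = Seq("EUR/USD,1531767620000,1.1701,1.1703", "garbage")
    messages.flatMap(parse).map(toInsert("fx", "quotes", _)).foreach(println)
  }
}
```

Interpolating values into a CQL string keeps the sketch dependency-free; a production sink would more likely use prepared statements with bound parameters, for example through the session that the Spark Cassandra Connector's `CassandraConnector` provides.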
This repo contains a `pom.xml`, so it can be built with Maven using `mvn package`. After that, you can run the application with
`./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1,datastax:spark-cassandra-connector:2.3.0-s_2.11 --class com.insight.app.CassandraSink.KafkaToCassandra --master spark://ec2-18-232-26-53.compute-1.amazonaws.com:7077 target/cassandra-sink-0.0.1-SNAPSHOT.jar`.
A detailed description is available in `blog_draft.md` or in [this DZone article](https://dzone.com/articles/cassandra-sink-for-spark-structured-streaming).