https://github.com/johngodoi/scalasparkkafka
This code just loads data into Kafka through Apache Spark and reads it back.
- Host: GitHub
- URL: https://github.com/johngodoi/scalasparkkafka
- Owner: johngodoi
- Created: 2021-04-01T01:06:31.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-04-11T13:50:16.000Z (about 4 years ago)
- Last Synced: 2025-02-06T19:49:04.735Z (4 months ago)
- Topics: docker, docker-compose, kafka, spark, spark-kafka, spark-sql, spark-streaming
- Language: Scala
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ScalaSparkKafka
It just loads data into Kafka through Spark and reads it back.
This repository is partially based on these tutorials:
* [Kafka on Docker](https://medium.com/trainingcenter/apache-kafka-codifica%C3%A7%C3%A3o-na-pratica-9c6a4142a08f)
* [Kafka and Spark on Azure](https://docs.microsoft.com/pt-br/azure/hdinsight/hdinsight-apache-kafka-spark-structured-streaming)
* [Kafka and PySpark locally](https://github.com/PritomDas/Real-Time-Streaming-Data-Pipeline-and-Dashboard/blob/13e2f4e6cb19f61d82cff053aa63e572d5e55a29/datamaking_real_time_data_pipeline%20(PySpark)/real_time_streaming_data_pipeline.py)

## Requirements
To run this code, you will need:
* sbt
* Java 8+
* docker
* on Windows:
  * set up winutils.exe and hadoop.dll, as described [here](https://sparkbyexamples.com/spark/spark-hadoop-exception-in-thread-main-java-lang-unsatisfiedlinkerror-org-apache-hadoop-io-nativeio-nativeiowindows-access0ljava-lang-stringiz/) (see the sketch below).
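On Windows, Spark has to locate those native binaries at runtime. One option, shown here as a minimal sketch (the `C:\hadoop` path is illustrative; winutils.exe and hadoop.dll would live under its `bin` directory), is to set `hadoop.home.dir` before the SparkSession is created:

```scala
// Illustrative path: point this at the directory whose bin/ folder holds
// winutils.exe and hadoop.dll. Must run before the first SparkSession is built.
System.setProperty("hadoop.home.dir", "C:\\hadoop")
```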
## Setting up services

```shell
docker-compose up -d # start Zookeeper and Kafka in the background
```

```shell
docker-compose ps #checking if expected services are running
```

```shell
docker-compose logs zookeeper | grep -i binding #check logs from zookeeper
docker-compose logs kafka | grep -i started #check logs from kafka
```

## Test drive

```shell
# creating a new topic
docker-compose exec kafka kafka-topics --create --topic meu-topico-legal --partitions 1 --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181
```
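The same topic can also be created programmatically. A sketch using Kafka's AdminClient, not part of this repository; it assumes kafka-clients is on the classpath (it comes in transitively with Spark's Kafka connector) and that the broker port is mapped to the host as localhost:9092:

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object CreateTopicSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")

  val admin = AdminClient.create(props)
  // 1 partition, replication factor 1, matching the console command above.
  admin.createTopics(Collections.singleton(new NewTopic("meu-topico-legal", 1, 1.toShort)))
    .all().get()
  admin.close()
}
```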
```shell
# checking topic existence
docker-compose exec kafka kafka-topics --describe --topic meu-topico-legal --zookeeper zookeeper:2181
```

```shell
#Producing 100 messages
docker-compose exec kafka bash -c "seq 100 | kafka-console-producer --request-required-acks 1 --broker-list kafka:9092 --topic meu-topico-legal && echo 'Produced 100 messages.'"
```
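For comparison, producing the same 100 messages from Spark looks roughly like this. This is a minimal sketch, not the repository's actual code; it assumes the spark-sql-kafka connector is on the classpath and that the broker is reachable from the host as localhost:9092:

```scala
import org.apache.spark.sql.SparkSession

object ProduceSketch extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("produce-sketch")
    .getOrCreate()
  import spark.implicits._

  // The Kafka sink requires a string or binary `value` column.
  (1 to 100).toDF("n")
    .selectExpr("CAST(n AS STRING) AS value")
    .write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "meu-topico-legal")
    .save()

  spark.stop()
}
```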
```shell
# Consuming 100 messages
docker-compose exec kafka kafka-console-consumer --bootstrap-server kafka:9092 --topic meu-topico-legal --from-beginning --max-messages 100
```
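And reading the messages back with Spark, again as a sketch under the same assumptions:

```scala
import org.apache.spark.sql.SparkSession

object ConsumeSketch extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("consume-sketch")
    .getOrCreate()

  // Batch read: grab everything currently in the topic.
  spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "meu-topico-legal")
    .option("startingOffsets", "earliest")
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .show(100, truncate = false)

  spark.stop()
}
```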
## How to run

```shell
sbt run
```
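For `sbt run` to resolve Spark's Kafka source and sink, the build needs the Kafka connector alongside spark-sql. A sketch of the relevant build.sbt lines; the versions below are illustrative, not necessarily the repository's:

```scala
// build.sbt (illustrative versions; match the Scala and Spark versions you use)
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.1.1",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.1.1"
)
```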