Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/snexus/streaming-playground
Exploring streaming design patterns with Kafka and Spark Structured Streaming
- Host: GitHub
- URL: https://github.com/snexus/streaming-playground
- Owner: snexus
- Created: 2023-01-10T12:33:11.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-01-15T09:44:37.000Z (about 2 years ago)
- Last Synced: 2024-11-22T19:33:12.602Z (3 months ago)
- Topics: kafka, kafka-producer, python, spark, spark-streaming
- Language: Python
- Homepage:
- Size: 43.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# streaming-playground
Additional links
* https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/json_producer.py

# Prerequisites
## Install Confluent Kafka
Follow https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html#install-cp-using-zip-and-tar-archives
## Install Apache Spark
https://computingforgeeks.com/how-to-install-apache-spark-on-ubuntu-debian/
1) Edit `.zshrc` (or your shell's startup file) and add:
```bash
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
```

2) Start the master and worker processes:
```bash
$SPARK_HOME/sbin/start-master.sh
# Attach a worker to the master (replace spark://ubuntu:7077 with your master URL, shown in the master log / web UI)
$SPARK_HOME/sbin/start-slave.sh spark://ubuntu:7077
```

3) When finished, shut down the worker and the master:
```bash
$SPARK_HOME/sbin/stop-slave.sh
$SPARK_HOME/sbin/stop-master.sh
```

## Starting Kafka and Schema Registry
Run the following in separate terminals, from the Confluent installation folder:
1) Start ZooKeeper
```bash
bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
```
2) Start Kafka
```bash
bin/kafka-server-start ./etc/kafka/server.properties
```

3) Start Schema Registry
```bash
bin/schema-registry-start ./etc/schema-registry/schema-registry.properties
```

## Create test topic
```bash
cd producer
python ./kafka_create_topic.py -t sensor_events
```
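The script itself is not reproduced here. As a rough sketch (not the repository's actual code), topic creation with the `confluent_kafka` admin client could look like this; the broker address and the partition/replication settings are assumptions for a local single-broker setup:

```python
# Hypothetical sketch of kafka_create_topic.py (assumed implementation).
import argparse

from confluent_kafka.admin import AdminClient, NewTopic


def main() -> None:
    parser = argparse.ArgumentParser(description="Create a Kafka topic")
    parser.add_argument("-t", "--topic", required=True, help="topic name, e.g. sensor_events")
    args = parser.parse_args()

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})
    # One partition and replication factor 1 are enough for a local playground.
    futures = admin.create_topics([NewTopic(args.topic, num_partitions=1, replication_factor=1)])
    for topic, future in futures.items():
        try:
            future.result()  # raises if creation failed
            print(f"Created topic {topic}")
        except Exception as exc:
            print(f"Failed to create topic {topic}: {exc}")


if __name__ == "__main__":
    main()
```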
# Generating events

This example sends 15 events at an average rate of 100 events per minute; event arrivals follow a Poisson process.
```bash
cd producer
python ./kafka_event_generator.py -t sensor_events -n 15 -r 100
```
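A generator in the spirit of `kafka_event_generator.py` could look like the sketch below (not the repository's actual code): inter-arrival times are drawn from an exponential distribution, which yields Poisson arrivals at the requested rate, and each event is produced as a JSON value. The payload fields and broker address are illustrative assumptions.

```python
# Hypothetical sketch of kafka_event_generator.py (assumed implementation).
import argparse
import json
import random
import time
import uuid

from confluent_kafka import Producer


def main() -> None:
    parser = argparse.ArgumentParser(description="Send randomly timed sensor events")
    parser.add_argument("-t", "--topic", required=True)
    parser.add_argument("-n", "--num-events", type=int, default=15)
    parser.add_argument("-r", "--rate", type=float, default=100.0, help="average events per minute")
    args = parser.parse_args()

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    for _ in range(args.num_events):
        event = {
            "event_id": str(uuid.uuid4()),
            "sensor_id": random.randint(1, 10),
            "temperature": round(random.uniform(15.0, 35.0), 2),
            "timestamp": time.time(),
        }
        producer.produce(args.topic, value=json.dumps(event).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks
        # Exponential inter-arrival time => Poisson arrivals at `rate` events/minute.
        time.sleep(random.expovariate(args.rate / 60.0))
    producer.flush()


if __name__ == "__main__":
    main()
```

At `-r 100` the mean gap between events is 0.6 s, so 15 events take roughly 9 s on average.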
# Showing events

Using Kafka's console consumer:
```bash
bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic sensor_events --from-beginning
```

# Showing Schemas

List the registered subjects, or delete the `sensor_events_avro-value` subject:
```bash
curl -X GET http://localhost:8081/subjects
curl -X DELETE http://localhost:8081/subjects/sensor_events_avro-value
```
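The `sensor_events_avro-value` subject follows Schema Registry's default `<topic>-value` naming, so it is presumably registered by an Avro producer writing to a `sensor_events_avro` topic. A hypothetical sketch of such a producer with `confluent_kafka`'s Schema Registry client is shown below; the topic name, addresses, and schema fields are assumptions, not the repository's actual code.

```python
# Hypothetical Avro producer sketch; registers the value schema on first use.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

# Illustrative schema; the real project's fields may differ.
SCHEMA_STR = """
{
  "type": "record",
  "name": "SensorEvent",
  "fields": [
    {"name": "sensor_id", "type": "int"},
    {"name": "temperature", "type": "double"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(schema_registry, SCHEMA_STR)
producer = Producer({"bootstrap.servers": "localhost:9092"})

topic = "sensor_events_avro"
value = avro_serializer(
    {"sensor_id": 1, "temperature": 21.5},
    SerializationContext(topic, MessageField.VALUE),  # subject: sensor_events_avro-value
)
producer.produce(topic, value=value)
producer.flush()
```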
# Processing events using Spark Structured Streaming
## Consuming and parsing JSON messages

The Spark package versions in `--packages` (here 3.3.1, built for Scala 2.12) should match the installed Spark version.
```bash
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.12:3.3.1,org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,org.apache.spark:spark-avro_2.12:3.3.1 ./spark_stream_kafka_json.py
```
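A minimal sketch of what a Kafka-to-console JSON job like `spark_stream_kafka_json.py` can look like with Structured Streaming (not the repository's actual code; the event schema is an assumption matching the generator sketch above):

```python
# Hypothetical sketch of spark_stream_kafka_json.py (assumed implementation).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.appName("stream_kafka_json").getOrCreate()

# Illustrative schema; adapt it to the fields the generator actually sends.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("sensor_id", IntegerType()),
    StructField("temperature", DoubleType()),
    StructField("timestamp", DoubleType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "sensor_events")
    .option("startingOffsets", "earliest")
    .load()
    # Kafka delivers the payload as bytes in the `value` column.
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```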
## Consuming and parsing AVRO messages

```bash
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.12:3.3.1,org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,org.apache.spark:spark-avro_2.12:3.3.1 ./spark_stream_kafka_avro.py
```
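And a minimal sketch of the Avro variant (not the repository's actual code). Messages written by the Confluent Avro serializer carry a 5-byte Schema Registry header before the Avro body, which Spark's `from_avro` does not strip, so the sketch removes it manually; the topic name and reader schema are assumptions.

```python
# Hypothetical sketch of spark_stream_kafka_avro.py (assumed implementation).
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import col, expr

spark = SparkSession.builder.appName("stream_kafka_avro").getOrCreate()

# Illustrative reader schema; in practice it should match the registered one,
# e.g. http://localhost:8081/subjects/sensor_events_avro-value/versions/latest
avro_schema = """
{
  "type": "record",
  "name": "SensorEvent",
  "fields": [
    {"name": "sensor_id", "type": "int"},
    {"name": "temperature", "type": "double"}
  ]
}
"""

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "sensor_events_avro")
    .option("startingOffsets", "earliest")
    .load()
    # Strip the 5-byte Confluent header (magic byte + 4-byte schema id).
    .select(expr("substring(value, 6, length(value) - 5)").alias("avro_value"))
    .select(from_avro(col("avro_value"), avro_schema).alias("event"))
    .select("event.*")
)

query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```

The `spark-avro` package from the `--packages` list above provides `from_avro` at runtime.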