https://github.com/javadbahoosh/spark-streaming-multi-language-docker
Dockerized infrastructure and boilerplate code for consuming Kafka topics with Spark Streaming in Scala, Python, and Java, featuring Redis integration for result aggregation.
- Host: GitHub
- URL: https://github.com/javadbahoosh/spark-streaming-multi-language-docker
- Owner: JavadBahoosh
- Created: 2024-11-22T17:54:06.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-23T13:17:18.000Z (3 months ago)
- Last Synced: 2025-01-30T20:54:15.276Z (16 days ago)
- Topics: docker, kafka, spark
- Language: Dockerfile
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multi-Language Spark Streaming with Kafka and Redis: A Comparative Boilerplate
This project provides the Docker infrastructure needed to consume data from a Kafka topic with
Spark Streaming in three languages (Scala, Python, and Java). It also includes boilerplate code
for implementing the Spark Streaming consumers in each of these languages.
## Prerequisites
- Docker
- redis-cli
## Setup
1. Clone the repository.
2. Build the Docker images: `docker compose build`
3. Run the project: `./run.sh`
## Monitoring
The `run.sh` script monitors Redis keys every 5 seconds:
- Scala: `scala_total_messages`, `scala_total_sum`
- Python: `python_total_messages`, `python_total_sum`
- Java: `java_total_messages`, `java_total_sum`
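
The polling described above can be sketched in Python. This is a minimal illustration and not the contents of `run.sh`, which presumably relies on `redis-cli`. The helper names (`read_totals`, `format_totals`, `monitor`) are hypothetical, and the only assumption about the client is that it exposes a `get(key)` method, as the `redis` Python package's `redis.Redis` does.

```python
import time

# Key prefixes and suffixes taken from the list above.
LANGUAGES = ("scala", "python", "java")
METRICS = ("total_messages", "total_sum")


def read_totals(client):
    """Read the six aggregate keys and return them as {key: value}.

    `client` is any object with a `get(key)` method (hypothetical
    interface, matched by `redis.Redis`). Keys that are not set yet
    are returned as None.
    """
    totals = {}
    for lang in LANGUAGES:
        for metric in METRICS:
            key = f"{lang}_{metric}"
            value = client.get(key)
            # Redis clients may return bytes unless configured to decode.
            if isinstance(value, bytes):
                value = value.decode()
            totals[key] = value
    return totals


def format_totals(totals):
    """Render one 'key: value' line per key, marking missing keys."""
    return "\n".join(
        f"{key}: {value if value is not None else '(not set)'}"
        for key, value in totals.items()
    )


def monitor(client, interval=5.0, rounds=None):
    """Print the totals every `interval` seconds.

    `rounds=None` loops until interrupted; a number limits the loop.
    """
    done = 0
    while rounds is None or done < rounds:
        print(format_totals(read_totals(client)))
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

To try it against a running stack, one could call `monitor(redis.Redis(host="localhost", port=6379, decode_responses=True))` after `pip install redis`. The host and port here are assumptions and depend on how the Compose file exposes Redis.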