https://github.com/scrapcodes/kafkaproducer
Benchmarks to measure latency using Spark and Kafka.
- Host: GitHub
- URL: https://github.com/scrapcodes/kafkaproducer
- Owner: ScrapCodes
- Created: 2016-12-16T07:53:08.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-01-12T04:54:25.000Z (about 8 years ago)
- Last Synced: 2024-11-09T06:39:04.631Z (3 months ago)
- Topics: benchmark, kafka, spark
- Language: Scala
- Size: 287 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: Readme.md
README
# Benchmark
To run the benchmarks locally, follow the steps below.
## Start Kafka.
* Start two Kafka brokers locally.
* Create two topics, `test` and `output`, with 10 partitions each (see the sketch after this list).
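A minimal sketch of this setup using the standard Kafka CLI scripts is shown below. It assumes an older, ZooKeeper-based Kafka distribution with ZooKeeper on `localhost:2181`, and a second broker config file `config/server-1.properties` (with a distinct `broker.id`, listener port, and log directory) prepared beforehand; newer Kafka releases use `--bootstrap-server` instead of `--zookeeper` for topic creation.
```
# Start two brokers, each with its own config (broker.id, port, and log.dirs must differ).
bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server-1.properties &

# Create the two topics with 10 partitions each.
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
  --partitions 10 --topic test
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
  --partitions 10 --topic output
```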
## Start Spark.
* To run the Spark job locally, submit the assembled jar:
```
bin/spark-submit --class com.github.scrapcodes.kafka.SparkSQLKafkaConsumer \
  --master local[20] --executor-memory 6G \
  /path/Kafka-Producer-assembly.jar
```
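The assembly jar referenced above needs to be built first. A minimal sketch, assuming the project uses the sbt-assembly plugin (suggested by the `Kafka-Producer-assembly.jar` name), is:
```
cd Kafka-Producer
# Build the fat jar; sbt-assembly writes it under the project's target/ directory.
./sbt assembly
```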
## Start Kafka Producer.
The producer writes to the `test` topic, which the Spark job is subscribed to; the Spark job in turn writes its results to the `output` topic.
```
cd Kafka-Producer
./sbt "run-main com.github.scrapcodes.kafka.LongRunningProducer test"
```
## Start Kafka Consumer.
This reads from the `output` topic for 10 minutes and reports the collected timing statistics.
```
./sbt "run-main com.github.scrapcodes.kafka.KafkaConsumerStatistics output /path/timeStats.txt 10```