https://github.com/abhirockzz/kafka-streams-example
Kafka Streams based microservice
https://github.com/abhirockzz/kafka-streams-example
jax-rs jersey kafka kafka-streams
Last synced: 6 months ago
JSON representation
Kafka Streams based microservice
- Host: GitHub
- URL: https://github.com/abhirockzz/kafka-streams-example
- Owner: abhirockzz
- License: apache-2.0
- Created: 2017-02-28T13:27:46.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-12T09:37:19.000Z (over 8 years ago)
- Last Synced: 2025-03-24T00:24:19.963Z (7 months ago)
- Topics: jax-rs, jersey, kafka, kafka-streams
- Language: Java
- Homepage:
- Size: 20.5 KB
- Stars: 25
- Watchers: 5
- Forks: 16
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
This is an example of a [Kafka Streams](https://kafka.apache.org/documentation/streams) based microservice (packaged in form of an Uber JAR). The scenario is simple
## Basics
- A producer application continuously emits CPU usage metrics into a Kafka topic (cpu-metrics-topic)
- The consumer is a Kafka Streams application which uses the Processor (low level) Kafka Streams API to calculate the [Cumulative Moving Average](https://en.wikipedia.org/wiki/Moving_average#Cumulative_moving_average) of the CPU metrics of each machine
- Consumers can be horizontally scaled - the processing work is distributed amongst many nodes and the process is elastic and flexible thanks to Kafka Streams (and the fact that it leverages Kafka for fault tolerance etc.)
- Each instance has its own (local) state for the calculated average. A custom REST API has been (using Jersey JAX-RS implementation) to tap into this state and provide a unified view of the entire system (moving averages of CPU usage of all machines)### Setup
This project has two modules
- [Producer](https://github.com/abhirockzz/kafka-streams-example/tree/master/kafka-producer) application
- [Consumer](https://github.com/abhirockzz/kafka-streams-example/tree/master/kstreams-consumer) application (uses Kafka Streams)## To try things out...
- Start Kafka broker. Configure `num.partitions` in Kafka broker `server.properties` file to 5 (to experiment with this application)
- Build producer & consumer application - browse to the respective directory and execute `mvn clean install`
- Trigger producer application - `java -jar kafka-cpu-metrics-producer.jar`. It will start emitting records to Kafka
- Start one instance of consumer application - `java -jar kafka-cpu-metrics-consumer-1.0.jar -DKAFKA_CLUSTER=` (defaults to `localhost:9092` if not provided). Note the auto-selected port. It will start calculating the moving average of machine CPU metrics
- Access the metrics on this instance - `http://localhost:/metrics`
- Start another instance of consumer application - `java -jar kafka-cpu-metrics-consumer-1.0.jar -DKAFKA_CLUSTER=` (note the auto-selected port), wait for a few seconds - the load will now be distributed amongst the two instances. Access the metrics `http://localhost:/metrics` - you will see the metrics (JSON/XML payload) for all the machines as well as the instance on which the Cumulative Moving Average has been calculatedsample output - https://gist.github.com/abhirockzz/48e89873ae23c93d0a5cc721c87cc536
- You can also search for metrics for a specific machine ID `http://localhost:/metrics/`
sample output - https://gist.github.com/abhirockzz/2ca2297fbc9aec269d31707f61b4c45e
You can keep increasing the number of instances such that they are less than or equal to the number of partitions of your Kafka topic
> Having more instances than number of partitions is not going to have any effect on parallelism and that instance will be inactive until any of the existing instance is stopped