https://github.com/josephc45/spark_microservice_for_typeahead_application
Spark service ingesting Kafka stream from Typeahead Application that aggregates words on occurrences and sinks to HBase.
docker docker-compose hbase java jenkins junit5 kafka-consumer mockito spark-streaming spring-boot
- Host: GitHub
- URL: https://github.com/josephc45/spark_microservice_for_typeahead_application
- Owner: josephC45
- Created: 2024-10-31T01:18:35.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-15T02:14:39.000Z (3 months ago)
- Last Synced: 2024-11-15T03:19:48.823Z (3 months ago)
- Topics: docker, docker-compose, hbase, java, jenkins, junit5, kafka-consumer, mockito, spark-streaming, spring-boot
- Language: Java
- Homepage:
- Size: 19.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Spark Microservice
The Spark Microservice consumes messages from the Typeahead Service via Kafka, processes the incoming text data in real time, and aggregates word occurrences.
These aggregates are stored in HBase for fast, scalable access. This setup enables efficient handling of typeahead suggestions and analytics at scale.

## Architecture Overview
- Typeahead Service: Produces messages (words) to a Kafka topic.
- Kafka Broker: Acts as the messaging layer, delivering messages from the Typeahead Service to the Spark Microservice.
- Spark Microservice: Consumes messages from Kafka, processes them to aggregate word occurrences, and stores the aggregated data in HBase.
- HBase: A distributed, scalable NoSQL database used for storing and retrieving the aggregated word occurrences efficiently.

## Features
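The aggregation at the heart of this service (filter, group, count, then merge into stored totals) can be sketched in plain Java. This is a minimal illustration, not the repository's code: it uses no Spark, Kafka, or HBase APIs. The class `WordCountSketch`, its methods, the lowercase/trim normalization rule, and the in-memory map standing in for the HBase table are all assumptions made for the example.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative only: a plain-Java stand-in for the aggregation step and the sink. */
public class WordCountSketch {

    // Hypothetical stand-in for the HBase table: row key = word, value = running count.
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    /** Normalizes, filters, groups, and counts the words in one micro-batch. */
    public static Map<String, Long> countWords(List<String> batch) {
        return batch.stream()
                .map(w -> w.trim().toLowerCase(Locale.ROOT)) // assumed normalization rule
                .filter(w -> !w.isEmpty())                    // drop blank messages
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    /** Adds one batch's counts to the running totals, one increment per row key. */
    public void sink(Map<String, Long> batchCounts) {
        batchCounts.forEach((word, count) -> store.merge(word, count, Long::sum));
    }

    /** Returns the running total for a word, or 0 if it has never been seen. */
    public long total(String word) {
        return store.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        // Two micro-batches, as if delivered from the Kafka topic.
        sketch.sink(countWords(List.of("spark", "Kafka", "spark ", "")));
        sketch.sink(countWords(List.of("kafka", "hbase")));
        System.out.println(sketch.total("spark")); // 2
        System.out.println(sketch.total("kafka")); // 2
    }
}
```

Counting within each batch first and then adding the per-word totals to the store keeps the number of writes proportional to the number of distinct words rather than the number of messages. A real sink would likely rely on the store's own increment operation instead of an in-memory map.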
- Real-time Processing: Consumes data in real time from the Kafka topic, enabling up-to-date analysis of user input.
- Scalable Aggregation: Aggregates word occurrences using Apache Spark, with results stored in HBase for efficient querying.
- HBase Integration: The word frequency data is stored in HBase, providing a fast and scalable solution for reading and updating word counts.
- Spark for Data Processing: Uses Apache Spark for processing large-scale data, performing operations such as filtering, grouping, and aggregation.

## License
This project uses the following open-source libraries:
- **Spring Kafka**: Licensed under the [Apache License 2.0](https://github.com/spring-projects/spring-kafka/blob/main/LICENSE).
- **Apache Spark**: Licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).