https://github.com/josephc45/spark_microservice_for_typeahead_application

Spark service ingesting Kafka stream from Typeahead Application that aggregates words on occurrences and sinks to HBase.
docker docker-compose hbase java jenkins junit5 kafka-consumer mockito spark-streaming spring-boot

Last synced: 3 months ago
Host: GitHub
URL: https://github.com/josephc45/spark_microservice_for_typeahead_application
Owner: josephC45
Created: 2024-10-31T01:18:35.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-02-21T23:42:47.000Z (5 months ago)
Last Synced: 2025-04-06T21:47:24.003Z (3 months ago)
Topics: docker, docker-compose, hbase, java, jenkins, junit5, kafka-consumer, mockito, spark-streaming, spring-boot
Language: Java
Homepage:
Size: 43 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Spark Microservice
The Spark Microservice consumes messages from the Typeahead Service via Kafka, processes the incoming text data in real time, and aggregates word occurrences.
The aggregated counts are stored in HBase for fast, scalable access, so they can back typeahead suggestions and analytics as the data volume grows.

## Architecture Overview
- Typeahead Service: Produces messages (words) to a Kafka topic.
- Kafka Broker: Acts as the messaging layer, delivering messages from the Typeahead Service to the Spark Microservice.
- Spark Microservice: Consumes messages from Kafka, aggregates word occurrences, and stores the aggregated data in HBase.
- HBase: A distributed, scalable NoSQL database that stores the aggregated word occurrences and serves them efficiently.
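
The last hop of this flow, where per-batch counts are folded into stored totals, can be illustrated with a minimal plain-Java sketch. It is not code from this repository: `WordCountSink`, `writeBatch`, and `countFor` are hypothetical names, and an in-memory map stands in for the HBase table under the assumption that each word is a row key holding a running count.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WordCountSink {
    // Assumption: in-memory stand-in for an HBase table (row key = word, value = running count).
    private final Map<String, Long> table = new ConcurrentHashMap<>();

    /** Adds the counts from one micro-batch to the running totals. */
    public void writeBatch(Map<String, Long> batchCounts) {
        batchCounts.forEach((word, count) -> table.merge(word, count, Long::sum));
    }

    /** Returns the running total for a word, or 0 if the word has not been seen. */
    public long countFor(String word) {
        return table.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        WordCountSink sink = new WordCountSink();
        sink.writeBatch(Map.of("spark", 2L, "kafka", 1L)); // first micro-batch
        sink.writeBatch(Map.of("spark", 3L));              // second micro-batch
        System.out.println(sink.countFor("spark"));        // prints 5
    }
}
```

Adding each batch to the stored total, rather than overwriting it, keeps counts cumulative across batches. Whether the actual service increments or overwrites is not stated in this README.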

## Features
- Real-time Processing: Consumes data in real time from the Kafka topic, enabling up-to-date analysis of user input.
- Scalable Aggregation: Aggregates word occurrences using Apache Spark, with results stored in HBase for efficient querying.
- HBase Integration: The word frequency data is stored in HBase, providing a fast and scalable solution for reading and updating word counts.
- Spark for Data Processing: Uses Apache Spark for processing large-scale data, performing operations such as filtering, grouping, and aggregation.
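
The filtering, grouping, and aggregation mentioned above can be sketched with plain Java streams over a single batch of messages. This is an illustration only and does not use the Spark API: `WordAggregator` and `countWords` are hypothetical names, and the normalization (trimming, lowercasing, dropping blanks) is an assumption, not documented behavior of the service.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordAggregator {
    /** Counts occurrences of each word in one batch of messages. */
    public static Map<String, Long> countWords(List<String> messages) {
        return messages.stream()
                .filter(Objects::nonNull)                      // filter: skip null messages
                .map(m -> m.trim().toLowerCase(Locale.ROOT))   // assumed normalization
                .filter(w -> !w.isEmpty())                     // filter: skip blank messages
                .collect(Collectors.groupingBy(                // group by word
                        Function.identity(),
                        TreeMap::new,                          // sorted keys for stable output
                        Collectors.counting()));               // aggregate: count per word
    }

    public static void main(String[] args) {
        List<String> batch = List.of("Spark", " spark ", "", "kafka");
        System.out.println(countWords(batch)); // prints {kafka=1, spark=2}
    }
}
```

In a Spark job the same three steps would typically be expressed as transformations on a streaming dataset, so that the grouping and counting run in parallel across partitions.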

## License

This project uses the following open-source libraries:
- **Spring Kafka**: Licensed under the [Apache License 2.0](https://github.com/spring-projects/spring-kafka/blob/main/LICENSE).
- **Apache Spark**: Licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
