{"id":30287086,"url":"https://github.com/luscasjb/data-enrichment-processor","last_synced_at":"2026-04-28T23:35:11.139Z","repository":{"id":308365209,"uuid":"1032484343","full_name":"luscasjb/data-enrichment-processor","owner":"luscasjb","description":"A real-time data enrichment microservice built with Java \u0026 Spring Boot. It consumes Debezium CDC events from a Kafka topic, enriches the data and produces a consolidated message to a downstream Kafka topic.","archived":false,"fork":false,"pushed_at":"2025-08-05T13:52:23.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-05T15:35:08.886Z","etag":null,"topics":["java","kafka","mysql","postgresql","spring"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luscasjb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-05T11:24:33.000Z","updated_at":"2025-08-05T13:53:39.000Z","dependencies_parsed_at":"2025-08-05T15:48:40.587Z","dependency_job_id":null,"html_url":"https://github.com/luscasjb/data-enrichment-processor","commit_stats":null,"previous_names":["luscasjb/data-enrichment-processor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/luscasjb/data-enrichment-processor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luscasjb%2Fdata-enrichment-processor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luscasjb%2Fdata-enrichment-processor/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luscasjb%2Fdata-enrichment-processor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luscasjb%2Fdata-enrichment-processor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luscasjb","download_url":"https://codeload.github.com/luscasjb/data-enrichment-processor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luscasjb%2Fdata-enrichment-processor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270775808,"owners_count":24642961,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","kafka","mysql","postgresql","spring"],"created_at":"2025-08-16T21:38:04.008Z","updated_at":"2026-04-28T23:35:11.070Z","avatar_url":"https://github.com/luscasjb.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Enrichment Streaming Project\n\n## Summary\n\nThis project represents a **key component within a larger message processing flow**. 
Acting as a real-time (streaming) data enrichment service, it intercepts a specific stage of this pipeline: it consumes events, cross-references and validates them against data from multiple databases, and finally produces a consolidated message for the **next phase of the flow** on a new Apache Kafka topic.\n\nThe service is designed to be resilient and follows the Single Responsibility Principle: consumption, processing, data access, and message production are decoupled from one another, which keeps the service maintainable and scalable within the overall architecture.\n\n## Data Flow and Architecture\n\nThe processing flow of a message follows these steps:\n\n1.  **Event Consumption (CDC)**: The service subscribes to the Kafka topic `mysql.local.listener`, which receives events generated by Debezium from changes in the `movements` table of a MySQL database.\n\n2.  **Relevance Validation**: For each consumed event, the service verifies whether it corresponds to the **most recent state** of a `request_id`. Events that do not represent that latest state (e.g., those arriving out of order) are discarded to ensure consistency.\n\n3.  **Situation-Based Routing Logic**: The flow is routed according to the movement's situation:\n    * **Situations `PD` (Pending) or `IP` (In Progress)**: Trigger the enrichment process.\n    * **Situations `CP` (Completed) or `CN` (Cancelled)**: Trigger the sending of a \"Tombstone\" message (with a null value) to the final topic. This is a Kafka pattern to signal the removal or invalidation of the key (`request_id`).\n\n4.  **Enrichment Process**:\n    * The service queries the `people_requests` table (MySQL) to obtain the `person_id` for the `request_id`.\n    * With the `person_id`, it queries the `people` table in a **PostgreSQL** database to obtain the person's `name`.\n    * The situation code (e.g., `PD`) is mapped to a readable description (e.g., `PENDING`).\n\n5.  
**Final Message Production**:\n    * A new message, in the `EnrichedData` format, is constructed containing the original and enriched data (`request_id`, `person_id`, `person_name`, `status_description`).\n    * This message is serialized and sent to the Kafka topic `final_enriched_data`, using the `request_id` as the message key to ensure correct partitioning.\n\n## Main Components\n\nThe application is divided into the following components:\n\n* **`listener`**: Solely responsible for consuming messages from the source topic and delegating processing.\n* **`service`**: Orchestrates the business flow, applying validation and routing rules.\n* **`repository`**: Abstraction layer for data access, with specific implementations for MySQL and PostgreSQL databases.\n* **`mapper`**: Performs data transformations, such as converting situation codes to their descriptions.\n* **`producer`**: Encapsulates the logic for building and sending messages (both enriched and \"tombstones\") to the destination Kafka topic.\n\n## Technologies Used\n\n* Java \u0026 Spring Boot\n* Apache Kafka (Consumer and Producer)\n* Debezium (for Change Data Capture)\n* MySQL (as a data source)\n* PostgreSQL (as a data source for enrichment)\n* JDBC Template","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluscasjb%2Fdata-enrichment-processor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluscasjb%2Fdata-enrichment-processor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluscasjb%2Fdata-enrichment-processor/lists"}