{"id":22190898,"url":"https://github.com/yuhexiong/kafka-data-pipeline-flink-java","last_synced_at":"2026-04-28T20:36:10.706Z","repository":{"id":229103937,"uuid":"774208153","full_name":"yuhexiong/kafka-data-pipeline-flink-java","owner":"yuhexiong","description":"Data pipeline from Kafka to Kafka, Doris, MongoDB and Doris to Kafka using Flink Java.","archived":false,"fork":false,"pushed_at":"2024-11-25T03:23:21.000Z","size":72,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T01:14:35.363Z","etag":null,"topics":["datapipeline","doris","flink","java","jdbc","kafka","mongodb"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yuhexiong.png","metadata":{"files":{"readme":"README-CH.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-19T06:14:33.000Z","updated_at":"2024-12-05T06:05:10.000Z","dependencies_parsed_at":"2024-05-02T06:54:15.636Z","dependency_job_id":"64bc5837-83e9-4bca-ab80-fba0d477efff","html_url":"https://github.com/yuhexiong/kafka-data-pipeline-flink-java","commit_stats":null,"previous_names":["yuhexiong/kafka-datapipeline-flink-java","yuhexiong/kafka-data-pipeline-flink-java"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuhexiong%2Fkafka-data-pipeline-flink-java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuhexiong%2Fkafka-data-pipeline-flink-java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/yuhexiong%2Fkafka-data-pipeline-flink-java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yuhexiong%2Fkafka-data-pipeline-flink-java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yuhexiong","download_url":"https://codeload.github.com/yuhexiong/kafka-data-pipeline-flink-java/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245351757,"owners_count":20601087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datapipeline","doris","flink","java","jdbc","kafka","mongodb"],"created_at":"2024-12-02T12:13:17.469Z","updated_at":"2026-04-28T20:36:05.684Z","avatar_url":"https://github.com/yuhexiong.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kafka Data Pipeline Flink\nA data pipeline written with Flink that transfers data from Kafka to Kafka, Doris, and MongoDB, and also supports merging two data sources.  \n\n## Overview\n\n- Platform: JDK 11\n- Build tool: Apache Maven v3.9.6\n- Data processing framework: Flink v1.18.1\n\n\n## Run\n\n### Build Maven Project\n```\nmvn clean package\n```\n\n### Build Image\n```\ndocker compose build\n```\n\n### Run Docker Container\n\nReplace `YourJavaClass` with the class you want to run.  \n```\ndocker compose run --rm -e MY_CLASS=YourJavaClass myFlinkJob\n```\n\n\n\n## Entry\n\n### 1. KafkaToKafka\n\nForwards all messages from `topic-source` in Kafka (localhost:9092) to `topic-sink` in Kafka (localhost:9092).  \n\n\n### 2. 
KafkaRegexTopicsToKafka\n\n\nBacks up every topic in Kafka (localhost:9092) whose name matches the regular expression `^topicV.*` to the topic of the same name in Kafka (localhost:9093), Kafka (localhost:9094), and Kafka (localhost:9095).  \n\n### 3. KafkaToDorisByJDBCSink / KafkaToDorisByDorisSink\n\nFlattens the `data` array/list in `topic-sensor` messages from Kafka (localhost:9092) and writes one row per element to the Doris (localhost:9030) table `database.sensor`.  \n\n- Kafka Topic `topic-sensor` Message\n```json\n{\n    \"location\": \"Area A\",\n    \"timestamp\": \"2024-03-25T08:00:00\",\n    \"data\": [\n        {\n            \"sensorId\": \"sensor001\",\n            \"sensorType\": \"Temperature\",\n            \"value\": 25.5,\n            \"unit\": \"Celsius\"\n        },\n        {\n            \"sensorId\": \"sensor002\",\n            \"sensorType\": \"Humidity\",\n            \"value\": 60.2,\n            \"unit\": \"%\"\n        }\n    ]\n}\n```\n\n- Doris Table `database.sensor`\n```\n| id        | type          | location    | timestamp           | value | unit    |  \n|-----------|---------------|-------------|---------------------|-------|---------|  \n| sensor001 | Temperature   | Area A      | 2024-03-25T08:00:00 | 25.5  | Celsius |  \n| sensor002 | Humidity      | Area A      | 2024-03-25T08:00:00 | 60.2  | %       |  \n```\n\n### 4. 
DorisToKafka\n\nReads rows from the Doris (localhost:9030) table `database.sensor`, packs them into an array/list named `data`, and writes the result to `topic-sensor` in Kafka (localhost:9092).  \n\n- Doris Table `database.sensor`\n```\n| id        | type          | location    | timestamp           | value | unit    |  \n|-----------|---------------|-------------|---------------------|-------|---------|  \n| sensor001 | Temperature   | Area A      | 2024-03-25T08:00:00 | 25.5  | Celsius |  \n| sensor002 | Humidity      | Area A      | 2024-03-25T08:00:00 | 60.2  | %       |  \n```\n\n- Kafka Topic `topic-sensor` Message\n```json\n{\n    \"location\": \"Area A\",\n    \"timestamp\": \"2024-03-25T08:00:00\",\n    \"data\": [\n        {\n            \"sensorId\": \"sensor001\",\n            \"sensorType\": \"Temperature\",\n            \"value\": 25.5,\n            \"unit\": \"Celsius\"\n        }\n    ]\n}\n```\n\n\n\n### 5. TwoKafkaToDoris\n\nFlattens the `data` array/list in `topic-sensor` messages from Kafka (localhost:9092), joins each element with the equipments and sensors settings from `topic-setting`, and writes the result to the Doris (localhost:9030) table `database.monitoring_data`.  \n\n- Kafka Topic `topic-sensor` Message\n```json\n{\n    \"location\": \"Area A\",\n    \"timestamp\": \"2024-03-25T08:00:00\",\n    \"data\": [\n        {\n            \"sensorId\": \"sensor001\",\n            \"sensorType\": \"Temperature\",\n            \"value\": 25.5,\n            \"unit\": \"Celsius\"\n        },\n        {\n            \"sensorId\": \"sensor002\",\n            \"sensorType\": \"Humidity\",\n            \"value\": 60.2,\n            \"unit\": \"%\"\n        }\n    ]\n}\n```\n\n- Kafka Topic `topic-setting` Message\n```json\n{\n    \"equipments\": [\n        {\n            \"id\": \"equipment001\",\n            \"name\": \"機器1\",\n            \"location\": \"Area A\"\n        }\n    ],\n    \"sensors\": [\n        {\n            \"id\": \"sensor001\",\n            \"equipments\": [\"equipment001\", \"equipment002\"]\n        },\n        {\n            \"id\": \"sensor002\",\n            
\"equipments\": [\"equipment001\", \"equipment003\"]\n        }\n    ]\n}\n```\n\n- Doris Table `database.monitoring_data`\n```\n| equipment_id  | sensor_id | sensor_type   | sensor_timestamp      | sensor_value | sensor_unit  |  \n|---------------|-----------|---------------|-----------------------|--------------|--------------|  \n| equipment001  | sensor001 | Temperature   | 2024-03-25T08:00:00   | 25.5         | Celsius      |  \n| equipment001  | sensor002 | Humidity      | 2024-03-25T08:00:00   | 60.2         | %            |  \n```\n\n### 6. KafkaToMongoDB\n\nTransforms messages from `topic` in Kafka (localhost:9092) and stores them in `database.collection` in MongoDB (localhost:27017).  \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuhexiong%2Fkafka-data-pipeline-flink-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyuhexiong%2Fkafka-data-pipeline-flink-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyuhexiong%2Fkafka-data-pipeline-flink-java/lists"}