{"id":15208880,"url":"https://github.com/hieuung/streaming-kafka","last_synced_at":"2026-02-27T05:37:16.242Z","repository":{"id":235707868,"uuid":"791088381","full_name":"hieuung/Streaming-Kafka","owner":"hieuung","description":"Using various data processing tool for real time data pipeline with Kafka","archived":false,"fork":false,"pushed_at":"2024-04-30T08:41:15.000Z","size":4908,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-17T04:45:55.367Z","etag":null,"topics":["apache-beam","apache-flink","apache-spark","kafka","kafka-consumer","kafka-producer","spark-streaming","spark-streaming-kafka"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hieuung.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-24T04:30:55.000Z","updated_at":"2024-04-30T08:41:18.000Z","dependencies_parsed_at":"2024-04-24T10:27:18.682Z","dependency_job_id":"c1a4de3b-e518-44cf-b5f5-7e9121895c43","html_url":"https://github.com/hieuung/Streaming-Kafka","commit_stats":{"total_commits":8,"total_committers":1,"mean_commits":8.0,"dds":0.0,"last_synced_commit":"1c76d8f05c3462c02728ca65313317c87afe67c3"},"previous_names":["hieuung/spark-streaming-kafka"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hieuung%2FStreaming-Kafka","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hieuung%2FStreaming-Kafka/tags","releases_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hieuung%2FStreaming-Kafka/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hieuung%2FStreaming-Kafka/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hieuung","download_url":"https://codeload.github.com/hieuung/Streaming-Kafka/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242276892,"owners_count":20101528,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","apache-flink","apache-spark","kafka","kafka-consumer","kafka-producer","spark-streaming","spark-streaming-kafka"],"created_at":"2024-09-28T07:02:59.182Z","updated_at":"2026-02-27T05:37:16.168Z","avatar_url":"https://github.com/hieuung.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# STREAMING INTEGRATION WITH KAFKA\n\n## Tech stack\n- Kafka, Kafka-ui\n- Apache Spark\n- Apache Flink\n- Apache Beam\n- Docker-compose\n\n## Implementation\n### Set up Kafka.\n```sh\ndocker-compose up -d kafka zookeeper kafka-ui\n```\nVerify the Kafka cluster on [Kafka-ui](http://localhost:8080)\n\n### Publish messages.\n```sh\ndocker-compose up -d producer\n```\nVerify the messages on [Kafka-ui](http://localhost:8080)\n\n### STREAMING WITH APACHE SPARK. 
\n#### Set up a long-running Spark cluster.\n```sh\ndocker-compose up -d spark-master spark-worker\n```\n---\n\u003e **_NOTE:_**  If the Docker image build is too slow (caused by a slow download), download the following file manually:\n```sh\nexport SPARK_VERSION=3.0.2\nexport HADOOP_VERSION=3.2\nexport SPARK_HOME=/opt/spark\nwget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz\n```\n---\n\n#### Verify the Spark cluster on [Spark-ui](http://localhost:9090)\n\n#### Access the master node to start the streaming job, which transforms messages and publishes them back to Kafka\n\n```sh\ndocker exec -it spark-master /bin/sh\n```\n\n#### Submit job\n```sh\n/opt/spark/bin/spark-submit --master spark://spark-master:7077 \\\n    --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.2 \\\n    /opt/spark-apps/spark-consumer.py\n```\n\n#### Verify the Spark job on [Spark-ui](http://localhost:9090), verify messages on [Kafka-ui](http://localhost:8080)\n\n### STREAMING WITH APACHE FLINK. 
\n\n#### Set up a long-running Flink cluster.\n##### Download essential `external_jars`\n```\n    flink-sql-connector-kafka-3.1.0-1.18.jar\n```\n##### Build the pyflink Docker image\n```sh\ndocker build --tag pyflink:latest ./pyflink\n```\n##### Start the Flink cluster\n```sh\ndocker-compose up -d taskmanager jobmanager\n```\n---\n\n##### Verify the Flink cluster on [Flink-ui](http://localhost:8081)\n\n#### Access the jobmanager node to start the streaming job.\n```sh\ndocker exec -it streaming-kafka_jobmanager_1 /bin/sh\n```\n\n#### Submit job\n```sh\n./bin/flink run -py ./app/test-stream.py \\\n    --jarfile ./app/external_jars/flink-sql-connector-kafka-3.1.0-1.18.jar\n```\n\n##### Verify the Flink job on [Flink-ui](http://localhost:8081), verify messages on [Kafka-ui](http://localhost:8080)\n\n### STREAMING WITH APACHE BEAM.\n\n#### Set up a long-running Flink cluster (or Spark cluster).\n\n#### Verify the Flink cluster on [Flink-ui](http://localhost:8081)\n\n#### Submit the Beam job to the cluster\n```sh\npython3 beam-consumer.py --runner FlinkRunner \\\n                        --bootstrap_servers localhost:9092 \\\n                        --topics hieuung \\\n                        --flink_master localhost:8081\n```\n\n#### Verify the Flink job on [Flink-ui](http://localhost:8081)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhieuung%2Fstreaming-kafka","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhieuung%2Fstreaming-kafka","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhieuung%2Fstreaming-kafka/lists"}