{"id":16352252,"url":"https://github.com/bugthesystem/scream-processing","last_synced_at":"2025-04-12T20:09:07.111Z","repository":{"id":149223179,"uuid":"94526597","full_name":"bugthesystem/scream-processing","owner":"bugthesystem","description":" Playground for Apache Kafka, Apache Flink (CEP \u0026 ML), Elasticsearch, Kibana in Scala /w Testing practices","archived":false,"fork":false,"pushed_at":"2017-07-31T12:29:12.000Z","size":13,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-12T20:08:43.939Z","etag":null,"topics":["cep","elasticsearch","flink","flink-cep","flink-ml","kafka","kibana","scala"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bugthesystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-16T09:08:17.000Z","updated_at":"2023-08-17T18:32:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"d8c564b7-446e-4855-84b5-9e81c6dd4fb5","html_url":"https://github.com/bugthesystem/scream-processing","commit_stats":null,"previous_names":["bugthesystem/scream-processing"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fscream-processing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fscream-processing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fscream-processing/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fscream-processing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bugthesystem","download_url":"https://codeload.github.com/bugthesystem/scream-processing/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625493,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cep","elasticsearch","flink","flink-cep","flink-ml","kafka","kibana","scala"],"created_at":"2024-10-11T01:25:27.982Z","updated_at":"2025-04-12T20:09:07.072Z","avatar_url":"https://github.com/bugthesystem.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scream processing (Patoz)\nPlayground for Apache Kafka, Apache Flink (CEP, ML) Elasticsearch and Kibana in Scala\n\n```sh\n _ _ _ _ _ _ _ _                _ _ _ _           _ _ _ _ _ _ _         _ _ _ _ _ _ \n/         Akka   \\             /        \\        /               \\     |x          |\n|   Vert.x       |             | Flink  |    _ _ | Elasticsearch | --- |  Kibana   |\n|                |             |        |   /    \\ _ _ _ _ _ _ _ /     |_ _ _ _ _ _|\n|       Node.js  | -- Kafka ---|   Job  | /       _ _ _ _ _ _ _   \n|   Spring Boot  |             | Job    | \\      /              \\      _ _ _ _ _ _  \n|      .NET Core |             |   Job  |   \\ _ _|    Kafka     | -- / Other apps  \\\n\\_ _ _ _ _ _ __ /              \\ _ _ _ _/        \\ _ _ _ _ _ _ _/    \\ _ _ _ _ _ _ /\n\n```\n\n# Contents\n - [Tech / 
Tools](#tech--tools)\n - [Env Setup (Local)](#env-setup-local)\n - [Env Setup (Kubernetes)](#env-setup-kubernetes)\n\n# Tech / Tools\n- [Scala 2.11.7](https://www.scala-lang.org/)\n- [Sbt 0.13](http://www.scala-sbt.org/)\n- [Kafka 0.10](https://kafka.apache.org/)\n- [Flink 1.3.1](https://flink.apache.org/)\n  - [FlinkCEP - Complex event processing for Flink](https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/cep.html)\n  - [FlinkML - Machine Learning for Flink](https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/ml/index.html)\n- [Elasticsearch 5.5.0](https://www.elastic.co/products/elasticsearch)\n- [Kibana 5.5.0](https://www.elastic.co/products/kibana)\n- [Docker](https://www.docker.com/)\n- [Kubernetes 1.7](https://kubernetes.io/)\n- [Jenkins Pipelines](https://jenkins.io/doc/book/pipeline/)\n\n# Env Setup (Local)\n## Install Kafka\n```sh\n# TODO: convert to `docker-compose`\nwget http://mirror.netinch.com/pub/apache/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz\n\ntar -xzf kafka_2.11-0.10.2.0.tgz\n\ncd kafka_2.11-0.10.2.0\n\n# Start Zookeeper\nbin/zookeeper-server-start.sh config/zookeeper.properties\n\n# Start Kafka server\nbin/kafka-server-start.sh config/server.properties\n\n# Create topic\nbin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic scream-processing\n```\n\n## Install Flink\n```sh\nwget http://mirror.netinch.com/pub/apache/flink/flink-1.3.1/flink-1.3.1-bin-hadoop27-scala_2.11.tgz\n\ntar xzf flink-*.tgz   # Unpack the downloaded archive\ncd flink-1.3.1\n\n# Start Flink\n./bin/start-local.sh\n\n# OR\ndocker pull flink\n\ndocker run -t -p 8081:8081 flink local\n```\n\n## Install ES/Kibana\n:warning: [Flink 1.3.0] The `Flink Elasticsearch connector` for `Elasticsearch 5` is currently missing from the `Maven` repository.\n\n```sh\n# Elasticsearch 5.5.0\ndocker run --name scream-processing-elasticsearch -p 9200:9200 -p 9300:9300 \\\n           -e 
\"http.host=0.0.0.0\" -e \"transport.host=127.0.0.1\" -d elasticsearch:5.5.0\n\n# Kibana 5.5.0\ndocker run --name scream-processing-kibana --link scream-processing-elasticsearch:elasticsearch -p 5601:5601 -d kibana:5.5.0\n```\n\n# Env Setup (Kubernetes)\n\n## Flink\n```sh\n# clone docker-flink/examples repo\ngit clone git@github.com:docker-flink/examples.git\n \ncd docker-flink\n \n# Build the Helm archive:\nhelm package helm/flink/\n\n# Create namespace for `flink`\nkubectl create ns flink\n\n# Deploy a non-HA Flink cluster with a single taskmanager:\nhelm install --name scream-processing-flink  --set global.namespace=flink flink*.tgz\n \n# Deploy a non-HA Flink cluster with three taskmanagers:\nhelm install --name scream-processing-flink --set flink.num_taskmanagers=3 --set global.namespace=flink flink*.tgz\n \n# Deploy an HA Flink cluster with three taskmanagers:\ncat \u003e values.yaml \u003c\u003cEOF\nflink:\n  num_taskmanagers: 3\n  highavailability:\n    enabled: true\n    zookeeper_quorum: \u003czookeeper quorum string\u003e\n    state_s3_bucket: \u003cs3 bucket\u003e\n    aws_access_key_id: \u003caws access key\u003e\n    aws_secret_access_key: \u003caws secret key\u003e\nEOF\n \n# use modified values.yaml\nhelm install --name scream-processing-flink --set global.namespace=flink --values values.yaml flink*.tgz\n```\n\n## Kafka\n```sh\n# Add helm repo\nhelm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator\n \n# Create namespace for `kafka`\nkubectl create ns kafka\n \n# Installs\nhelm install --name scream-processing-kafka --set global.namespace=kafka incubator/kafka\n \n# Delete\nhelm delete scream-processing-kafka\n```\n\n# Project Setup\n\n## From stratch /w sbt\n```sh\n# Run following script ot clone starter repo\nbash \u003c(curl https://flink.apache.org/q/sbt-quickstart.sh)\n```\n## Using boilerplate\n```sh\n# TODO:\n```\n\n## Project structure and dependencies\n```sh\n# TODO: boilerplate project will be shared\n```\n\n### 
Testing\n_:warning: In my case, Kafka is the data source and Elasticsearch is the output sink_\n\n- [x] Test Base using `LocalFlinkMiniCluster`\n\n#### Unit testing\n- [x] Mock the data source from a collection with a timestamp assigner\n- [x] Mock the sink to store data and read it back once processing completes\n\n#### Integration Testing\n\n- EmbeddedKafka\n- Embedded Elasticsearch (Test helpers will be provided)\n\n### CI\n**Jenkinsfile will be shared soon**\n\n- [x] Build Job\n- [x] Run tests (using embedded Kafka and embedded Elasticsearch in my case)\n- [x] Filter running jobs to get `current job id` using `Flink REST API`\n```sh\ncurl http://localhost:8081 | ./jq '.jobs[] | select(.name | startswith(\"Awesome Job\")) | .jid'\n```\n- [x] Cancel the job with a savepoint\n- [x] Upload new `{Your Job name-version}.jar`\n- [x] Run the newly uploaded job, starting from the previously saved savepoint\n\n### Warnings\n - If you upload fat JARs and get a `413` (Request Entity Too Large) response, add the following annotations to your ingress:\n \n ```yaml\nkind: Ingress\nmetadata:\n  annotations:\n    ingress.kubernetes.io/proxy-body-size: \u003cyour max size\u003em\n    nginx.org/client-max-body-size: \u003cyour max size\u003em\n ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbugthesystem%2Fscream-processing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbugthesystem%2Fscream-processing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbugthesystem%2Fscream-processing/lists"}