{"id":19464188,"url":"https://github.com/hpgrahsl/wearedevs-2018","last_synced_at":"2025-10-20T04:22:45.383Z","repository":{"id":152889387,"uuid":"133664160","full_name":"hpgrahsl/wearedevs-2018","owner":"hpgrahsl","description":"Code for my talk \"Stateful \u0026 Reactive Streaming Applications Without a Database\" at WeAreDevelopers 2018","archived":false,"fork":false,"pushed_at":"2018-05-20T16:02:00.000Z","size":1465,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-25T08:43:48.734Z","etag":null,"topics":["emojis","java","kafka","kafka-connect","kafka-streams","reactive","reactive-programming","spring-boot-2","stream-processing","tweets"],"latest_commit_sha":null,"homepage":"https://www.wearedevelopers.com/","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hpgrahsl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-05-16T12:43:21.000Z","updated_at":"2023-02-25T16:44:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"62fbb46f-ff3d-4ae3-9c88-91bb42bb57b7","html_url":"https://github.com/hpgrahsl/wearedevs-2018","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hpgrahsl/wearedevs-2018","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpgrahsl%2Fwearedevs-2018","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpgrahsl%2Fwearedevs-2018/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/hpgrahsl%2Fwearedevs-2018/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpgrahsl%2Fwearedevs-2018/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hpgrahsl","download_url":"https://codeload.github.com/hpgrahsl/wearedevs-2018/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hpgrahsl%2Fwearedevs-2018/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266353001,"owners_count":23915870,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-21T11:47:31.412Z","response_time":64,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["emojis","java","kafka","kafka-connect","kafka-streams","reactive","reactive-programming","spring-boot-2","stream-processing","tweets"],"created_at":"2024-11-10T18:13:47.262Z","updated_at":"2025-10-20T04:22:40.336Z","avatar_url":"https://github.com/hpgrahsl.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Near Real-Time Emoji Tracking\n\n![meme2](docs/images/meme2.png)\n\n## Overview\nThis repository contains a working example of how to build a modern data-centric application to track occurrences of emojis in near real-time, which are found in publicly available tweets. 
It uses the following main technologies:\n\n- data ingestion: [Apache Kafka Connect](https://kafka.apache.org/documentation/#connect)\n- persistence: [Apache Kafka](https://kafka.apache.org)\n- stream processing: [Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)\n- RPC integration layer \u0026 reactive WebAPI: [Spring Boot 2.0](https://projects.spring.io/spring-boot/)\n\nThe accompanying slide deck for my talk about **Stateful \u0026 Reactive Streaming Applications Without a Database** at [WeAreDevelopers 2018](https://www.wearedevelopers.com/) can be found on [SpeakerDeck](https://speakerdeck.com/hpgrahsl/stateful-and-reactive-streaming-applications-without-a-database)\n\n![meme1](docs/images/meme1.png)\n\n## Usage example:\n\nThe following paragraphs give a detailed step-by-step explanation of how to set up and run the application on your local machine.\n\n#### 1 Launch your Kafka environment:\nThe example application needs a fully working Kafka environment, ideally on your local machine. If you are into containers and know how to use Docker, feel free to make use of pre-built Docker images for Apache Kafka of your choice (e.g. the ones provided by [Confluent](https://hub.docker.com/r/confluentinc/)). For simplicity, it is probably a good idea to launch all Kafka related processes with the [convenient CLI](https://docs.confluent.io/current/cli/index.html) that ships with the Open Source version of [Confluent's Platform](https://www.confluent.io/download/) - currently version 4.1.0.\n\nChange to your installation folder (e.g. 
/usr/local/confluent-4.1.0/) and run \n\n```bash\nbin/confluent start\n```\n\nThis should successfully launch all Kafka related processes, namely _zookeeper, kafka, schema-registry, kafka-rest, connect and ksql-server_ and may take a few moments before resulting in an **[UP] status** for each of them:\n\n```bash\nStarting zookeeper\nzookeeper is [UP]\nStarting kafka\nkafka is [UP]\nStarting schema-registry\nschema-registry is [UP]\nStarting kafka-rest\nkafka-rest is [UP]\nStarting connect\nconnect is [UP]\nStarting ksql-server\nksql-server is [UP]\n```\n\n_If you run into issues while bringing up the Confluent Platform, read through their amazing documentation, which will hopefully help you fix them :)_\n\n#### 2 Create a Kafka topic to store tweets:\n\nBefore live tweets can be ingested, a Kafka topic needs to be created. This can be easily achieved with the command line tools that ship with Kafka. The following command creates a topic called **live-tweets** with _4 partitions_ and a _replication factor of 1._\n\n```bash\nbin/kafka-topics --zookeeper localhost:2181 --topic live-tweets --create --replication-factor 1 --partitions 4\n```\n\n#### 3 Run a twitter source connector to harvest public live tweets:\n\nThere is a [plethora of Kafka connectors](https://www.confluent.io/product/connectors/) available in order to read data from a variety of sources and write data to different sinks. This application uses a [twitter source connector](https://github.com/jcustenborder/kafka-connect-twitter) from the community. In order to make this connector available in your local installation you have to copy a folder containing the build artefacts or a [pre-built version](https://github.com/jcustenborder/kafka-connect-twitter/releases/tag/0.2.26) together with its dependencies to a specific folder in your Confluent Platform installation. 
After unzipping the connector artefact copy the contained folder \n\n```bash\nkafka-connect-twitter-0.2.26/usr/share/kafka-connect/kafka-connect-twitter\n```\nto \n\n```bash\n/usr/local/confluent-4.1.0/share/java/\n```\n\nIn order for Kafka Connect to detect the availability of this newly installed connector simply restart the _connect_ process with the CLI by first running\n\n```bash\nbin/confluent stop connect\n```\n\nfollowed by \n\n```bash\nbin/confluent start connect\n```\n\nNow the twitter source connector is ready to use. It can be easily configured and managed by means of the [Kafka connect REST API](https://docs.confluent.io/current/connect/restapi.html). First check if the connector is indeed available by sending the following GET request e.g. using CURL, Postman or some other tool:\n\n```bash\ncurl http://localhost:8083/connector-plugins\n```\n\nThis should result in a JSON array containing all _connectors_ currently available to your Kafka Connect installation. Somewhere in the list you should see the twitter source connector:\n\n```json\n[\n  ...,\n  {\n    \"class\": \"com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector\",\n    \"type\": \"source\",\n    \"version\": \"0.2.26\"\n  }\n  ...,\n]\n```\n\nWe can run the connector to track a subset of live tweets related to a few key words of our choice (see **filter.keywords** entry below) based on the following JSON configuration. Simply insert your _OAuth tokens/secrets_ which you get by creating a Twitter application in your account. This must be created first in order to get access to the Twitter API. Send the JSON configuration as a POST request to the `/connectors` endpoint of the Kafka Connect REST API e.g. 
using CURL or Postman:\n\n```json\n{ \"name\": \"twitter_source_01\",\n  \"config\": {\n    \"connector.class\": \"com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector\",\n    \"twitter.oauth.accessToken\": \"...\",\n    \"twitter.oauth.consumerSecret\": \"...\",\n    \"twitter.oauth.consumerKey\": \"...\",\n    \"twitter.oauth.accessTokenSecret\": \"...\",\n    \"kafka.status.topic\": \"live-tweets\",\n    \"process.deletes\": false,\n    \"value.converter\": \"org.apache.kafka.connect.json.JsonConverter\",\n    \"value.converter.schemas.enable\": false,\n    \"key.converter\": \"org.apache.kafka.connect.json.JsonConverter\",\n    \"key.converter.schemas.enable\": false,\n    \"filter.keywords\": \"money,bitcoin,cryptocurrency,blockchain,ethereum,shitcoin,bitcoinbubble\"\n    }\n}\n```\n\nThis should result in an _HTTP status 201 created_ response.\n\n#### 4 Check data ingestion\nBy means of the Kafka command line tools it's easy to check if tweets are flowing into the topic. Running the following in your Confluent Platform folder\n\n```bash\nbin/kafka-console-consumer --bootstrap-server localhost:9092 --topic live-tweets --from-beginning\n```\n\nshould consume all the tweets in the **live-tweets** topic and write them directly to _stdout_ as they come in. The JSON structure of the tweets based on the source connector is pretty verbose. The example application deserializes only the following 4 fields while actually only making use of the Text field in order to extract any emojis during the stream processing:\n\n```json\n{\n    \"CreatedAt\": 1515751050000,\n    \"Id\": 951755003055820800,\n    \"Text\": \"Google details how it protected services like Gmail from Spectre https://t.co/jyuEixDaQq #metabloks\",\n    \"Lang\": \"en\"\n}\n```\n\n#### 5 Launch Spring Boot 2.0 emoji tracker\nEverything is now set up to start the stream processing application. 
Just build the Maven project by running:\n\n```bash\nmvn clean package\n```\n\nthen run the application from the command line using:\n\n```bash\njava -jar -Dserver.port=8881 -Dkstreams.tweetsTopic=live-tweets target/kafka-streams-emojitracker-0.5-SNAPSHOT.jar\n```\n\n#### 6 Interactively query the kstreams application state stores\nAfter the application has started successfully you can perform REST calls against it to query for current emoji counts:\n\n![meme3](docs/images/meme3.png)\n\n##### query for all emojis tracked so far:\n\n```bash\ncurl -X GET http://localhost:8881/interactive/queries/emojis/\n```\n\nThe result is in no particular order and might look like the following based on a sample run:\n\n```json\n[\n    ...,\n    {\n        \"emoji\": \"🐾\",\n        \"count\": 4\n    },\n    {\n        \"emoji\": \"👇\",\n        \"count\": 113\n    },\n    {\n        \"emoji\": \"👉\",\n        \"count\": 16\n    },\n    {\n        \"emoji\": \"💀\",\n        \"count\": 29\n    },\n    {\n        \"emoji\": \"💋\",\n        \"count\": 1\n    },\n    {\n        \"emoji\": \"💖\",\n        \"count\": 1\n    },\n    {\n        \"emoji\": \"💥\",\n        \"count\": 2\n    },\n    ...\n]\n```\n\n_NOTE: the numbers you get will obviously vary!_\n\n##### query for a specific emoji tracked so far:\nWhen using CURL you need to specify the emoji by means of its URL escape code. 
Thus, it's more convenient to query with Postman or your browser, as these allow you to put the emoji directly into the URL.\n\nhttp://localhost:8881/interactive/queries/emojis/👇\n\n```bash\ncurl -X GET http://localhost:8881/interactive/queries/emojis/%F0%9F%91%87 \n```\n\n{\n    \"emoji\": \"👇\",\n    \"count\": 113\n}\n\n_NOTE: the numbers you get will obviously vary!_\n\n##### query for the top N emojis tracked so far:\n\n```bash\ncurl -X GET http://localhost:8881/interactive/queries/emojis/stats/topN\n```\n\n```json\n[\n    {\n        \"emoji\": \"👇\",\n        \"count\": 113\n    },\n    {\n        \"emoji\": \"😭\",\n        \"count\": 100\n    },\n    {\n        \"emoji\": \"➡\",\n        \"count\": 81\n    },\n    {\n        \"emoji\": \"✨\",\n        \"count\": 80\n    },\n    {\n        \"emoji\": \"⚡\",\n        \"count\": 79\n    },\n    {\n        \"emoji\": \"🌎\",\n        \"count\": 77\n    },\n    {\n        \"emoji\": \"😂\",\n        \"count\": 64\n    },\n    {\n        \"emoji\": \"💀\",\n        \"count\": 29\n    },\n    {\n        \"emoji\": \"❤\",\n        \"count\": 21\n    },\n    {\n        \"emoji\": \"🔥\",\n        \"count\": 17\n    },\n    ...\n]\n```\n_NOTE: the numbers you get will obviously vary!_\n\n\n##### SSE change stream of emoji count updates\n\n![meme5](docs/images/meme5.png)\n\nClient applications can subscribe to a reactive change stream of emoji count updates while the kstreams application is processing new data. 
This results in server-sent events (SSE) being continuously streamed to clients, which can consume them with the JS framework of their choice to build a nice HTML dashboard.\n\n```bash\ncurl -X GET http://localhost:8881/interactive/queries/emojis/updates/notify\n```\n\n```json\n\n...\n\ndata: {\"emoji\": \"🌎\",\"count\": 77}\n\ndata: {\"emoji\": \"💀\",\"count\": 29}\n\ndata: {\"emoji\": \"😂\",\"count\": 64}\n\ndata: {\"emoji\": \"👇\",\"count\": 113}\n\ndata: {\"emoji\": \"🔥\",\"count\": 17}\n\n...\n\n```\n\n#### 7 Optional: Run multiple instances of the kstreams application\n\n![meme4](docs/images/meme4.png)\n\nIn case you want to run multiple instances to experiment with scalability and fault-tolerance of kstreams just launch the application multiple times. **Be sure to use different _server.port_ and _live.demo.instance.id_ settings for each further instance**\n\ne.g. start a 2nd instance like so:\n\n```bash\njava -jar -Dserver.port=8882 -Dlive.demo.instance.id=2 -Dkstreams.tweetsTopic=live-tweets target/kafka-streams-emojitracker-0.5-SNAPSHOT.jar\n```\n\nNow you can query either of the two instances to get the emoji count results!\n\n### Have fun tracking emojis in near real-time based on public tweets... 😊\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhpgrahsl%2Fwearedevs-2018","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhpgrahsl%2Fwearedevs-2018","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhpgrahsl%2Fwearedevs-2018/lists"}