{"id":19373739,"url":"https://github.com/waelson/stream-processing-with-ksql","last_synced_at":"2026-05-09T02:31:46.895Z","repository":{"id":120617227,"uuid":"315944000","full_name":"Waelson/Stream-Processing-with-KSQL","owner":"Waelson","description":"This project show how to use KSQL (Streaming SQL Engine for Apache Kafka) to stream processing.","archived":false,"fork":false,"pushed_at":"2020-11-27T01:29:35.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-06T00:56:10.695Z","etag":null,"topics":["docker","docker-compose","docker-container","kafka","kafka-streams","ksql","ksql-server"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Waelson.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-11-25T13:14:06.000Z","updated_at":"2020-11-27T01:29:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"9e7eeda3-c920-4c14-8954-6605adf585a2","html_url":"https://github.com/Waelson/Stream-Processing-with-KSQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Waelson/Stream-Processing-with-KSQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Waelson%2FStream-Processing-with-KSQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Waelson%2FStream-Processing-with-KSQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Waelson%2FStream-Pro
cessing-with-KSQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Waelson%2FStream-Processing-with-KSQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Waelson","download_url":"https://codeload.github.com/Waelson/Stream-Processing-with-KSQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Waelson%2FStream-Processing-with-KSQL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32804900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T08:22:46.396Z","status":"online","status_checked_at":"2026-05-09T02:00:06.633Z","response_time":123,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","docker-container","kafka","kafka-streams","ksql","ksql-server"],"created_at":"2024-11-10T08:31:24.153Z","updated_at":"2026-05-09T02:31:46.878Z","avatar_url":"https://github.com/Waelson.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stream Processing using KSQL\n\nThis project show how to use KSQL (Streaming SQL Engine for Apache Kafka) to stream processing.\n\n## Enviroment\n\nFor you to use this repository you will need the following softwares:\n\n- [Python](https://www.python.org/downloads/)\n- [Pip](https://pip.pypa.io/en/stable/installing/)\n- [Docker](https://docs.docker.com/engine/install/)\n- [Docker 
Compose](https://docs.docker.com/engine/install/)\n- Zookeeper\n- Kafka\n- KSQL Server\n\nHowever, only Docker and Docker Compose need to be installed on your machine to run Kafka. The entire Kafka ecosystem (Zookeeper, Kafka, and KSQL Server) runs from Docker images.\n\n## Steps\n\n1. Install Python and Pip\n2. Install Docker and Docker Compose\n3. Load Images\n4. Create Topics\n5. Start Simulator\n\n### 1 - Install Python and Pip\n\nInstalling Python and Pip is straightforward, so this tutorial does not cover those steps. For more information, see [www.python.org](https://www.python.org/downloads/) and [pip.pypa.io](https://pip.pypa.io/en/stable/installing/).\n\nAfter you install Python and Pip, run the command below to install all the dependencies needed to run the \u003ccode\u003eclick_simulator.py\u003c/code\u003e application. This script simulates click events on a web page. It generates an unbounded series of click events, sending a continuous flow of messages to a Kafka topic.\n\n```bash\npip install -r requirements.txt\n```\n\n### 2 - Install Docker and Docker Compose\n\nThis tutorial does not demonstrate the installation process for Docker and Docker Compose. I strongly recommend visiting the Docker installation page for more information. [Please click here](https://docs.docker.com/engine/install/).\n\n### 3 - Loading Images\n\n```bash\ndocker-compose up\n```\n\nor\n\n```bash\ndocker-compose up -d\n```\n\nThe second command runs \u003ccode\u003edocker-compose\u003c/code\u003e in the background.\n\n### 4 - Create Topics\n\n```bash\ndocker-compose exec kafka kafka-topics --create --topic com.mywebsite.streams.pages --bootstrap-server localhost:9092\n```\n\n```bash\ndocker-compose exec kafka kafka-topics --create --topic com.mywebsite.streams.clickevents --bootstrap-server localhost:9092\n```\n\n### 5 - Start Simulator\n\n```bash\npython click_simulator.py\n```\n\nIf you executed all steps correctly. 
You will see output similar to the following.\n\n```bash\nStarting application\nMessage: {\"email\": \"anoble@yahoo.com\", \"timestamp\": \"1986-03-10T16:38:40\", \"uri\": \"https://mitchell.info/login.php\", \"number\": 358}\nMessage: {\"email\": \"leonardpatrick@mason-clark.info\", \"timestamp\": \"1971-04-25T10:09:26\", \"uri\": \"https://www.bailey.com/search/about/\", \"number\": 431}\nMessage: {\"email\": \"morriskatie@villarreal-villa.biz\", \"timestamp\": \"1996-11-22T00:12:20\", \"uri\": \"http://www.woodard.info/terms.php\", \"number\": 838}\nMessage: {\"email\": \"kenneth79@rogers.info\", \"timestamp\": \"2005-10-24T22:16:59\", \"uri\": \"http://www.king.com/wp-content/blog/blog/index/\", \"number\": 793}\nMessage: {\"email\": \"wbailey@wu-martinez.net\", \"timestamp\": \"1995-06-20T12:44:44\", \"uri\": \"https://www.smith-neal.com/categories/login/\", \"number\": 509}\nMessage: {\"email\": \"tkennedy@hall-wolfe.org\", \"timestamp\": \"2009-01-27T14:04:20\", \"uri\": \"https://www.marshall-holmes.info/\", \"number\": 336}\nMessage: {\"email\": \"steven15@yahoo.com\", \"timestamp\": \"2019-12-13T16:09:11\", \"uri\": \"https://www.sims.net/main.html\", \"number\": 263}\nMessage: {\"email\": \"hobbsmario@hotmail.com\", \"timestamp\": \"1990-08-16T05:09:04\", \"uri\": \"http://www.smith.com/search/tags/explore/about.jsp\", \"number\": 61}\n...\n```\n\n## Connecting to KSQL Server\n\n```bash\ndocker-compose exec ksql ksql http://localhost:8088\n```\n\nAfter you connect to KSQL Server, you will see the banner below:\n\n```bash\n                  ===========================================\n                  =        _  __ _____  ____  _             =\n                  =       | |/ // ____|/ __ \\| |            =\n                  =       | ' /| (___ | |  | | |            =\n                  =       |  \u003c  \\___ \\| |  | | |            =\n                  =       | . 
\\ ____) | |__| | |____        =\n                  =       |_|\\_\\_____/ \\___\\_\\______|       =\n                  =                                         =\n                  =  Streaming SQL Engine for Apache Kafka® =\n                  ===========================================\n\nCopyright 2017-2019 Confluent Inc.\n\nCLI v5.4.1, Server v5.4.1 located at http://localhost:8088\n\nHaving trouble? Type 'help' (case-insensitive) for a rundown of how things work!\n\nksql\u003e\n```\n\n#### Some Commands\n\nShow all topics\n\n```bash\nksql\u003e SHOW TOPICS;\n\n Kafka Topic                       | Partitions | Partition Replicas\n---------------------------------------------------------------------\n com.mywebsite.streams.clickevents | 5          | 1\n com.mywebsite.streams.pages       | 1          | 1\n---------------------------------------------------------------------\n```\n\nShow all streams\n\n```bash\nksql\u003e SHOW STREAMS;\n\n Stream Name | Kafka Topic                     | Format\n--------------------------------------------------------\n CLICKEVENTS | com.mywebsite.streams.clickevents | JSON\n--------------------------------------------------------\n```\n\n### Creating a Stream\n\nIf you need run it in the background mode.\n\n```SQL\nCREATE STREAM clickevents\n  (email VARCHAR,\n  timestamp VARCHAR,\n  uri VARCHAR,\n  number INTEGER)\nWITH (KAFKA_TOPIC='com.mywebsite.streams.clickevents',\n  VALUE_FORMAT='JSON');\n```\n\n### Creating a Table\n\n```SQL\nCREATE TABLE pages\n  (uri VARCHAR,\n   description VARCHAR,\n   created VARCHAR)\n  WITH (KAFKA_TOPIC='com.mywebsite.streams.pages',\n        VALUE_FORMAT='JSON',\n        KEY='uri');\n```\n\n### Creating a Table from a Query\n\n```SQL\nCREATE TABLE a_pages AS\n  SELECT * FROM pages WHERE uri LIKE 'http://www.a%';\n```\n\n### Querying a Table or Stream\n\n```SQL\nSELECT * FROM clickevents EMIT CHANGES;\n```\n\n### Describing a Table and Stream\n\n```bash\nksql\u003e DESCRIBE PAGES;\n\nName           
      : PAGES\n Field       | Type\n-----------------------------------------\n ROWTIME     | BIGINT           (system)\n ROWKEY      | VARCHAR(STRING)  (system)\n URI         | VARCHAR(STRING)\n DESCRIPTION | VARCHAR(STRING)\n CREATED     | VARCHAR(STRING)\n-----------------------------------------\nFor runtime statistics and query details run: DESCRIBE EXTENDED \u003cStream,Table\u003e;\n```\n\n### Managing Offsets\n\nLike all Kafka Consumers, KSQL by default begins consumption \u003ci\u003eat the latest offset\u003c/i\u003e. This can be a problem for some scenarios. In the following example we're going to create a pages table, but we want \u003ci\u003eall\u003c/i\u003e the data available to us in this table. In other words, we want KSQL to start from the earliest offset. To do this, we will use the \u003ccode\u003eSET\u003c/code\u003e command to set the configuration variable \u003ccode\u003eauto.offset.reset\u003c/code\u003e for our session, before we run any other commands.\n\n```bash\nSET 'auto.offset.reset' = 'earliest';\n```\n\nAlso note that this can be set at the KSQL server level, if you'd like.\nOnce you're done querying or creating tables or streams with this value, you can set it back to its original setting by simply running:\n\n```bash\nUNSET 'auto.offset.reset';\n```\n\n### Scalar Functions\n\nKSQL provides a number of [scalar functions for us to make use of](https://docs.confluent.io/current/ksql/docs/developer-guide/syntax-reference.html#scalar-functions).\n\nLet's write a query that takes advantage of some of these functions:\n\n```bash\nSELECT UCASE(SUBSTRING(uri, 12))\n  FROM clickevents\n  WHERE number \u003e 100\n    AND uri LIKE 'http://www.k%' EMIT CHANGES;\n```\n\nNotice that as soon as you hit CTRL+C, your query ends.\n\n### Deleting a Table\n\nAs with Streams, we must first find the running underlying query, and then drop the table.\nFirst, find your query:\n\n```bash\nksql\u003e SHOW QUERIES;\n\n Query ID                | Kafka 
Topic      | Query String\n----------------------------------------------------------------------------------------------\n  CTAS_A_PAGES_1      | A_PAGES      | CREATE TABLE a_pages AS\n    SELECT * FROM pages WHERE uri LIKE 'http://www.a%';\n----------------------------------------------------------------------------------------------\nFor detailed information on a Query run: EXPLAIN \u003cQuery ID\u003e;\n```\n\nNote the query ID, which in this case is CTAS_A_PAGES_1,\nand then TERMINATE the query and DROP the table:\n\n```bash\nTERMINATE QUERY CTAS_A_PAGES_1;\nDROP TABLE A_PAGES;\n```\n\n### Windowing\n\n#### Hopping and Tumbling Windows\n\nIn this demonstration we'll see how to create Tables with windowing enabled.\n\n#### Tumbling Windows\n\nLet's create a tumbling clickevents table, where the window size is 30 seconds. A windowed query is an aggregation, so this table counts the clicks per URI in each window.\n\n```SQL\nCREATE TABLE clickevents_tumbling AS\n  SELECT uri, COUNT(*) AS clicks FROM clickevents\n  WINDOW TUMBLING (SIZE 30 SECONDS)\n  GROUP BY uri;\n```\n\n#### Hopping Windows\n\nNow we can create a Table with a hopping window of 30 seconds that advances in 5-second increments.\n\n```SQL\nCREATE TABLE clickevents_hopping AS\n  SELECT uri FROM clickevents\n  WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 5 SECONDS)\n  WHERE uri LIKE 'http://www.b%'\n  GROUP BY uri;\n```\n\nThe above window is 30 seconds long and advances by 5 seconds. If you query the table you will see\nthe associated window times!\n\n#### Session Windows\n\nFinally, let's see how session windows work. 
We're going to define the session gap as 5 minutes in\norder to group many events into the same window.\n\n```SQL\nCREATE TABLE clickevents_session AS\n  SELECT uri FROM clickevents\n  WINDOW SESSION (5 MINUTES)\n  WHERE uri LIKE 'http://www.b%'\n  GROUP BY uri;\n```\n\n## Kafka CLI Basic Commands\n\n### Creating a topic:\n\n```bash\ndocker-compose exec kafka kafka-topics --create --topic \u003ctopic-name\u003e --bootstrap-server localhost:9092\n```\n\n### Writing to a topic:\n\n```bash\ndocker-compose exec kafka kafka-console-producer --topic \u003ctopic-name\u003e --bootstrap-server localhost:9092\n```\n\nType a message in the console and press Enter to send it.\n\n### Reading from a topic:\n\n```bash\ndocker-compose exec kafka kafka-console-consumer --topic \u003ctopic-name\u003e --from-beginning --bootstrap-server localhost:9092\n```\n\n## Contributing\n\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\nPlease make sure to update tests as appropriate.\n\n## License\n\n[MIT](https://choosealicense.com/licenses/mit/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaelson%2Fstream-processing-with-ksql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwaelson%2Fstream-processing-with-ksql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaelson%2Fstream-processing-with-ksql/lists"}