{"id":24175840,"url":"https://github.com/iht/beam-late-data","last_synced_at":"2026-06-09T17:31:41.292Z","repository":{"id":74375342,"uuid":"288523928","full_name":"iht/beam-late-data","owner":"iht","description":"A unit test for Apache Beam streaming, to check if your window would drop data, and how many times would the window be triggered","archived":false,"fork":false,"pushed_at":"2021-09-21T21:18:50.000Z","size":34,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-02T14:49:27.199Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iht.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-18T17:43:30.000Z","updated_at":"2023-12-04T16:21:18.000Z","dependencies_parsed_at":null,"dependency_job_id":"81d2d3de-864d-4f49-add5-fdd6d9ad1372","html_url":"https://github.com/iht/beam-late-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iht/beam-late-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iht%2Fbeam-late-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iht%2Fbeam-late-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iht%2Fbeam-late-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iht%2Fbeam-late-data/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/iht","download_url":"https://codeload.github.com/iht/beam-late-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iht%2Fbeam-late-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34118751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-13T02:33:17.730Z","updated_at":"2026-06-09T17:31:41.276Z","avatar_url":"https://github.com/iht.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Understanding Exactly-Once Processing And Windowing In Streaming Pipelines (with Apache Beam)\n\nThis repo contains the code showcased in the Beam Summit talk \n_Understanding Exactly-Once Processing And Windowing In Streaming Pipelines_.\n\nIn that talk, I explained how windowing works in streaming pipelines, and which decisions you have\nto make in order to do complex event processing in streaming with Apache Beam.\n\nWhenever we apply a window, there are always doubts about whether the window will drop data, or how many \ntimes (and when) the output will be triggered.\n\nThis repo contains a sample pipeline that uses unit testing to check whether your window would drop data, \nand how many times the 
window would be triggered. You write your window in a function, and then use the unit\ntest to check the output of that pipeline. If the window drops data, the test will fail. In addition, you \nget a CSV output that you can examine to see how and when your window produced output.\n\n# Watch the talk\n\nWatch the video at the Beam Summit website, or on YouTube:\n* https://2020.beamsummit.org/sessions/understanding-exactly-once-processing/\n\n[Also check the slides, including the notes for each slide](https://drive.google.com/file/d/1XOZ5EMSjVv1WwJe_X7kqhqSVm502pzfs/view?usp=sharing).\n\n# The tested pipeline\n\nThe pipeline processes 60 messages: 50 messages produced on time, and 10 messages that arrive after the\nwatermark (late data).\n\n* [We first add the 50 messages and advance the watermark](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/test/java/com/google/cloud/pso/LateDropOrNotTest.java#L117-L131)\n* [Then we add 10 messages with a timestamp older than the watermark, and we advance the watermark a few seconds per message (to simulate elapsed time)](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/test/java/com/google/cloud/pso/LateDropOrNotTest.java#L133-L142)\n* [We read the messages and count them before applying the window](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/test/java/com/google/cloud/pso/LateDropOrNotTest.java#L149-L161)\n* [Then we apply the window, group, calculate a sum (and update another metric to count the aggregated messages), and generate a CSV](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/test/java/com/google/cloud/pso/LateDropOrNotTest.java#L163-L178)\n* [Finally, we check whether the window dropped any messages](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/test/java/com/google/cloud/pso/LateDropOrNotTest.java#L201-L211)\n\n# How to 
test your own window?\n\n## First: add your window\n\nAdd a new window to `src/main/java/com/google/cloud/pso/windows/SomeSampleWindow.java`.\n\nTo do so, add a new method with this signature:\n\n`public Window\u003cKV\u003cString, MyDummyEvent\u003e\u003e myCustomWindow()`\n\n(optionally with input parameters, if your window needs them).\n\nSee [some examples of windows in that file](https://github.com/iht/beam-late-data/blob/df3b504e9c36e69dd60c22d66d9ff0efc9a849f3/src/main/java/com/google/cloud/pso/windows/SomeSampleWindow.java#L64-L110).\n\n## Second: apply your window\n\nTODO\n\n# Copyright\n\nCopyright 2020 Google LLC\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n * http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiht%2Fbeam-late-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiht%2Fbeam-late-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiht%2Fbeam-late-data/lists"}