{"id":24087950,"url":"https://github.com/tinybirdco/low-latency-engine","last_synced_at":"2026-03-05T13:02:01.253Z","repository":{"id":241142006,"uuid":"804267383","full_name":"tinybirdco/low-latency-engine","owner":"tinybirdco","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-06T09:01:25.000Z","size":24,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-27T05:24:40.199Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-22T09:16:13.000Z","updated_at":"2024-06-07T20:40:17.000Z","dependencies_parsed_at":"2024-05-22T17:01:55.724Z","dependency_job_id":"831d0055-17f5-4285-b762-2433a60b398f","html_url":"https://github.com/tinybirdco/low-latency-engine","commit_stats":null,"previous_names":["tinybirdco/low-latency-engine"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tinybirdco/low-latency-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Flow-latency-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Flow-latency-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Flow-latency-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Flow-latency-engine/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.github.com/tinybirdco/low-latency-engine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Flow-latency-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30127204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T12:40:50.676Z","status":"ssl_error","status_checked_at":"2026-03-05T12:39:32.209Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:25.363Z","updated_at":"2026-03-05T13:02:01.192Z","avatar_url":"https://github.com/tinybirdco.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Low Latency Engine\n\nThe Low Latency Engine (LLE) is a guide to help you build a real-time data processing pipeline using Tinybird. The guide is divided into two main sections:\n\n* The first section is focused on the problem statement.\n* The second section is focused on the data model.\n\n## 1. Problem Statement\n\n### Problem Statement\n\nYou are working for a hotel booking platform and you have been tasked with building a real-time data processing pipeline to detect long term discounts and, at the same time, potential fraudsters. 
The platform generates many events per second, such as searches and bookings, and you need to process these events and get a response in real time (less than a second). In doing so, you will be able to offer appropriate discounts to users who are likely to book a property for an extended period of time, and flag potential fraudsters in real time. \n\n### Technical Challenges and potential solutions using Tinybird\n\nLow latency use cases can be challenging because they entail two main problems: \n\n* processing the data in real time, and\n* providing a response in real time.\n\nThe first problem consists of processing the data as soon as it arrives. ClickHouse is really good at ingesting data and transforming it at the same time. [It achieves this thanks to materialized views](https://www.tinybird.co/docs/guides/publish/master-materialized-views), which are precomputed tables that are updated in real time as new data arrives. \n\nThe second problem consists of providing a response in real time. Tinybird endpoints make it possible to query the data in real time, but we need to structure the data based on the queries we want to run. Each of [our use cases will tell us how to order and configure our engine to provide the best possible response time](https://www.tinybird.co/blog-posts/thinking-in-tinybird). Our toolkit for this consists of adjusting data types, sorting keys, partitioning and index granularity.\n\n### Use cases\n\n#### Long term discounts\n\nYour platform wants to offer long term discounts to users who are likely to book a property for an extended period of time. 
So if a user who is searching for a booking at the same location fulfills at least 5 of the following conditions, he or she will be eligible for a long term discount:\n\n* the duration of the booking search (`end_datetime` - `start_datetime`) is more than 2 months\n* the `price` is more than 300\n* `booking_country` in `['FR', 'PT', 'IT', 'ES']`\n* `property_type` in `['house', 'apartment']`\n* `has_wifi` is True\n* `has_parking` is True\n* `are_pets_allowed` is True\n\nIn addition, the query time should be less than 700 ms. The discount should be offered to the user in real time, before he or she completes the booking. Finally, the parameters in this query should be dynamic, in part to allow our marketing team to adjust the conditions for the long term discounts.\n\n#### Fraud detection\n\nIf a user performs 3 transactions in less than 5 minutes (`event_type` is `booking`) and fulfills at least 3 of the following conditions, he or she will be flagged as a potential fraudster:\n\n* each transaction `price` is more than 300\n* 3 different `device` in less than an hour\n* 3 different `browser` in less than an hour\n* 3 different `os` in less than an hour\n* 3 different `user_location` in less than an hour\n* 3 different `card_id` in less than an hour\n\nIn addition, the query time should be less than 500 ms. The user should be flagged as a potential fraudster in real time. \n\n## 2. Data Model\n\n### Ingestion\n\nAs stated above, our booking platform generates many events per second. In real life, these events could be captured using Kafka or Tinybird's Events API. We could mock the ingestion using Tinybird's Events API with [Mockingbird](https://mockingbird.tinybird.co/) based on [this json schema](/datasources/booking_events.json). 
\n\n### Data Source\n\n* `event_id` (Int32): Unique identifier for each event.\n* `event_time` (DateTime): Timestamp of the event.\n* `event_type` (String): Type of event (e.g., search, booking, cancellation).\n* `device` (String): Device used for the event (e.g., mobile, desktop).\n* `browser` (String): Browser used during the event.\n* `os` (String): Operating system used during the event.\n* `user_id` (Int32): Identifier for the user involved in the event.\n* `user_location` (String): Location of the user at the time of the event.\n* `booking_city` (String): City of the booking.\n* `booking_country` (String): Country of the booking.\n* `start_datetime` (Date): Check-in date for the booking.\n* `end_datetime` (Date): Check-out date for the booking.\n* `price` (Float64): Price per night of the booking.\n* `currency` (String): Currency used for the booking.\n* `product_id` (Int32): Identifier for the product.\n* `property_type` (String): Type of property (e.g., hotel, apartment).\n* `are_pets_allowed` (Int8): Indicates if pets are allowed (0 for no, 1 for yes).\n* `has_wifi` (Int8): Indicates if the property has Wi-Fi (0 for no, 1 for yes).\n* `has_parking` (Int8): Indicates if the property has parking (0 for no, 1 for yes).\n* `card_id` (Int32): Identifier for the card used for booking.\n* `card_issuer` (String): Issuer of the card used.\n\n### Data Preparation\n\nThere are some data preparation steps that will be common to both use cases:\n\n* converting timestamps to dates,\n* calculating the duration of the booking,\n* normalizing prices using just one currency (e.g., USD),\n* removing unnecessary columns (e.g., `event_id`, `product_id`, `currency`, etc.),\n* applying a type modifier such as `LowCardinality` to strings with few distinct values (e.g. `event_type`, `device`, `browser`, etc.).\n\n### Partitioning and index granularity\n\n* The data will be partitioned by `event_time`, but with a reduced partition size. 
Small partitions will be merged faster, so queries will be faster. Notice that small partition sizes could break the ingestion due to \"too many parts\" errors, so we should be mindful of this issue and adjust the partition size accordingly. Partitioning keys can be set in [Tinybird's data source](https://www.tinybird.co/docs/version-control/datafiles#data-source) as follows:\n\n```\nENGINE_PARTITION_KEY \"toHour(event_time)\"\n```\n\n* The data source will be indexed by `user_id` and `event_type`. For each use case, however, we will use other indexes in every materialized view to adjust to the particular filtering conditions.\n\n* Finally, we could tweak our index granularity to reduce the amount of data we read. [ClickHouse by default sets 8192 rows per granule](https://clickhouse.com/docs/en/optimize/skipping-indexes), so we could use fewer rows (e.g. 2048). We would need to experiment with the size, since a very small granule impacts inserts and other kinds of queries (e.g. range). Index granularity could be set as follows in [Tinybird's data source](https://www.tinybird.co/docs/version-control/datafiles#data-source):\n\n```\nSETTINGS \"index_granularity=2048\"\n```\n\n### Long term discounts\n\nIn order to precalculate the conditions for the long term discounts, we will create a materialized view that will filter the events that fulfill the conditions for the long term discounts:\n* first, we will prepare the conditions that need to be checked by building a discount matrix,\n* then, we will filter the users that fulfill the discount value set by our analysts,\n* finally, we will get the unique customer ids to send them the discount.\n\n```sql\nTOKEN \"long_term_discount_endpoint_read\" READ\n\nNODE discount_matrix\nSQL \u003e\n\n    %\n    SELECT \n      user_id,\n      if(booking_duration \u003e= {{Int16(months, 2)}}*30, 1, 0) as duration_value,\n      if(price_in_usd \u003e {{Int16(usd, 300)}}, 1, 0) as price_value,\n      if(booking_country in {{Array(countries, 
'String', default='FR,PT')}}, 1, 0) as countries_value,\n      if(property_type in {{Array(property_types, 'String', default='house,apartment')}}, 1, 0) as property_value,\n      if(has_wifi = {{Int16(wifi_flag, 1)}}, 1, 0) as wifi_value,\n      if(has_parking = {{Int16(parking_flag, 1)}}, 1, 0) as parking_value,\n      if(are_pets_allowed = {{Int16(pets_flag, 1)}}, 1, 0) as pets_value\n    FROM\n      booking_events_mv\n    WHERE\n      event_type = 'search'\n      and event_time \u003e= toTimezone(now(), 'Europe/Berlin') - interval 10 second\n\nNODE filter_by_discount_index\nSQL \u003e\n\n    %\n    WITH (\n      duration_value+price_value+countries_value+property_value+wifi_value+parking_value+pets_value\n    ) as discount_index\n    SELECT\n      user_id,\n      discount_index\n    FROM discount_matrix\n    WHERE discount_index \u003e= {{Int16(discount, 5)}}\n    ORDER BY discount_index DESC\n\nNODE get_unique_customers\nSQL \u003e\n\n    SELECT DISTINCT user_id FROM filter_by_discount_index\n```\n\nAn example of an API endpoint call would be:\n\n```\nhttps://api.tinybird.co/v0/pipes/long_term_discount.json?discount=5\u0026months=2\u0026usd=300\u0026countries=FR%2CPT\u0026property_types=house%2Capartment\u0026wifi_flag=1\u0026parking_flag=1\u0026pets_flag=1\n```\n\nAnd the results would be:\n\n```json\n{\n  \"meta\": [\n    {\n      \"name\": \"user_id\",\n      \"type\": \"Int32\"\n    }\n  ],\n  \"data\": [\n    {\n      \"user_id\": 189012\n    },\n    {\n      \"user_id\": 345678\n    },\n    {\n      \"user_id\": 201234\n    },\n    {\n      \"user_id\": 678901\n    }\n  ],\n  \"rows\": 4,\n  \"statistics\": {\n    \"elapsed\": 0.018713724,\n    \"rows_read\": 65,\n    \"bytes_read\": 2065\n  }\n}\n```\n\n18 ms! 
Not bad.\n\n### Fraud detection\n\nFor the fraud detection use case, we will build a non-dynamic endpoint that will filter the users that fulfill the conditions for the potential fraudsters:\n* first, we will group the events by user and count the number of events for each user,\n* then, we will build a fraud detection matrix with the conditions that need to be checked,\n* finally, we will filter the users that fulfill the fraud detection value set by our analysts.\n\n\n```sql\nTOKEN \"fraud_detection_endpoint_read\" READ\n\nNODE booking_events_last_5_minutes\nSQL \u003e\n\n    SELECT\n        user_id,\n        event_time,\n        device,\n        browser,\n        os,\n        user_location,\n        card_id,\n        price_in_usd\n    FROM\n        booking_events_mv\n    WHERE\n        event_type = 'booking'\n    AND event_time \u003e toTimezone(now(), 'Europe/Berlin') - INTERVAL 5 MINUTE\n\nNODE group_and_count_by_user\nSQL \u003e\n\n    SELECT\n        user_id,\n        count() AS booking_count,\n        count(DISTINCT device) AS device_count,\n        count(DISTINCT browser) AS browser_count,\n        count(DISTINCT os) AS os_count,\n        count(DISTINCT user_location) AS user_location_count,\n        count(DISTINCT card_id) AS card_id_count,\n        sum(if(price_in_usd \u003e 300, 1, 0)) AS high_price_count\n    FROM\n        booking_events_last_5_minutes\n    GROUP BY\n        user_id\n    HAVING\n        booking_count \u003e= 3\n\nNODE fraud_detection_matrix\nSQL \u003e\n\n    SELECT\n        user_id,\n        if(device_count \u003e= 3, 1, 0) as device_value,\n        if(browser_count \u003e= 3, 1, 0) as browser_value,\n        if(os_count \u003e= 3, 1, 0) as os_value,\n        if(user_location_count \u003e= 3, 1, 0) as user_location_value,\n        if(card_id_count \u003e= 3, 1, 0) as card_id_value,\n        if(high_price_count \u003e= 3, 1, 0) as price_in_usd_value\n    FROM\n        group_and_count_by_user\n\nNODE filter_by_fraud_detection_index\nSQL 
\u003e\n\n    %\n    WITH (\n      device_value+browser_value+os_value+user_location_value+card_id_value+price_in_usd_value\n    ) as fraud_detection_index\n    SELECT\n      user_id,\n      fraud_detection_index\n    FROM fraud_detection_matrix\n    WHERE fraud_detection_index \u003e= 3\n    ORDER BY fraud_detection_index DESC\n\nNODE get_potential_fraudsters\nSQL \u003e\n\n    SELECT DISTINCT user_id FROM filter_by_fraud_detection_index\n```\n\nIn this use case, because the conditions are not dynamic, we could have built a materialized view to precalculate the fraud detection matrix and filter the users that fulfill the conditions for the potential fraudsters. \n\nAn example of an API endpoint call would be:\n\n```\nhttps://api.tinybird.co/v0/pipes/fraud_detection.json\n```\n\nAnd the results would be:\n\n```json\n{\n  \"meta\": [\n    {\n      \"name\": \"user_id\",\n      \"type\": \"Int32\"\n    }\n  ],\n  \"data\": [\n    {\n      \"user_id\": 234567\n    }\n  ],\n  \"rows\": 1,\n  \"rows_before_limit_at_least\": 1,\n  \"statistics\": {\n    \"elapsed\": 0.027963912,\n    \"rows_read\": 1035,\n    \"bytes_read\": 25875\n  }\n}\n```\n\nSame as the other use case, 28 ms! Not bad at all.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Flow-latency-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Flow-latency-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Flow-latency-engine/lists"}