{"id":22213931,"url":"https://github.com/xmlking/streaming-playground","last_synced_at":"2026-03-19T22:07:35.725Z","repository":{"id":262720157,"uuid":"887635404","full_name":"xmlking/streaming-playground","owner":"xmlking","description":"Testing arroyo with bufstream","archived":false,"fork":false,"pushed_at":"2024-12-17T16:20:31.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-30T05:43:15.722Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xmlking.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-13T02:39:48.000Z","updated_at":"2024-12-17T16:20:35.000Z","dependencies_parsed_at":"2024-12-13T06:19:33.375Z","dependency_job_id":"75073d7a-80fa-402c-a7c8-ea808937cb6d","html_url":"https://github.com/xmlking/streaming-playground","commit_stats":null,"previous_names":["xmlking/arroyo-experiments","xmlking/streaming-playground"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmlking%2Fstreaming-playground","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmlking%2Fstreaming-playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmlking%2Fstreaming-playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmlking%2Fstreaming-playground/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/xmlking","download_url":"https://codeload.github.com/xmlking/streaming-playground/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245409480,"owners_count":20610531,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-02T21:12:39.302Z","updated_at":"2026-01-05T20:30:53.695Z","avatar_url":"https://github.com/xmlking.png","language":null,"readme":"# Streaming Adventures\n\nExperiment with:\n- [x] [Arroyo](https://www.arroyo.dev/)\n- [x] [Redpanda Connect](https://www.redpanda.com/connect)\n- [x] [Bufstream](https://buf.build/product/bufstream), [demo](https://github.com/bufbuild/bufstream-demo), demo with [iceberg](https://github.com/bufbuild/buf-examples/tree/main/bufstream/iceberg-quickstart)\n- [x] [SQLFlow](https://sql-flow.com/docs/introduction/basics)\n- [ ] [Timeplus](https://docs.timeplus.com/proton-howto)\n- [ ] [RisingWave](https://risingwave.com/overview/)\n- [ ] [Tableflow](https://www.confluent.io/product/tableflow/)\n\n## Prerequisites\n\nInstall the rpk CLI to use as a Kafka CLI:\n\n```shell\nbrew install redpanda-data/tap/redpanda\n# add zsh completions\nrpk generate shell-completion zsh \u003e \"${fpath[1]}/_rpk\"\n```\n\nCreate an rpk **profile** to connect to the **local** Redpanda Kafka cluster:\n\n```shell\nrpk profile create local \\\n-s brokers=localhost:19092 \\\n-s registry.hosts=localhost:8081 \\\n-s admin.hosts=localhost:9644\n```\n\nInstall the psql CLI on macOS:\n\n```shell\nbrew install libpq\n# Finally, symlink psql (and other libpq tools) into 
/usr/local/bin:\nbrew link --force libpq\n# connect to the local database\npsql \"postgresql://postgres:postgres@localhost/postgres?sslmode=require\"\n```\n\n## Start\n\nFirst-time setup:\n\n```shell\n# pull Docker images locally\ndocker compose --profile optional pull\n```\n\n```shell\ndocker compose up\n# docker compose --profile optional up\ndocker compose ps\nopen http://localhost:5115/ # Arroyo Console\nopen http://localhost:8080/ # Redpanda Console\nopen http://localhost:8081/subjects # Redpanda Registry\ndocker compose down\n# (DANGER) - shutdown and delete volumes\ndocker compose down -v\n```\n\n**Benthos** example:\n\n```shell\n# start with Benthos\ndocker compose up connect\ndocker compose down\n# (DANGER) - shutdown and delete volumes\ndocker compose down -v\n```\n\nThis will start:\n\n1. Postgres Database\n2. Kafka - [Redpanda](https://www.redpanda.com/) or [Bufstream](https://buf.build/product/bufstream)\n3. [Redpanda Console](https://www.redpanda.com/redpanda-console-kafka-ui)\n4. [Redpanda Connect](https://www.redpanda.com/connect) (optional)\n5. [MinIO](https://min.io/) (optional)\n6. [ClickHouse](https://clickhouse.com/) (optional)\n7. 
[Arroyo](https://www.arroyo.dev/)\n\n## Config\n\n### Kafka\n\nAdd new topics:\n\n\u003e [!TIP]\n\u003e You can also use [Redpanda Console](http://localhost:8080/overview) to create topics.\n\n```shell\nrpk topic list\nrpk topic create -r 1 -p 1 customer-source\nrpk topic create -r 1 -p 1 customer-sink\n```\n\n### Arroyo Pipeline\n\nFrom the [Arroyo Console](http://localhost:5115/), create a pipeline with:\n\n\u003e [!WARNING]\n\u003e By default, preview doesn't write to sinks, to avoid accidentally writing bad data.\n\u003e You can run the pipeline for real by clicking \"Launch\", or you can enable web sinks in preview:\n\n```sql\nCREATE TABLE customer_source (\n    name TEXT,\n    age INT,\n    phone TEXT\n) WITH (\n    connector = 'kafka',\n    format = 'json',\n    type = 'source',\n    bootstrap_servers = 'redpanda:9092',\n    topic = 'customer-source'\n);\n\nCREATE TABLE customer_sink (\n    count BIGINT,\n    age INT\n) WITH (\n    connector = 'kafka',\n    format = 'json',\n    type = 'sink',\n    bootstrap_servers = 'redpanda:9092',\n    topic = 'customer-sink'\n);\n\nSELECT count(*), age\nFROM customer_source\nGROUP BY age, hop(interval '2 seconds', interval '10 seconds');\n\nINSERT INTO customer_sink SELECT count(*), age\nFROM customer_source\nGROUP BY age, hop(interval '2 seconds', interval '10 seconds');\n```\n\n## Test\n\nPublish a couple of messages to the `customer-source` topic using the **Redpanda Console**, e.g.:\n\n\u003e [!IMPORTANT]  \n\u003e Use TYPE: **JSON**\n\n```json\n{\n    \"name\": \"sumo\",\n    \"age\": 70,\n    \"phone\": \"111-222-4444\"\n}\n```\n\nCheck for new messages in the `customer-sink` topic.\n\n## Examples\n\nTry more [examples](https://github.com/ArroyoSystems/arroyo/tree/master/crates/arroyo-planner/src/test/queries) in the Arroyo Console: http://localhost:5115/\n\nbasic_tumble_aggregate\n\n```sql\nCREATE TABLE nexmark WITH (\n    connector = 'nexmark',\n    event_rate = 10\n);\n\nSELECT\n    bid.auction as auction,\n    tumble(INTERVAL 
'1' second) as window,\n    count(*) as count\nFROM\n    nexmark\nwhere\n    bid is not null\nGROUP BY\n    1,\n    2\n```\n\nbitcoin_exchange_rate\n\n```sql\nCREATE TABLE coinbase (\n  type TEXT,\n  price TEXT\n) WITH (\n  connector = 'websocket',\n  endpoint = 'wss://ws-feed.exchange.coinbase.com',\n  subscription_message = '{\n      \"type\": \"subscribe\",\n      \"product_ids\": [\n        \"BTC-USD\"\n      ],\n      \"channels\": [\"ticker\"]\n    }',\n  format = 'json'\n);\n\nSELECT avg(CAST(price as FLOAT)) from coinbase\nWHERE type = 'ticker'\nGROUP BY hop(interval '5' second, interval '1 minute');\n```\n\nfirst_pipeline\n\n```sql\nCREATE TABLE nexmark with (\n    connector = 'nexmark',\n    event_rate = '100'\n);\n\n-- SELECT * from nexmark where auction is not null;\n\nSELECT * FROM (\n    SELECT *, ROW_NUMBER() OVER (\n        PARTITION BY window\n        ORDER BY count DESC) AS row_num\n    FROM (SELECT count(*) AS count, bid.auction AS auction,\n        hop(interval '2 seconds', interval '60 seconds') AS window\n            FROM nexmark WHERE bid is not null\n            GROUP BY 2, window)) WHERE row_num \u003c= 5;\n```\n\ncreate_table_updating\n\n```sql\nCREATE TABLE nexmark with (\n    connector = 'nexmark',\n    event_rate = '100'\n);\n\nCREATE TABLE bids (\n    auction    BIGINT,\n    bidder     BIGINT,\n    channel    VARCHAR,\n    url        VARCHAR,\n    datetime   DATETIME,\n    avg_price  BIGINT\n) WITH (\n    connector = 'filesystem',\n    type = 'sink',\n    path = '/home/data',\n    format = 'parquet',\n    parquet_compression = 'zstd',\n    rollover_seconds = 60,\n    time_partition_pattern = '%Y/%m/%d/%H',\n    partition_fields = 'bidder'\n);\n\n-- SELECT bid from nexmark where bid is not null;\n\nINSERT INTO bids\nSELECT\n    bid.auction, bid.bidder, bid.channel, bid.url, bid.datetime, bid.price as avg_price\nFROM\n    nexmark\nwhere\n    bid is not null\n```\n\n```sql\nCREATE TABLE nexmark with (\n    connector = 
'nexmark',\n    event_rate = '100'\n);\n\nSELECT avg(bid.price) as avg_price\nFROM nexmark\nWHERE bid IS NOT NULL\nGROUP BY hop(interval '2 seconds', interval '10 seconds');\n```\n\n## TODO\n- Try [Redpanda Iceberg Topics for SQL-based analytics with zero ETL](https://github.com/redpanda-data/redpanda-labs/tree/main/docker-compose/iceberg) \n- [Build a Streaming CDC Pipeline with MinIO and Redpanda into Snowflake](https://blog.min.io/build-a-streaming-cdc-pipeline-with-minio-and-redpanda-into-snowflake/)\n- [SQLFlow](https://sql-flow.com/docs/introduction/basics) - Enables SQL-based stream-processing, powered by DuckDB.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxmlking%2Fstreaming-playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxmlking%2Fstreaming-playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxmlking%2Fstreaming-playground/lists"}