{"id":29840617,"url":"https://github.com/graphform/gen-ai-stream-operators","last_synced_at":"2025-08-18T03:32:38.239Z","repository":{"id":253164993,"uuid":"835837726","full_name":"graphform/gen-ai-stream-operators","owner":"graphform","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-29T10:11:00.000Z","size":148,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-29T14:59:31.989Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graphform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-07-30T16:18:35.000Z","updated_at":"2025-06-20T22:33:42.000Z","dependencies_parsed_at":"2024-08-29T11:44:02.128Z","dependency_job_id":null,"html_url":"https://github.com/graphform/gen-ai-stream-operators","commit_stats":null,"previous_names":["nstreamio/gen-ai-stream-operators","graphform/gen-ai-stream-operators"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/graphform/gen-ai-stream-operators","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphform%2Fgen-ai-stream-operators","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphform%2Fgen-ai-stream-operators/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphform%2Fgen-ai-stream-operators/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphform%2Fgen-ai
-stream-operators/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graphform","download_url":"https://codeload.github.com/graphform/gen-ai-stream-operators/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graphform%2Fgen-ai-stream-operators/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270940406,"owners_count":24671669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-18T02:00:08.743Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T14:16:52.204Z","updated_at":"2025-08-18T03:32:38.141Z","avatar_url":"https://github.com/graphform.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dynamic Stream Operator Generation using OpenAI and SwimOS\n\nDiscover how combining the power of \u003ca href=\"https://openai.com/api/\"\u003eOpenAI’s\u003c/a\u003e ChatGPT with the robust real-time data processing capabilities of \u003ca href=\"https://www.swimos.org/\"\u003eSwimOS\u003c/a\u003e can transform your data streams into actionable insights on-the-fly. Streaming systems handle continuous and potentially unbounded streams of data, processing each piece of data upon arrival rather than waiting for the next batch. 
Stream operators are designed to process data incrementally while taking into account the boundless nature of the stream. They perform various transformations, computations, or aggregations on this real-time data. In this article, we’ll demonstrate how to dynamically generate stream operators using ChatGPT and seamlessly inject them into your application at runtime.\n\n## Technical Overview\n\nOur application consists of the following components:\n\n- **OpenAI ChatGPT**: Utilized for generating stream operator code based on user input.\n- **SwimOS**: Handles real-time streaming data and provides a Python client for interacting with Nstream’s SwimOS streaming data application platform.\n- **Client code**: A Python script that integrates OpenAI and SwimOS to facilitate dynamic stream operator generation.\n\n## Code Walkthrough\n\n### Dependencies\n\nWe start by importing the required libraries, including `openai` for ChatGPT and `swimos` for SwimOS. We also load environment variables from a `.env` file, which contains our OpenAI API key.\n\n```python\nimport json\nimport os\nimport re\nimport time\n\nfrom dotenv import load_dotenv\nfrom openai import OpenAI\nfrom swimos import SwimClient\n```\n\n### SwimOS and OpenAI Client Setup\n\nWe initialize the SwimOS client, specifying the host URI for either the live or simulated feed. 
We configure two stock data feeds: a 24/7 simulated feed and a live feed, commented out, available during market hours.\n\n```python\n# Load environment variables from .env file\nload_dotenv()\n\n# host_uri = \"wss://stocks-live.nstream-demo.io\"    # live feed during market hours\nhost_uri = \"wss://stocks-simulated.nstream-demo.io\" # simulated feed 24/7\ncurrent_exchange_rate = 1.2\ncurrent_alert_threshold = 50.0\nsynced = False\n\nllm_client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\"))\nswim_client = SwimClient(debug=True)\nswim_client.start()\n```\n\n### Downlink Setup\n\nWe create a downlink using the SwimOS Python client, providing a callback function to receive updates. We synchronize on the initialization of the Swim client before returning. We also set up a callback function to know when we are ready to receive updates.\n\n```python\ndef wait_did_sync():\n    global synced\n    synced = True\n\ndef setup_value_downlink(node_uri: str, callback=None):\n    global swim_client\n    global synced\n    value_downlink = swim_client.downlink_value()\n    value_downlink.set_host_uri(host_uri)\n    value_downlink.set_node_uri(node_uri)\n    value_downlink.set_lane_uri(\"status\")\n    if callback is not None:\n        value_downlink.did_set(callback)\n    value_downlink.did_sync(wait_did_sync)\n    value_downlink.open()\n    while not synced:\n        time.sleep(1)\n    return value_downlink\n```\n\n### ChatGPT Code Generation\n\nWe define a function to generate code using ChatGPT. We utilize a retry loop to handle requests to ChatGPT, as the responses can be unpredictable. 
We make a call to the `chat.completions.create` endpoint and, when JSON is expected, extract only the JSON object using a regular expression.\n\n```python\ndef generate_llm_code(\n    prompt: str,\n    expect_json: bool = False,\n    max_retries: int = 3,\n    retry_delay: int = 1):\n\n    retries = 0\n    while retries \u003c max_retries:\n        try:\n            response = llm_client.chat.completions.create(\n                messages=[\n                    {\n                        \"role\": \"user\",\n                        \"content\": prompt,\n                    }\n                ],\n                model=\"gpt-4\",\n                max_tokens=1000\n            )\n            response_content = response.choices[0].message.content.strip()\n\n            if expect_json:\n                # Use regex to extract the first JSON object (braces nested up to three levels)\n                json_match = re.search(\n                    r'\\{(?:[^{}]|\\{(?:[^{}]|\\{[^{}]*\\})*\\})*\\}',\n                    response_content, re.DOTALL)\n                if json_match:\n                    json_str = json_match.group(0)\n                    try:\n                        result = json.loads(json_str)\n                        return result['result'] if 'result' in result else result\n                    except (json.JSONDecodeError, KeyError) as e:\n                        raise ValueError(\"Failed to decode JSON from LLM response\") from e\n                else:\n                    raise ValueError(\"No valid JSON found in LLM response\")\n            else:\n                return response_content\n\n        except Exception as e:\n            retries += 1\n            print(f\"Error: {e}, retrying... ({retries}/{max_retries})\")\n            time.sleep(retry_delay)\n\n    raise ValueError(\"Max retries exceeded, failed to get valid response from LLM\")\n```\n\n### Dynamic Function Generation\n\nWe ask ChatGPT to dynamically generate Python code for a stream operator. 
This approach provides flexibility, allowing us to handle various data processing needs dynamically without redeploying the application. Specifically, we request a function to maintain a simple moving average with a window size of 5.\n\n```python\ndef accumulate_generate(symbol: str, streaming_operator: str, parameters: dict):\n    \"\"\"Generate a function to accumulate stock prices (min/max/avg) using LLM\"\"\"\n    global acc\n    acc = {}\n\n    prompt = f\"\"\"\n    Return a JSON result, and only a JSON result. The JSON must have a single\n    top-level key: `result`. In this `result` key, store a string that contains\n    a python function with the following signature\n    `def func(acc: dict, new_value: float, params: dict):`\n    and the implementation must be as follows: calculate the {streaming_operator}\n    on `new_value` given accumulator state of `acc` that your function has\n    defined in order to continue applying the {streaming_operator} as each new\n    value arrives. Your function must return a tuple consisting of `acc` followed\n    by the result of its calculation. 
The parameters for this operation are: {parameters}.\n    \"\"\"\n    function_code_str = generate_llm_code(prompt, expect_json=True)\n\n    # Generate and save a dynamic function\n    local_vars = {}\n    exec(function_code_str, {}, local_vars)\n    func_name = function_code_str.split('(')[0].split()[1]\n    func = local_vars[func_name]\n\n    def accumulate_generate_callback(new_value: dict, _old_value: dict):\n        global acc\n        print(f\"accumulate_generate_callback received for {symbol}: {new_value}.\\n\")\n        acc, summary = func(acc, new_value['price'], parameters)\n        print(f\"{symbol} -- summary: {summary}; acc: {acc}\")\n\n    node_uri = f\"/stock/{symbol}\"\n    print('Streaming data, press Ctrl+C to stop')\n    value_downlink = setup_value_downlink(node_uri, accumulate_generate_callback)\n    try:\n        while True:\n            time.sleep(1)\n    except KeyboardInterrupt:\n        value_downlink.close()\n        print('Streaming stopped')\n\nif __name__ == \"__main__\":\n    result = accumulate_generate(\"AAAA\", \"simple moving average\", {\"window_size\": 5})\n    print(result)\n```\n\n## Running the code\n\nTwo scripts have been provided within the `src` folder. The first script, `snippet.py` corresponds to this tutorial, and gives a streamlined walk-through of how to generate stream operators from OpenAI, and then inject them into a SwimOS client callback. You can simply invoke it by running:\n\n```sh\npython snippet.py\n```\n\nBy default, it uses a simulated stock feed, as it is always available. It is also slowed down a little, to make it easier to follow. 
To change to the live feed, swap the commenting of lines 16 and 17 to reflect what's shown below:\n\n```python\nhost_uri = \"wss://stocks-live.nstream-demo.io\"    # live feed during market hours\n# host_uri = \"wss://stocks-simulated.nstream-demo.io\" # simulated feed 24/7\n```\n\nThen, go to line 131, under `__main__` to change it to a real stock symbol like \"NVDA\":\n\n```python\nif __name__ == \"__main__\":\n    result = accumulate_generate(\"NVDA\", \"simple moving average\", {\"window_size\": 5})\n```\n\nThe second script, `main.py`, corresponds to a console app that builds on the core functionality to support natural language queries and utilize the full range of functionality.\n\nTo run either script, you'll need to copy `./src/dot-env-file` to `./src/.env` and replace `\u003cyour_api_key\u003e` with a valid OpenAI key:\n\n```\n# so just set your OpenAI key and rename this file to .env\nOPENAI_API_KEY=\u003cyour_api_key\u003e\n```\n\nIf you don't have one, you can get one here:\n\u003ca href=\"https://openai.com/api/\"\u003ehttps://openai.com/api/\u003c/a\u003e.\n\nThen, within the `./src` folder, you can invoke the console app by running main.py. 
Here are some examples:\n\n```sh\npython main.py --help\n\npython main.py execute \"Give me the stock price for AAAA\"\npython main.py execute \"Stream stock prices for AAAA\"\npython main.py execute \"Convert stock price for AAAA using an exchange rate of 1.2\"\npython main.py execute \"Create a function to convert stock prices for AAAA with an exchange rate of 1.2\"\npython main.py execute \"Alert me if stock price for AAAA goes below 20\"\npython main.py execute \"Create a function to alert me if stock price for AAAA goes below 20\"\npython main.py execute \"Accumulate stock prices for AAAA using average with a window size of 5\"\npython main.py execute \"Create a function to accumulate stock prices for AAAA using average with a window size of 5\"\n\npython main.py read-adhoc AAAA\npython main.py read-streaming AAAA\npython main.py accumulate-direct AAAA avg --operation-config '{\"window_size\": 5}'\npython main.py accumulate-generate AAAA avg --operation-config '{\"window_size\": 5}'\npython main.py map-direct AAAA '{\"description\": \"apply exchange rate\", \"parameters\": {\"exchange_rate\": \"1.2\"}}'\npython main.py map-generate AAAA '{\"description\": \"apply exchange rate\", \"parameters\": {\"exchange_rate\": \"1.2\"}}'\npython main.py filter-direct AAAA '{\"description\": \"flag any values under 20\", \"parameters\": {\"threshold\": 20}}'\npython main.py filter-generate AAAA '{\"description\": \"flag any values under 20\", \"parameters\": {\"threshold\": 20}}'\n```\n\n## Conclusion\n\nIn this article, we demonstrated how to integrate OpenAI's ChatGPT and SwimOS for dynamic stream operator generation. By leveraging ChatGPT's code generation capabilities and SwimOS's real-time streaming data processing, we can create efficient and scalable data processing pipelines for ad-hoc use cases.\n\nNow it's your turn! 
Try implementing these patterns in your own projects and see the benefits of dynamic, real-time data processing firsthand.\n\nTo learn more about SwimOS, please visit \u003ca href=\"https://www.swimos.org/\"\u003ehttps://www.swimos.org/\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphform%2Fgen-ai-stream-operators","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraphform%2Fgen-ai-stream-operators","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraphform%2Fgen-ai-stream-operators/lists"}