{"id":19888241,"url":"https://github.com/project-codeflare/rayvens","last_synced_at":"2025-05-02T17:31:57.864Z","repository":{"id":38440070,"uuid":"378178542","full_name":"project-codeflare/rayvens","owner":"project-codeflare","description":"Rayvens makes it possible for data scientists to access hundreds of data services within Ray with little effort.","archived":false,"fork":false,"pushed_at":"2022-11-29T17:29:44.000Z","size":1195,"stargazers_count":44,"open_issues_count":11,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-13T11:45:19.572Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/project-codeflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-18T14:36:30.000Z","updated_at":"2024-09-02T11:20:16.000Z","dependencies_parsed_at":"2023-01-21T13:03:13.255Z","dependency_job_id":null,"html_url":"https://github.com/project-codeflare/rayvens","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Frayvens","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Frayvens/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Frayvens/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/project-codeflare%2Frayvens/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/project-codeflare","download_url":"ht
tps://codeload.github.com/project-codeflare/rayvens/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224324498,"owners_count":17292521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T18:06:43.545Z","updated_at":"2024-11-12T18:06:43.616Z","avatar_url":"https://github.com/project-codeflare.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\n# Copyright IBM Corporation 2021\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n--\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"resources/logo.png\" /\u003e\n\u003c/p\u003e\n\n[![Build\nStatus](https://travis-ci.com/project-codeflare/rayvens.svg?branch=main)](https://travis-ci.com/github/project-codeflare/rayvens)\n[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0)\n\nRayvens augments [Ray](https://ray.io) with events. With Rayvens, Ray\napplications can subscribe to event streams, process and produce events. 
Rayvens\nleverages [Apache Camel](https://camel.apache.org) to make it possible for data\nscientists to access hundreds of data services with little effort.\n\nFor example, we can periodically fetch the AAPL stock price from a REST API with\ncode:\n```python\nsource_config = dict(\n    kind='http-source',\n    url='https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL',\n    period=3000)\nsource = rayvens.Stream('http', source_config=source_config)\n```\n\nWe can publish messages to Slack with code:\n```python\nsink_config = dict(kind='slack-sink',\n                   channel=slack_channel,\n                   webhook_url=slack_webhook)\nsink = rayvens.Stream('slack', sink_config=sink_config)\n```\n\nWe can deliver all events from the `source` stream to the `sink` using code:\n```python\nsource \u003e\u003e sink\n```\n\nWe can also process events on the fly using Python functions, Ray tasks, or Ray\nactors and actor methods for stateful processing. For instance, we can log\nevents to the console using code:\n```python\nsource \u003e\u003e (lambda event: print('LOG:', event))\n```\n\n## Setup Rayvens\n\nRayvens is compatible with Ray 1.3 and up.\n\nRayvens is intended to run anywhere Ray can. Rayvens is routinely tested on\nmacOS 11 (Big Sur) and Ubuntu 18 (Bionic Beaver). Rayvens is distributed both as\na Python package on [pypi.org](https://pypi.org/project/rayvens/) and as a\ncontainer image on [quay.io](https://quay.io/repository/ibm/rayvens).\n\nTo install the latest Rayvens release run:\n```shell\npip install rayvens\n```\n\nWe recommend cloning this repository to obtain the example programs it offers:\n```shell\ngit clone https://github.com/project-codeflare/rayvens\n```\n\nThe Rayvens package makes it possible to run Ray programs that leverage Rayvens\nstreams to produce and consume _internal_ events. This package does not install\nApache Camel, which is necessary to run programs that connect to _external_\nevent sources and sinks. 
We discuss the Camel setup and the Rayvens container\nimage [below](#camel-setup).\n\n## A First Example\n\nThe [stream.py](examples/stream.py) file demonstrates an elementary Rayvens\nprogram.\n```python\nimport ray\nimport rayvens\n\n# initialize ray\nray.init()\n\n# initialize rayvens\nrayvens.init()\n\n# create a stream\nstream = rayvens.Stream('example')\n\n# log all future events\nstream \u003e\u003e (lambda event: print('LOG:', event))\n\n# append two events to the stream in order\nstream \u003c\u003c 'hello' \u003c\u003c 'world'\n```\n\nThis program initializes Ray and Rayvens and creates a `Stream` instance. Streams\nand events are the core facilities offered by Rayvens. Streams bridge event\npublishers and subscribers.\n\nIn this example, a subscriber is added to the stream using the syntax `stream \u003e\u003e\nsubscriber`. The `\u003e\u003e` operator is a shorthand for the `send_to` method:\n```python\nstream.send_to(lambda event: print('LOG:', event))\n```\nAll events appended to the stream _after_ the invocation of the `\u003e\u003e` operator\n(or `send_to` method) will be delivered to the subscriber. Multiple subscribers\nmay be attached to the same stream. In general, subscribers can be Python\nfunctions, Ray tasks, or Ray actors. Hence, streams can interface publishers and\nsubscribers running on different Ray nodes.\n\nA couple of events are then published to the stream using the syntax `stream \u003c\u003c\nvalue`. In contrast to subscribers that are registered with the stream, there is\nno registration needed to publish events to the stream.\n\nAs illustrated here, events are just arbitrary values in general, but of course\npublishers and subscribers can agree on specific event schemas. The `\u003c\u003c`\noperator has left-to-right associativity, making it possible to send multiple\nevents with one statement. 
The `\u003c\u003c` operator is a shorthand for the `append`\nmethod:\n```python\nstream.append('hello').append('world')\n```\n\nConceptually, the `append` method adds an event _at the end_ of the stream, just\nlike the `append` method of Python lists. But in contrast with lists, a stream\ndoes not persist events. It simply delivers events to subscribers as they come.\nIn particular, appending events to a stream without subscribers (and without an\noperator, see below) is a no-op.\n\nRun the example program with:\n```shell\npython rayvens/examples/stream.py\n```\n```\n(pid=37214) LOG: hello\n(pid=37214) LOG: world\n```\n\nObserve that the two events are delivered in order. Events are delivered to function\nand actor subscribers in order, but task subscribers offer no ordering\nguarantees. See the [function.py](examples/function.py),\n[task.py](examples/task.py), and [actor.py](examples/actor.py) examples for\ndetails.\n\nThe `\u003c\u003c` and `\u003e\u003e` operators are not symmetrical. The `send_to` method (resp. `\u003e\u003e`\noperator) invokes its argument (resp. right-hand side) for every event appended\nto the stream. The `append` method and `\u003c\u003c` operator only append one event to\nthe stream.\n\n## Stream and StreamActor\n\nUnder the hood, streams are implemented as Ray actors. Concretely, the `Stream`\nclass is a stateless, serializable wrapper around the `StreamActor` Ray actor\nclass. All rules applicable to Ray actors (lifecycle, serialization, queuing,\nordering) are applicable to streams. In particular, the stream actor will be\nreclaimed when the original stream handle goes out of scope.\n\nThe configuration of the stream actor can be tuned using `actor_options`:\n```python\nstream = rayvens.Stream('example', actor_options={'num_cpus': 0.1})\n```\n\nFor convenience, most methods of the `Stream` class including the `send_to`\nmethod encapsulate the remote invocation of the homonymous `StreamActor` method\nand block until completion using `ray.get`. 
The `append` method is the\nexception. It returns immediately. Nevertheless, Ray actor semantics\nguarantee that sequences of `append` invocations are processed in order.\n\nFor more control, it is possible to invoke methods directly on the stream actor,\nfor example:\n```python\nstream.actor.send_to.remote(lambda event: print('LOG:', event))\n```\n\n## Camel Setup\n\nRayvens uses\n[Camel-K](https://developers.redhat.com/blog/2020/05/12/six-reasons-to-love-camel-k)\nto interact with a wide range of external source and sink types such as Slack, Cloud\nObject Storage, Telegram, or Binance (to name a few). Camel-K augments Apache\nCamel's extensive component catalog with support for Kubernetes and serverless\nplatforms. Rayvens is compatible with Camel-K 1.3 and up.\n\nTo run Rayvens programs including Camel sources and sinks, there are two\nchoices:\n- local mode: run a Camel source or sink in the same execution context as the\n  stream actor it is attached to using the [Camel-K\n  client](https://camel.apache.org/camel-k/latest/cli/cli.html): same container,\n  same virtual or physical machine.\n- operator mode: run a Camel source or sink inside a Kubernetes cluster relying\n  on the [Camel-K\n  operator](https://camel.apache.org/camel-k/latest/architecture/operator.html)\n  to manage dedicated Camel pods.\n\nThe default mode is the local mode. The mode can be specified when initializing\nRayvens:\n```python\nrayvens.init(mode='operator')\n```\n\nThe mode can also be specified using the environment variable `RAYVENS_MODE`. 
\nThe mode specified in the code (if any) takes precedence.\n\n### Local Mode Prerequisites\n\nLocal mode is intended to permit running Rayvens anywhere Ray runs: on a\ndeveloper laptop, in a virtual machine, inside a Ray cluster (running on\nKubernetes or OpenShift for example), or standalone.\n\nLocal mode requires the [Camel-K\nclient](https://camel.apache.org/camel-k/latest/cli/cli.html), Java, and Maven\nto be installed in the context in which the source or sink will be run.\nWhen running in a cluster, Java and Maven can be added to an existing Ray\ninstallation or image. The Rayvens image is based on a Ray image onto which we\nadd the necessary dependencies to enable the running of Camel-K sources and\nsinks in local mode inside the container.\nThe all-in-one Rayvens container image distributed on\n[quay.io](https://quay.io/repository/ibm/rayvens) adds Camel-K 1.5.1 to a base\n`rayproject/ray:1.13.0-py38` image. See [Dockerfile.release](Dockerfile.release)\nfor specifics.\n\n### Operator Mode Prerequisites\n\nOperator mode requires access to a Kubernetes cluster running the Camel-K\noperator and configured with the proper RBAC rules. See\n[below](#ray-cluster-setup) for details.\n\nAt this time, the operator mode requires the Ray code to also run inside the same\nKubernetes cluster and requires the Camel-K client to be deployed to the Ray\nnodes. We intend to lift these restrictions shortly.\n\nInstalling and using the Camel-K operator to deploy sources and sinks does not\nrequire Java or Maven.\n\n## Dynamic Dependencies\n\nCamel-K is designed to pull dependencies dynamically from Maven Central at run\ntime. While it is possible to preload dependencies to support air-gapped\nexecution environments, Rayvens does not handle this yet.\n\n## Ray Cluster Setup\n\nThe Rayvens [container image](https://quay.io/repository/ibm/rayvens) makes it\neasy to deploy Rayvens-enabled Ray clusters to various container platforms. 
The\n[rayvens-setup.sh](scripts/rayvens-setup.sh) script supports several\nconfigurations out of the box: existing Kubernetes and OpenShift clusters,\na development [Kind](https://kind.sigs.k8s.io) cluster, and [IBM Cloud Code\nEngine](https://www.ibm.com/cloud/code-engine). This script is distributed as\npart of the Rayvens package and should typically have been added to the\nexecutable search path by `pip install`. It is self-contained and therefore can\nalso be obtained directly:\n```shell\ncurl -Lo rayvens-setup.sh https://raw.githubusercontent.com/project-codeflare/rayvens/main/scripts/rayvens-setup.sh\n```\nThe full documentation for the script is available [here](docs/setup.md).\n\nThe script is provided for convenience. It is of course possible to set up a\nRayvens-enabled Ray cluster directly. We provide an example cluster\nconfiguration in [cluster.yaml](scripts/cluster.yaml). This configuration file\nis derived from Ray's\n[example-full.yaml](https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/kubernetes/example-full.yaml)\nconfiguration file. The key changes are:\n- use of the Rayvens container image,\n- RBAC enhancements to support the Camel-K operator,\n- adjustments to resource requests and limits to account for the needs of Camel\n  in local mode.\n\nThe generated and example configuration files also set `RAY_ADDRESS=auto` on the\nhead node, making it possible to run our example programs on the Ray cluster\nunchanged.\n\n### Kind Cluster Setup\n\nTo test Rayvens on a development Kubernetes cluster, we recommend using\n[Kind](https://kind.sigs.k8s.io).\n\nWe assume [Docker Desktop](https://www.docker.com/products/docker-desktop) is\ninstalled. We assume Kubernetes support in Docker Desktop is turned off. 
We\nassume [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl) is installed.\nFollow [instructions](https://kind.sigs.k8s.io/docs/user/quick-start) to install\nthe Kind client.\n\nTo create a Kind cluster and run a Rayvens-enabled Ray cluster on this Kind\ncluster, run:\n```shell\nrayvens-setup.sh --kind --registry --kamel\n```\nThe resulting cluster supports both local and operator modes. The command not\nonly initializes the Kind cluster but also launches a docker registry on port\n5000 to be used by the Camel-K operator. To skip the registry and Camel-K setup,\nrun instead:\n```shell\nrayvens-setup.sh --kind\n```\nIn this configuration, only local mode is supported. See [here](docs/setup.md)\nfor details.\n\nThe setup script produces a `rayvens.yaml` Ray cluster configuration file in the\ncurrent working directory. Try running on this cluster with:\n```shell\nray submit rayvens.yaml rayvens/examples/stream.py\n```\n\nTo take down the Kind cluster run:\n```shell\nkind delete cluster\n```\n\nTo take down the docker registry run:\n```shell\ndocker stop registry\ndocker rm registry\n```\n\n## Event Source Example\n\nThe [source.py](examples/source.py) example demonstrates how to process external\nevents with Rayvens.\n\nFirst, we create a stream connected to an external event source:\n```python\nsource = rayvens.Stream('http')\nsource_config = dict(\n    kind='http-source',\n    url='https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL',\n    period=3000)\nsource.add_source(source_config)\n```\n\nAn event source configuration is a dictionary. The `kind` key specifies the\nsource type. Other keys vary. An `http-source` periodically makes a REST call to\nthe specified `url`. The `period` is expressed in milliseconds. 
The events\ngenerated by this source are the bodies of the responses encoded as strings.\n\nFor convenience, the construction of the stream and addition of the source can\nbe combined into a single statement:\n```python\nsource = rayvens.Stream('http', source_config=source_config)\n```\n\nIn this example, we use the `http-source` to fetch the current price of the AAPL\nstock.\n\nWe then implement a Ray actor to process these events:\n```python\n@ray.remote\nclass Comparator:\n    def __init__(self):\n        self.last_quote = None\n\n    def append(self, event):\n        payload = json.loads(event)  # parse event string to json\n        quote = payload['quoteResponse']['result'][0]['regularMarketPrice']\n        try:\n            if self.last_quote:\n                if quote \u003e self.last_quote:\n                    print('AAPL is up')\n                elif quote \u003c self.last_quote:\n                    print('AAPL is down')\n                else:\n                    print('AAPL is unchanged')\n        finally:\n            self.last_quote = quote\n\ncomparator = Comparator.remote()\n```\n\nThis actor instance compares the current price with the last price and prints a\nmessage accordingly.\n\nWe then simply subscribe the `comparator` actor instance to the `source` stream.\n```python\nsource \u003e\u003e comparator\n```\n\nBy using a Ray actor to process events, we can implement stateful processing and\nguarantee that events will be processed in order.\n\nThe `Comparator` class follows the convention that it accepts events by means of\na method named `append`. If for instance this method were to be named `accept`\ninstead, then we would have to subscribe the actor to the source using syntax\n`source \u003e\u003e comparator.accept`. 
In other words, subscribing an actor `a` to a\nstream is a shorthand for subscribing the `a.append` method of this actor to the\nstream.\n\n### Running the example\n\nRun the example locally with:\n```shell\npython rayvens/examples/source.py\n```\n\nRun the example on Kind with:\n```shell\nray submit rayvens.yaml rayvens/examples/source.py\n```\n\nWhen running in local mode, the Camel-K client has to download dependencies\nfrom Maven Central and cache them on the first run. When running in operator mode, the\nCamel-K operator is used to build and cache a container image for the source. In\nboth cases, the source may take a minute or more to start the first time. The\nsource should start in a matter of seconds on subsequent runs (unless it is\nscheduled to a different Ray worker in local mode, as the cache is not shared\nacross workers).\n\nRayvens manages the Camel processes and pods automatically and makes sure to\nterminate them all when the main Ray program exits (normally or abnormally).\n\n## Event Sink Example\n\nThe [slack.py](examples/slack.py) example builds upon the previous example by pushing\nthe output messages to [Slack](https://slack.com).\n\nIn addition to the same source as before, it instantiates a sink:\n```python\nsink = rayvens.Stream('slack')\nsink_config = dict(kind='slack-sink',\n                   channel=slack_channel,\n                   webhook_url=slack_webhook)\nsink.add_sink(sink_config)\n```\n\nFor convenience, the construction of the stream and addition of the sink can be\ncombined into a single statement:\n```python\nsink = rayvens.Stream('slack', sink_config=sink_config)\n```\n\nThis sink sends messages to Slack. 
It requires two configuration parameters that\nmust be provided as command-line parameters to the example program:\n- the Slack channel to publish to, e.g., `#test`, and\n- a webhook URL for this channel.\n\nPlease refer to the [Slack webhooks](https://api.slack.com/messaging/webhooks)\ndocumentation for details on how to obtain these.\n\nThis example program includes a `Comparator` actor similar to the one in the previous\nexample:\n```python\n@ray.remote\nclass Comparator:\n    def __init__(self):\n        self.last_quote = None\n\n    def append(self, event):\n        payload = json.loads(event)  # parse event string to json\n        quote = payload['quoteResponse']['result'][0]['regularMarketPrice']\n        try:\n            if self.last_quote:\n                if quote \u003e self.last_quote:\n                    return 'AAPL is up'\n                elif quote \u003c self.last_quote:\n                    return 'AAPL is down'\n                else:\n                    return 'AAPL is unchanged'\n        finally:\n            self.last_quote = quote\n\ncomparator = Comparator.remote()\n```\n\nIn contrast to the previous example, we don't want to simply print messages to\nthe console from the comparator, but rather to produce a new stream of events\ntransformed by the comparator. To this end, we construct an operator stream:\n```python\noperator = rayvens.Stream('comparator')\noperator.add_operator(comparator)\n```\nor simply:\n```python\noperator = rayvens.Stream('comparator', operator=comparator)\n```\nLike any other stream, this operator stream can receive events and deliver\nevents to subscribers, but unlike in the earlier examples, it applies a transformation\nto the events. Concretely, it invokes the `append` method of the comparator\ninstance on each event and delivers the returned value to subscribers. By\nconvention, when `append` does not return a value, i.e., returns `None`, no\nevent is delivered to subscribers. 
In this example, the first source event does\nnot generate a Slack message because there is no previous quote to compare it\nwith.\n\nWe can then connect the source and sink via this operator using code:\n```python\nsource \u003e\u003e operator \u003e\u003e sink\n```\nwhich is a shorthand for:\n```python\nsource.send_to(operator)\noperator.send_to(sink)\n```\n\nLike subscribers, the argument to the `add_operator` method may be a Python\nfunction, a Ray task, a Ray actor, or a Ray actor method. Using an actor like\n`comparator` is shorthand for the actor method `comparator.append`. Building an\noperator stream from a Ray task is not recommended, however, as it may reorder\nevents arbitrarily.\n\n### Running the example\n\nWe assume the `SLACK_CHANNEL` and `SLACK_WEBHOOK` environment variables contain\nthe necessary configuration parameters.\n\nRun the example locally with:\n```shell\npython rayvens/examples/slack.py \"$SLACK_CHANNEL\" \"$SLACK_WEBHOOK\"\n```\n\nRun the example on Kind with:\n```shell\nray submit rayvens.yaml rayvens/examples/slack.py \"$SLACK_CHANNEL\" \"$SLACK_WEBHOOK\"\n```\n\n## Combining Sources, Sinks, and Operators\n\nA stream can have zero, one, or multiple sources, zero, one, or multiple sinks,\nand zero or one operator. 
For instance, rather than using three stream instances to\nbuild our Slack example, we could do everything with a single stream as follows:\n```python\nsource_config = dict(\n    kind='http-source',\n    url='https://query1.finance.yahoo.com/v7/finance/quote?symbols=AAPL',\n    period=3000)\n\nsink_config = dict(kind='slack-sink',\n                   channel=slack_channel,\n                   webhook_url=slack_webhook)\n\noperator = rayvens.Stream('comparator',\n                          source_config=source_config,\n                          operator=comparator,\n                          sink_config=sink_config)\n```\nThis reduces the number of stream actors from three to one and\nsignificantly cuts the number of remote invocations on the critical path, hence\nreducing latency.\n\n## Further Reading\n\n- Rayvens-related blog posts are published on [Medium](https://medium.com/codeflare).\n- The `rayvens-setup.sh` script is documented in [setup.md](docs/setup.md).\n- The configuration of the Camel sources and sinks is explained in\n  [connectors.md](docs/connectors.md).\n\n## License\n\nRayvens is an open-source project with an [Apache 2.0 license](LICENSE.txt).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Frayvens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fproject-codeflare%2Frayvens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fproject-codeflare%2Frayvens/lists"}