{"id":13742246,"url":"https://github.com/outerbounds/metaflow-ray","last_synced_at":"2025-10-24T09:07:06.706Z","repository":{"id":190703719,"uuid":"683148848","full_name":"outerbounds/metaflow-ray","owner":"outerbounds","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-28T11:45:43.000Z","size":106,"stargazers_count":22,"open_issues_count":3,"forks_count":1,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-01-07T08:39:32.114Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outerbounds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-25T18:05:50.000Z","updated_at":"2024-12-28T11:45:46.000Z","dependencies_parsed_at":"2025-02-11T02:35:33.929Z","dependency_job_id":null,"html_url":"https://github.com/outerbounds/metaflow-ray","commit_stats":null,"previous_names":["outerbounds/metaflow-ray"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-ray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-ray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-ray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-ray/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/outerbounds","download_url":"https://codeload.github.com/outerb
ounds/metaflow-ray/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250328578,"owners_count":21412639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T05:00:25.341Z","updated_at":"2025-10-24T09:07:06.624Z","avatar_url":"https://github.com/outerbounds.png","language":"Python","readme":"# Metaflow-Ray\n\n### Introduction\n`metaflow-ray` is an extension for Metaflow that enables seamless integration with Ray, allowing users to easily leverage \nRay's powerful distributed computing capabilities within their Metaflow flows. With `metaflow-ray`, you can spin up ephemeral Ray clusters on AWS Batch or Kubernetes directly from your Metaflow steps using the `@metaflow_ray` decorator. 
This enables you to run your Ray applications that leverage Ray Core, Ray Train, Ray Tune, and Ray Data effortlessly within your Metaflow flow.\n\n### Features\n- \u003cb\u003eEffortless Ray Integration:\u003c/b\u003e This extension provides a simple and intuitive way to incorporate Ray \ninto your Metaflow workflows using the `@metaflow_ray` decorator.\n- \u003cb\u003eElastic Ephemeral Ray Clusters:\u003c/b\u003e Let Metaflow orchestrate the creation of ephemeral Ray clusters on top of either:\n    - AWS Batch multi-node parallel jobs\n    - Kubernetes JobSets\n- \u003cb\u003eSeamless Ray Initialization:\u003c/b\u003e The `@metaflow_ray` decorator handles the initialization of the Ray cluster for you, so you can focus on writing your Ray code without worrying about cluster setup.\n- \u003cb\u003eWide Range of Applications:\u003c/b\u003e Run a wide variety of Ray applications, including hyperparameter tuning, distributed data processing, and distributed training.\n\n### Installation\nYou can install `metaflow-ray` via `pip` alongside your existing Metaflow installation:\n```\npip install metaflow-ray\n```\n\n### Getting Started\n1. Import the `@metaflow_ray` decorator to enable integration:\n\n```python\nfrom metaflow import metaflow_ray\n```\n\n2. Decorate your step with `@metaflow_ray` and initialize Ray within your step:\n\n```python\n@step\ndef start(self):\n    self.next(self.train, num_parallel=NUM_NODES)\n\n@metaflow_ray\n@pypi(packages={\"ray\": \"2.39.0\"})\n@batch(**RESOURCES) # You can even use @kubernetes\n@step\ndef train(self):\n    import ray\n    ray.init()\n    # Your step's training code here\n\n    self.next(self.join)\n\n@step\ndef join(self, inputs):\n    self.next(self.end)\n\n@step\ndef end(self):\n    pass\n```\n\n### Some things to consider:\n\n1. The `num_parallel` argument must always be specified in the step preceding the transition to a step decorated with `@metaflow_ray`. 
In the example above, the `start` step transitions to the `train` step, and it includes the `num_parallel` argument because the `train` step is decorated with `@metaflow_ray`. This ensures the `train` step can execute in parallel as intended.\n- Consequently, a corresponding `join` step must always exist, as highlighted in the snippet above.\n\n2. For remote execution environments (i.e. `@metaflow_ray` used in conjunction with `@batch` or `@kubernetes`), the value of `num_parallel` must be greater than 1, i.e. at least 2. However, when using the `@metaflow_ray` decorator in a standalone manner, the value of `num_parallel` cannot be greater than 1 (on Windows and macOS) because locally spun-up Ray clusters do not support multiple nodes unless the underlying OS is Linux-based.\n- Ideally, `ray` should be available in the remote execution environments. If not, one can always use the `@pypi` decorator to introduce `ray` as a dependency.\n\n3. If the `@metaflow_ray` decorator is used in a local context, i.e. without `@batch` or `@kubernetes`, a local Ray cluster is spun up, provided that the `ray` library (installable via `pip install ray`) is available in the underlying Python environment. Running the flow again (locally) can then fail with:\n```\nConnectionError: Ray is trying to start at 127.0.0.1:6379, but is already running at 127.0.0.1:6379.\nPlease specify a different port using the `--port` flag of `ray start` command.\n```\nOne can simply run `ray stop` in another terminal to terminate the Ray cluster that was spun up locally.\n\n### Examples\nCheck out the [examples](/examples) directory for sample Metaflow flows that demonstrate how to use the `metaflow-ray` extension \nwith various Ray applications.\n\n| Directory | Description |\n| :--- | :--- |\n| [Counter](examples/basic_counter/README.md) | Run a basic Counter with Ray that increments in Python, then do it inside a Metaflow task! 
|\n| [Process Dataframe](examples/dataframe_process/README.md) | Process a large dataframe in chunks with Ray and Python, then do it inside a Metaflow task! |\n| [Custom Docker Images](examples/custom_docker_images/README.md) | Specify custom Docker images on Kubernetes / Batch with Ray on Metaflow |\n| [Train XGBoost](examples/train_xgboost/README.md) | Use [Ray Train](https://docs.ray.io/en/latest/train/train.html) to build XGBoost models on multiple nodes, including CPU and GPU examples. |\n| [Tune PyTorch](examples/tune_pytorch/README.md) | Use [Ray Tune](https://docs.ray.io/en/latest/tune/tune.html) to build PyTorch models on multiple nodes, including CPU and GPU examples. |\n| [PyTorch Lightning](examples/ray_torch_lightning/README.md) | Get started with running a PyTorch Lightning job on the Ray cluster formed in a `@metaflow_ray` step. |\n| [GPT-J Fine Tuning](examples/fine-tune-gpt-j/README.md) | Fine-tune the 6B-parameter GPT-J model on a Ray cluster. |\n| [vLLM Inference](examples/vllm_inference/README.md) | Run inference on Llama models with vLLM and Ray via Metaflow. |\n| [End-to-end Batch Workflow](examples/e2e-batch/README.md) | Train models, evaluate them, and serve them. See how to use Metaflow workflows and various Ray abstractions together in a complete workflow. |\n\n### License\n`metaflow-ray` is distributed under the \u003cu\u003eApache License\u003c/u\u003e.","funding_links":[],"categories":["Models and Projects"],"sub_categories":["Misc"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-ray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fouterbounds%2Fmetaflow-ray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-ray/lists"}