{"id":19227937,"url":"https://github.com/snakemake/snakemake-executor-plugin-kueue","last_synced_at":"2025-04-21T01:31:52.486Z","repository":{"id":180037518,"uuid":"664424405","full_name":"snakemake/snakemake-executor-plugin-kueue","owner":"snakemake","description":"A Snakemake executor plugin to enable running jobs with Kueue on Kubernetes","archived":false,"fork":false,"pushed_at":"2024-03-21T16:21:49.000Z","size":384,"stargazers_count":5,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-01T07:54:13.379Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snakemake.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-10T00:02:48.000Z","updated_at":"2024-07-25T17:19:09.000Z","dependencies_parsed_at":"2023-12-08T08:28:35.890Z","dependency_job_id":"e9c93043-3fdc-46ce-835d-6621641f95bb","html_url":"https://github.com/snakemake/snakemake-executor-plugin-kueue","commit_stats":null,"previous_names":["snakemake/snakemake-executor-kueue","snakemake/snakemake-executor-plugin-kueue"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakemake%2Fsnakemake-executor-plugin-kueue","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakemake%2Fsnakemake-executor-plugin-kueue/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/snakemake%2Fsnakemake-executor-plugin-kueue/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snakemake%2Fsnakemake-executor-plugin-kueue/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snakemake","download_url":"https://codeload.github.com/snakemake/snakemake-executor-plugin-kueue/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249982553,"owners_count":21355716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T15:25:57.620Z","updated_at":"2025-04-21T01:31:52.167Z","avatar_url":"https://github.com/snakemake.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Snakemake Executor Kueue\n\nThis is a [snakemake executor plugin](https://github.com/snakemake/snakemake-executor-plugin-interface/)\nthat enables interaction with [Kueue](https://kueue.sigs.k8s.io/docs/overview/). The plugin will\ninstall Python dependencies that are needed, and it's assumed that you have [installed Kueue and have queues configured](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/#before-you-begin).\n\n**under development** note that the base container is a custom build under my namespace (`vanessa`)\nthat has clones from main branches (as opposed to releases).\n\n## Overview\n\n[Kueue](https://kueue.sigs.k8s.io/docs/overview/) is (in simple terms) a job queueing system for Kubernetes. 
It doesn't just hold the queue, however; it also manages resource groups and decides when a job should be admitted (the pods allowed to be created so a job can run) and when they should be deleted. If you have used high performance computing workload managers, this would correspond to the queue of jobs.\n\n[Snakemake](https://snakemake.readthedocs.io/en/stable/) is a workflow management system. It is not concerned with a queue of work, but rather with preparing steps from a directed acyclic graph (DAG) and then submitting the steps as jobs to a workload manager. Traditionally, many successful workflow tools have been developed for the biosciences, meaning that individual steps come down to running tools like bwa or samtools, with little integration of high performance computing technologies like MPI.\n\nWhile you may not traditionally think of Kubernetes as a place to run MPI, with the movement for converged computing, this is changing. Technologies like the [Flux Operator](https://github.com/flux-framework/flux-operator) and [MPI Operator](https://github.com/kubeflow/mpi-operator) make it possible to run MPI workflows in Kubernetes. Since they are deployed as modular jobs (one or more pods working together) by an operator, this presents another opportunity for convergence - bringing together traditional workflow tools to submit steps not as jobs to an HPC system, but as [operator custom resource definitions](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) (CRD) to Kubernetes. This would allow simple steps to co-exist alongside steps that warrant more complex MPI. This is something I have been excited about for a while, and I am (also) excited to share the first prototype of that vision here. Let's talk about what this might look like with a simple example, below.\n\n![docs/kueue-snakemake.png](docs/kueue-snakemake.png)\n\nIn the above, we start with a workflow tool. In this case we are using Snakemake. 
The workflow tool is able to take a specification file, which in this case is the [Snakefile](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html), a human understandable definition of a workflow, and convert it into a directed acyclic graph, or DAG, which is essentially a directed graph with no cycles. In this graph, each step can be thought of as a single job in the workflow that will receive its own inputs, environment, and even container (especially in the case of Kubernetes) and then is expected to produce some output files. The modularity of a DAG also makes it amenable to operators. For example, if we have a step that runs LAMMPS simulations and needs MPI, we might submit a step to the Flux Operator to run a Flux Framework cluster in Kubernetes. If we just need to run a bash script for some analysis and don't need that complexity, we might choose a job instead. To go back to our picture, we see that the DAG generated for this faux workflow has 5 steps, and each of them is going to be given (by Snakemake) to our queueing software, which in this case is Kueue. The Snakemake Kueue executor knows how to read the Snakefile and see what CRDs are desired for each step, and then prepare those custom resource definitions (yaml definitions) that are going to be given to Kueue. Importantly, it's also the workflow software that manages the timing of steps and their inputs and outputs. For example, Snakemake will be looking for the input for step 2 from step 1, and will throw an error if it's not there. Speaking of inputs and outputs, for this kind of setup where there isn't a shared filesystem, the common strategy in bioinformatics is to use object or remote storage, and this is [also built into Snakemake](https://snakemake.readthedocs.io/en/stable/snakefiles/storage.html). When all is said and done, Snakemake is creating jobs to run in Kubernetes that know how to find their inputs and send back their outputs, and the Snakemake Kueue executor here orchestrates the entire thing! 
\n\nIs that cool, or what? Note that this is just a prototype - I haven't even finished with the MPI Operator yet (or other types that might be of interest). For some additional background, this ability came with [Snakemake 8.0](https://github.com/snakemake/snakemake/issues/2409) where we introduced modules for executors, giving a developer like myself the freedom to prototype without needing to formally add to the Snakemake codebase.\n\n- [Excalidraw](https://excalidraw.com/#json=GxNYIdV0njBCQQ8Sq4laz,3R0qBW7_l5sntn0flqm3bw)\n\n## Usage\n\n### Setup\n\nYou will need to create a cluster first. For local development we recommend [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-source):\n\n```bash\nkind create cluster\n```\n\n#### Install Kueue\n\nYou will then need to [install Kueue](https://kueue.sigs.k8s.io/docs/installation/) and\n[create your local queues](https://kueue.sigs.k8s.io/docs/tasks/administer_cluster_quotas/) with cluster quotas.\n\nHere is an example:\n\n```bash\nVERSION=v0.4.0\nkubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml\n```\n\nThen wait a few minutes until the jobset controller is running, and create the example queues:\n\n```bash\nkubectl apply -f example/cluster-queue.yaml\nkubectl apply -f example/resource-flavor.yaml\nkubectl apply -f example/user-queue.yaml\n```\n```console\nclusterqueue.kueue.x-k8s.io/cluster-queue created\nresourceflavor.kueue.x-k8s.io/default-flavor created\nlocalqueue.kueue.x-k8s.io/user-queue created\n```\n\nYou'll also need the Kubernetes Python client installed, and of course Snakemake! 
Assuming you have Snakemake and the plugin here installed, you should be good to go.\nHere is how I set up a local or development environment.\n\n```bash\npython -m venv env\nsource env/bin/activate\npip install .\n```\n\n### Container\n\nNote that while Snakemake still has a lot of moving pieces, the default container is built from the [Dockerfile](Dockerfile) here and provided as `vanessa/snakemake:kueue` in the executor code. Next go into an [example](example) directory to test out the Kueue executor.\n\n### Job Resources\n\nEach of the options below is exposed as a step option (as shown) or as a flag, which is applied globally and takes the format:\n\n```console\n--kueue-\u003coption\u003e\n\n# E.g., for pull_always:\n--kueue-pull-always yes\n```\n\nNot all options are supported for every operator. E.g., interactive mode is just for the Flux Operator,\nand some features are possible for other operators (but not implemented yet)! If there is a feature you want implemented or exposed,\nplease [open an issue](https://github.com/snakemake/snakemake-executor-plugin-kueue/issues).\n\n#### Operator\n\nBy default, Kueue will use a batchv1/Job for each step. 
However, you can\ncustomize this to a different operator with the job [resources](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources)\nvia the `kueue_operator` attribute:\n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_operator=\"flux-operator\"\n    shell:\n        \"...\"\n```\n\nWe currently support the following `operator`s:\n\n - flux-operator: deploy using the [Flux Operator](https://github.com/flux-framework/flux-operator)\n - job: (the default) a standard batchv1 Job, used when the attribute is unset or set to \"job\"\n\nAnd likely coming soon (or when someone requests it):\n\n - mpi-operator: deploy using the [MPI Operator](https://github.com/kubeflow/mpi-operator/)\n\nNote that you are in charge of installing and configuring the various operators on your cluster!\nSee the [Kueue tasks](https://kueue.sigs.k8s.io/docs/tasks/) for more details.\n\n#### Container\n\nYou can customize the container you are using, which should at minimum include Snakemake and your application\nsoftware. 
We have prepared a container with Flux, Snakemake, and the various other plugins for you to get started.\nThe [Dockerfile](Dockerfile) is packaged here if you'd like to tweak it. For example, to set a custom container for a step:\n\n```yaml\nrule hello_world:\n    output:\n        \"...\",\n    resources: \n        container=\"ghcr.io/rse-ops/mamba:snakemake\",\n        kueue_operator=\"job\"\n    shell:\n        \"...\"\n```\n\nSee the [lammps](./example/flux-operator/lammps) example with the Flux Operator for a custom container.\n\n#### Working Directory\n\nThe working directory will be where the container starts, and if you don't define it, it will use the container default.\nThis can be defined globally with `--kueue-working-dir` or as a step attribute:\n\n```yaml\nrule hello_world:\n    output:\n        \"...\",\n    resources: \n        kueue_working_dir=\"/path/to/important/things\"\n    shell:\n        \"...\"\n```\n\n#### Memory\n\nThe memory defined for Kubernetes is in a string format (e.g., `200Mi`). While we could ideally\ndo a conversion, for now we are lazy and ask you to define it directly:\n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_memory=\"200Mi\"\n    shell:\n        \"...\"\n```\n\n#### Tasks\n\nThe Flux Operator can handle tasks for MPI, so you can set them as follows:\n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_tasks=1\n    shell:\n        \"...\"\n```\n\n\n#### Nodes\n\nThis should be the number of nodes (size) for the MiniCluster.\n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_nodes=4\n    shell:\n        \"...\"\n```\n\n\n#### Pull Always\n\nThis tells the Flux Operator to freshly pull containers. 
Note that this is only exposed for this operator, but it would be easy to add to the others too by setting\nthe `imagePullPolicy` -\u003e Always.\n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_pull_always=yes\n    shell:\n        \"...\"\n```\n\n\n#### Flux Container Base\n\nIt's important that the Flux view (where Flux is installed from) matches the operating system you are using. By default we use an ubuntu:jammy image. You can see the views and containers available in [this repository](https://github.com/converged-computing/flux-views). \n\n```yaml\nrule a:\n    input:     ...\n    output:    ...\n    resources:\n        kueue_flux_container=\"ghcr.io/converged-computing/flux-view-rocky:tag-9\",\n        container=\"myname/myrockylinux:9\"\n    shell:\n        \"...\"\n```\n\nThe above would be used if your container is built on a Rocky Linux base.\n\n#### Interactive\n\nThis attribute is specific to the Flux Operator (for now). If set to true, we will turn on interactive mode, meaning\nthat the entire Flux instance will start in an interactive mode (with a sleep command) so you can shell into the container\nand look around.\n\n```yaml\nrule hello_world:\n    output:\n        \"...\",\n    resources: \n        kueue_interactive=true\n    shell:\n        \"...\"\n```\n\nNote that your script is written to `/tmp/run-job.sh` and you can connect to your Flux instance as follows:\n\n```console\n. 
/mnt/flux/flux-view.sh\nflux proxy $fluxsocket /bin/bash\n```\n\nFor examples, check out the [example](example) directory.\n\n## Want to write a plugin?\n\nIf you are interested in writing your own plugin, instructions are provided via the [snakemake-executor-plugin-interface](https://github.com/snakemake/snakemake-executor-plugin-interface).\n\n## License\n\nHPCIC DevTools is distributed under the terms of the MIT license.\nAll new contributions must be made under this license.\n\nSee [LICENSE](https://github.com/converged-computing/cloud-select/blob/main/LICENSE),\n[COPYRIGHT](https://github.com/converged-computing/cloud-select/blob/main/COPYRIGHT), and\n[NOTICE](https://github.com/converged-computing/cloud-select/blob/main/NOTICE) for details.\n\nSPDX-License-Identifier: (MIT)\n\nLLNL-CODE- 842614\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnakemake%2Fsnakemake-executor-plugin-kueue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnakemake%2Fsnakemake-executor-plugin-kueue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnakemake%2Fsnakemake-executor-plugin-kueue/lists"}