{"id":37183356,"url":"https://github.com/sylabs/wlm-operator","last_synced_at":"2026-01-14T21:11:03.472Z","repository":{"id":55453351,"uuid":"179032436","full_name":"sylabs/wlm-operator","owner":"sylabs","description":"Singularity implementation of k8s operator for interacting with SLURM.","archived":true,"fork":false,"pushed_at":"2020-12-29T20:45:34.000Z","size":67268,"stargazers_count":117,"open_issues_count":15,"forks_count":28,"subscribers_count":19,"default_branch":"master","last_synced_at":"2024-11-15T01:30:35.043Z","etag":null,"topics":["hpc","k8s","kubernetes","kubernetes-operator","singularity","slurm"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-02T08:25:49.000Z","updated_at":"2024-11-05T07:35:58.000Z","dependencies_parsed_at":"2022-08-15T00:40:10.793Z","dependency_job_id":null,"html_url":"https://github.com/sylabs/wlm-operator","commit_stats":null,"previous_names":["sylabs/slurm-operator"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sylabs/wlm-operator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylabs%2Fwlm-operator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylabs%2Fwlm-operator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylabs%2Fwlm-operator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylabs%2Fwlm-operator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sylabs","down
load_url":"https://codeload.github.com/sylabs/wlm-operator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylabs%2Fwlm-operator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28434589,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T18:57:19.464Z","status":"ssl_error","status_checked_at":"2026-01-14T18:52:48.501Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hpc","k8s","kubernetes","kubernetes-operator","singularity","slurm"],"created_at":"2026-01-14T21:11:02.623Z","updated_at":"2026-01-14T21:11:03.437Z","avatar_url":"https://github.com/sylabs.png","language":"Go","readme":"# WLM-operator\n\nThe singularity-cri and wlm-operator projects were created by Sylabs to explore interaction between the Kubernetes and HPC worlds. In 2020, rather than dilute our efforts over a large number of projects, we have focused on Singularity itself and our supporting services. We're also looking forward to introducing new features and technologies in 2021.\n\nAt this point we have archived the repositories to indicate that they aren't under active development or maintenance. 
We recognize there is still interest in singularity-cri and wlm-operator, and we'd like these projects to find a home within a community that can further develop and maintain them. The code is open-source under the Apache License 2.0, to be compatible with other projects in the k8s ecosystem.\n\nPlease reach out to us via community@sylabs.io if you are interested in establishing a new home for the projects.\n\n-----\n\n[![CircleCI](https://circleci.com/gh/sylabs/wlm-operator.svg?style=svg\u0026circle-token=7222176bc78c1ddf7ea4ea615d2e568334e7ec0a)](https://circleci.com/gh/sylabs/wlm-operator)\n\n**WLM operator** is a Kubernetes operator implementation, capable of submitting and\nmonitoring WLM jobs, while using all of Kubernetes features, such as smart scheduling and volumes.\n\nWLM operator connects a Kubernetes node with a whole WLM cluster, which enables multi-cluster scheduling.\nIn other words, Kubernetes integrates with WLM as one to many.\n\nEach WLM partition (queue) is represented as a dedicated virtual node in Kubernetes. WLM operator\ncan automatically discover WLM partition resources (CPUs, memory, nodes, wall-time) and propagate them\nto Kubernetes by labeling the virtual node. Those node labels will be respected during Slurm job scheduling so that a\njob will appear only on a suitable partition with enough resources.\n\nRight now WLM-operator supports only SLURM clusters, but it's easy to add support for another WLM: you need to implement a [gRPC server](https://github.com/sylabs/wlm-operator/blob/master/pkg/workload/api/workload.proto). 
You can use the [current SLURM implementation](https://github.com/sylabs/wlm-operator/blob/master/internal/red-box/api/slurm.go) as a reference.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg style=\"width:100%;\" height=\"600\" src=\"./docs/integration.svg\"\u003e\n\u003c/p\u003e\n\n## Installation\n\nSince wlm-operator is now built with [go modules](https://github.com/golang/go/wiki/Modules)\nthere is no need to create a standard [go workspace](https://golang.org/doc/code.html). If you still\nprefer keeping source code under `GOPATH`, make sure `GO111MODULE` is set. \n\n### Prerequisites\n\n- Go 1.11+\n\n### Installation steps\n\nThe installation process connects Kubernetes with a Slurm cluster.\n\n*NOTE*: the installation process below is described for a single Slurm cluster;\nthe same steps should be performed for each cluster to be connected.\n\n1. Create a new Kubernetes node with [Singularity-CRI](https://github.com/sylabs/singularity-cri) on the\nSlurm login host. Make sure you set up a NoSchedule taint so that no random pod will be scheduled there.\n\n2. Create a new dedicated user on the Slurm login host. All submitted Slurm jobs will be executed on behalf\nof that user. Make sure the user has execute permissions for the following Slurm binaries: `sbatch`,\n`scancel`, `sacct` and `scontrol`.\n\n3. Clone the repo.\n```bash\ngit clone https://github.com/sylabs/wlm-operator\n```\n\n4. Build and start *red-box* – a gRPC proxy between Kubernetes and a Slurm cluster.\n```bash\ncd wlm-operator \u0026\u0026 make\n```\nUse the dedicated user from step 2 to run red-box, e.g. set up `User` in the systemd red-box.service.\nBy default red-box listens on `/var/run/syslurm/red-box.sock`, so you have to make sure the user has\nread and write permissions for `/var/run/syslurm`.\n\n5. 
Set up the Slurm operator in Kubernetes.\n```bash\nkubectl apply -f deploy/crds/slurm_v1alpha1_slurmjob.yaml\nkubectl apply -f deploy/operator-rbac.yaml\nkubectl apply -f deploy/operator.yaml\n```\nThis will create a new [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) that\nintroduces `SlurmJob` to Kubernetes. After that, a Kubernetes controller for the `SlurmJob` CRD is set up as a Deployment.\n\n6. Start up the configurator that will bring up a virtual node for each partition in the Slurm cluster.\n```bash\nkubectl apply -f deploy/configurator.yaml\n```\n\nAfter all those steps the Kubernetes cluster is ready to run SlurmJobs. \n\n## Usage\n\nThe most convenient way to submit Slurm jobs is using YAML files; take a look at the [basic examples](/examples).\n\nWe will walk through a basic example of how to submit jobs to Slurm in Vagrant.\n\n```yaml\napiVersion: wlm.sylabs.io/v1alpha1\nkind: SlurmJob\nmetadata:\n  name: cow\nspec:\n  batch: |\n    #!/bin/sh\n    #SBATCH --nodes=1\n    #SBATCH --output cow.out\n    srun singularity pull -U library://sylabsed/examples/lolcow\n    srun singularity run lolcow_latest.sif\n    srun rm lolcow_latest.sif\n  nodeSelector:\n    wlm.sylabs.io/containers: singularity\n  results:\n    from: cow.out\n    mount:\n      name: data\n      hostPath:\n        path: /home/job-results\n        type: DirectoryOrCreate\n```\n\nIn the example above we will run the lolcow Singularity container in Slurm and collect the results \nto `/home/job-results` located on the k8s node where the job has been scheduled. Generally, job results\ncan be collected to any supported [k8s volume](https://kubernetes.io/docs/concepts/storage/volumes/).\n\nThe Slurm job specification will be processed by the operator, and a dummy pod will be scheduled in order to transfer the job\nspecification to a specific queue. 
That dummy pod will not have an actual physical process under the hood; instead, \nits specification will be used to schedule a Slurm job directly on a connected cluster. To collect the results, another pod\nwill be created with UID and GID 1000 (default values), so you should make sure it has write access to \nthe volume where you want to store the results (host directory `/home/job-results` in the example above).\nThe UID and GID are inherited from the virtual kubelet that spawns the pod, and the virtual kubelet inherits them\nfrom the configurator (see `runAsUser` in [configurator.yaml](./deploy/configurator.yaml)).\n\nAfter that you can submit the cow job:\n```bash\n$ kubectl apply -f examples/cow.yaml \nslurmjob.wlm.sylabs.io \"cow\" created\n\n$ kubectl get slurmjob\nNAME   AGE   STATUS\ncow    66s   Succeeded\n\n\n$ kubectl get pod\nNAME                             READY   STATUS         RESTARTS   AGE\ncow-job                          0/1     Job finished   0          17s\ncow-job-collect                  0/1     Completed      0          9s\n```\n\nValidate that the job results appeared on the node:\n```bash\n$ ls -la /home/job-results\ncow-job\n  \n$ ls /home/job-results/cow-job \ncow.out\n\n$ cat cow.out\nWARNING: No default remote in use, falling back to: https://library.sylabs.io\n _________________________________________\n/ It is right that he too should have his \\\n| little chronicle, his memories, his     |\n| reason, and be able to recognize the    |\n| good in the bad, the bad in the worst,  |\n| and so grow gently old all down the     |\n| unchanging days and die one day like    |\n| any other day, only shorter.            
|\n|                                         |\n\\ -- Samuel Beckett, \"Malone Dies\"        /\n -----------------------------------------\n        \\   ^__^\n         \\  (oo)\\_______\n            (__)\\       )\\/\\\n                ||----w |\n                ||     ||\n```\n\n\n### Results collection\n\nThe Slurm operator supports result collection into a [k8s volume](https://kubernetes.io/docs/concepts/storage/volumes/)\nso that a user won't need access to the Slurm cluster to analyze job results.\n\nHowever, some configuration is required for this feature to work. More specifically, results can be collected\non the login host only (i.e. where `red-box` is running), while a Slurm job can be scheduled on an arbitrary\nSlurm worker node. This means that some kind of shared storage among Slurm nodes should be configured so that regardless\nof which Slurm worker node is chosen to run a job, the results will appear on the login host as well. \n_NOTE_: result collection is a network and IO intensive task, so collecting large files (e.g. a 1Gb result of an\nML job) may not be a great idea.\n\nLet's walk through the basic configuration steps. In the following it is assumed that the file _cow.out_ from the example above\nis collected. This file can be found on the Slurm worker node that is executing the job.\nMore specifically, you'll find it in the folder from which the job was submitted (i.e. `red-box`'s working dir).\nConfiguration for other result files will differ in shared paths only:\n\n\t$RESULTS_DIR = red-box's working directory\n\nShare $RESULTS_DIR among all Slurm nodes, e.g. set up an NFS share for $RESULTS_DIR.\n\n\n## Configuring red-box\n\nBy default red-box performs automatic resource discovery for all partitions.\nHowever, it's possible to set up available resources for a partition manually in the config file.\nThe following resources can be specified: `nodes`, `cpu_per_node`, `mem_per_node` and `wall_time`. \nAdditionally, you can specify partition features there, e.g. available software or hardware. 
\nThe config path should be passed to red-box with the `--config` flag.\n\nConfig example:\n```yaml\npartition1:\n  nodes: 10\n  mem_per_node: 2048 # in MBs\n  cpu_per_node: 8\n  wall_time: 10h \npartition2:\n  nodes: 10\n  # mem, cpu and wall_time will be automatically discovered\npartition3:\n  additional_features:\n    - name: singularity\n      version: 3.2.0\n    - name: nvidia-gpu\n      version: 2080ti-cuda-7.0\n      quantity: 20\n```\n\n\n## Vagrant\n\nIf you want to try wlm-operator locally before updating your production cluster, use Vagrant, which will automatically\ninstall and configure all necessary software:\n\n```bash\ncd vagrant\nvagrant up \u0026\u0026 vagrant ssh k8s-master\n```\n_NOTE_: `vagrant up` may take about 15 minutes to start as the k8s cluster will be installed from scratch.\n\nVagrant will spin up two VMs: a k8s master and a k8s worker node with Slurm installed.\nIf you wish to set up more workers, feel free to modify the `N` parameter in [Vagrantfile](./vagrant/Vagrantfile).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylabs%2Fwlm-operator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsylabs%2Fwlm-operator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylabs%2Fwlm-operator/lists"}