{"id":26232667,"url":"https://github.com/pc2/cc-slurm-sync","last_synced_at":"2026-03-07T13:03:20.230Z","repository":{"id":136811737,"uuid":"528864450","full_name":"pc2/cc-slurm-sync","owner":"pc2","description":"PC2 implementation of a sync script between slurm and ClusterCockpit","archived":false,"fork":false,"pushed_at":"2025-08-28T12:21:28.000Z","size":20,"stargazers_count":0,"open_issues_count":1,"forks_count":6,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-08-28T19:43:25.464Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pc2.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-08-25T13:30:52.000Z","updated_at":"2025-08-28T12:21:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"18bfc944-246a-4a41-8812-9b5338e210e0","html_url":"https://github.com/pc2/cc-slurm-sync","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pc2/cc-slurm-sync","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pc2%2Fcc-slurm-sync","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pc2%2Fcc-slurm-sync/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pc2%2Fcc-slurm-sync/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pc2%2Fcc-slurm-sync/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pc2","download_url":"https://code
load.github.com/pc2/cc-slurm-sync/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pc2%2Fcc-slurm-sync/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30214618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T12:15:00.571Z","status":"ssl_error","status_checked_at":"2026-03-07T12:15:00.217Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-13T00:38:05.764Z","updated_at":"2026-03-07T13:03:20.214Z","avatar_url":"https://github.com/pc2.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DEPRECATED\n\nThis script was written some years ago with the aim to sync jobs from slurm\nto cluster cockpit. Meanwhile, a go based slurm sync tool \n[cc-slurm-adapter](https://github.com/ClusterCockpit/cc-slurm-adapter)\nhas been implemented. We have tested the tool, found it to be good, and will \ntherefore discontinue maintenance of this script.\n\n# Introduction\n\nThis script syncs the slurm jobs with the \n[cluster cockpit](https://github.com/ClusterCockpit/) backend. It uses the\nslurm command line tools to gather the relevant slurm infos and reads the\ncorresponding info from cluster cockpit via its api. 
After reading the data,\nit stops all jobs in cluster cockpit which are not running any more according\nto slurm and afterwards it creates all new running jobs in cluster cockpit.\n\nThe script has to run on the slurm controller node and needs permissions \nto run the slurm commands squeue and sacct and read data from the slurm\nstate save location.\n\n# Requirements / Branches\n\nDepending on your slurm version, you have different dependencies.\n\n## Running slurm 22.05 and later\n\nThe rest api and its openapi component change from version to version.\nTherefore, we decided to create one branch per slurm major version. This \nshould help to track the changes according to each slurm version. If you \nare running slurm version 23.11.02 you have to check out the branch slurm-23-11.\nThe main branch will be the development branch for general functionalities.\nVersion specific changes will be merged to the corresponding version branch. At \nthis point in time, the main branch tracks the **24.11** Version of Slurm. \n\n## Running slurm \u003c 22.05\n\nIf you are running slurm 21.08 or something else lower than 22.05, you have to \nuse the commit with tag \"openapi_0.0.37\", because the data structure of the \njson output changed between 21.08 and 22.05.\n\nThis script expects a certain data structure in the output of squeue. We have \nnoticed during development that `squeue --json` does not distinguish between \nindividual CPUs in the resources used and in the output the allocation of CPU 1 \nand 2 is considered to be the same. However, this may be different for shared \nnodes if multiple jobs request a certain set of resources.\n\nThe cause can be found in the code of the API interface and is based on the \nfact that the core ID is taken modulo the number of cores on a socket. \n\nThe included patch corrects this behavior. It is necessary to recompile slurm \nfor this. The patch is for openapi 0.0.37 but should work with other versions\nas well. 
\n\n# Getting started\n\nThe easiest way is to clone the Git repository. This way you always get the latest updates. \n\n    git clone https://github.com/pc2/cc-slurm-sync.git\n    cd cc-slurm-sync\n\n## Configuration file\nBefore you start, you have to create a configuration file. You can use \n`config.json.example` as a starting point. Simply copy or rename it to\n`config.json`.\n\n### Configuration options\n**clustername**\nType in your clustername here. This value must be the same as the value used in cc-backend to identify the cluster. It might be different from the cluster name used in slurm.\n\n**slurm**\n* `squeue` Path to the squeue binary. Defaults to `/usr/bin/squeue`\n* `sacct` Path to the sacct binary. Defaults to `/usr/bin/sacct`\n* `scontrol` Path to the scontrol binary. Defaults to `/usr/bin/scontrol`\n* `state_save_location` Statesave location of slurm. This option has no default value and is **mandatory**.\n\n**cc-backend**\n* `host` The url of the cc-backend api. Must be a valid url excluding trailing `/api`. This option is **mandatory**.\n* `apikey` The JWT token to authenticate against cc-backend. This option is **mandatory**.\n\n**accelerators**\n\nThis part describes accelerators which might be used in jobs. The format is as follows:\n\n\t\"accelerators\" : {\n\t\t\"n2gpu\" : {\n\t\t\t\"0\": \"00000000:03:00.0\",\n\t\t\t\"1\": \"00000000:44:00.0\",\n\t\t\t\"2\": \"00000000:84:00.0\",\n\t\t\t\"3\": \"00000000:C4:00.0\"\n\t\t},\n\t\t\"n2dgx\" : {\n\t\t\t\"0\": \"00000000:07:00.0\",\n\t\t\t\"1\": \"00000000:0F:00.0\",\n\t\t\t\"2\": \"00000000:47:00.0\",\n\t\t\t\"3\": \"00000000:4E:00.0\",\n\t\t\t\"4\": \"00000000:87:00.0\",\n\t\t\t\"5\": \"00000000:90:00.0\",\n\t\t\t\"6\": \"00000000:B7:00.0\",\n\t\t\t\"7\": \"00000000:BD:00.0\"\n\t\t}\n\t},\n\nThe first level (`n2gpu`) describes the prefix of the host names in which corresponding accelerators are installed. 
The second level describes the ID in Slurm followed by the device id.\n\nHow to get this data? It depends on the accelerators. The following example is for a host with four NVidia A100 GPUs. This should be similar on all hosts with NVidia GPUs:\n\n    # nvidia-smi \n    Thu Aug 25 14:50:05 2022       \n    +-----------------------------------------------------------------------------+\n    | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |\n    |-------------------------------+----------------------+----------------------+\n    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n    |                               |                      |               MIG M. |\n    |===============================+======================+======================|\n    |   0  NVIDIA A100-SXM...  On   | 00000000:03:00.0 Off |                    0 |\n    | N/A   58C    P0   267W / 400W |   7040MiB / 40960MiB |     88%      Default |\n    |                               |                      |             Disabled |\n    +-------------------------------+----------------------+----------------------+\n    |   1  NVIDIA A100-SXM...  On   | 00000000:44:00.0 Off |                    0 |\n    | N/A   59C    P0   337W / 400W |   7040MiB / 40960MiB |     96%      Default |\n    |                               |                      |             Disabled |\n    +-------------------------------+----------------------+----------------------+\n    |   2  NVIDIA A100-SXM...  On   | 00000000:84:00.0 Off |                    0 |\n    | N/A   57C    P0   266W / 400W |   7358MiB / 40960MiB |     89%      Default |\n    |                               |                      |             Disabled |\n    +-------------------------------+----------------------+----------------------+\n    |   3  NVIDIA A100-SXM...  
On   | 00000000:C4:00.0 Off |                    0 |\n    | N/A   56C    P0   271W / 400W |   7358MiB / 40960MiB |     89%      Default |\n    |                               |                      |             Disabled |\n    +-------------------------------+----------------------+----------------------+\n\nYou will find the four GPUs identified by ids starting at 0. In the second column, you can find the Bus-ID or identifier of the GPU. These are the values which have to be defined in the code example above. The mechanism in the background assumes that all nodes starting with this prefix have the same configuration and assignment of ID to bus ID. So if you have another configuration, you have to start a new prefix, only matching the hosts with this configuration.\n\n**node_regex**\n\nThis option is unique to every cluster system. This regex describes the syntax of the hostnames which are used as computing resources in jobs. Backslashes have to be escaped.\n\nExample: `^(n2(lcn|cn|fpga|gpu)[\\\\d{2,4}\\\\,\\\\-\\\\[\\\\]]+)+$`\n\n## Running the script\n\nSimply run `slurm-clusercockpit-sync.py` inside the same directory which contains the config.json file. A brief help is also available:\n\n* `-c, --config` You can use a different config file for testing or other purposes. Otherwise it uses config.json in the current directory.\n* `-j, --jobid` In a test setup it might be useful to sync individual job ids instead of syncing all jobs.\n* `-l, --limit` Synchronize only this number of jobs in the respective direction. Stopping a job might take a short time. If a massive number of jobs have to be stopped, the script might run for a long time and miss newly starting jobs if they start and end within the execution time of the script. \n* `--direction` Mostly a debug option. Only synchronize starting or stopping jobs. The default is both directions.\n\nThe script terminates after synchronization of all jobs. 
\n\n# Getting help\n\nThis script should be seen as an example implementation and may have to be adapted for other installations. I tried to keep the script as general as possible and to handle some differences between clusters already. If adjustments are necessary, I am happy to receive pull requests or to be notified in other ways, so that in the long run the implementation works on as many systems as possible without adjustments.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpc2%2Fcc-slurm-sync","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpc2%2Fcc-slurm-sync","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpc2%2Fcc-slurm-sync/lists"}