{"id":36688059,"url":"https://github.com/converged-computing/flux-jobset","last_synced_at":"2026-01-12T11:16:39.048Z","repository":{"id":260398802,"uuid":"881004518","full_name":"converged-computing/flux-jobset","owner":"converged-computing","description":"HPC workloads using Flux and Kubernetes JobSet and eventually Kubeflow Training (under development)","archived":false,"fork":false,"pushed_at":"2024-11-01T04:56:16.000Z","size":14,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-10T14:50:23.537Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/converged-computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-30T18:46:28.000Z","updated_at":"2024-11-01T04:56:20.000Z","dependencies_parsed_at":"2024-10-31T05:16:39.345Z","dependency_job_id":"35122a0c-b8ea-474f-8953-a7363ea2cd90","html_url":"https://github.com/converged-computing/flux-jobset","commit_stats":null,"previous_names":["converged-computing/flux-jobset"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/converged-computing/flux-jobset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-jobset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-jobset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-jobset/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-jobset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/converged-computing","download_url":"https://codeload.github.com/converged-computing/flux-jobset/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-jobset/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28338970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T10:58:46.209Z","status":"ssl_error","status_checked_at":"2026-01-12T10:58:42.742Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-12T11:16:38.759Z","updated_at":"2026-01-12T11:16:39.036Z","avatar_url":"https://github.com/converged-computing.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flux JobSet\n\nThis is a small experiment to package HPC applications in JobSet. Since we don't want to require exposing ssh from the containers, we are going to use flux. This requires a little extra prep, but it's fairly easy and worth it! For any of the applications in this repository, here are the setup and usage instructions.\n\n\n## Usage\n\n### 1. 
Create a Curve Certificate\n\nYou'll need to create a curve certificate. While we could reuse the same one, it is better to generate a new one. The [Flux Operator](https://github.com/flux-framework/flux-operator) does this for you; here we do it by hand with a container that has flux installed.\n\n```bash\n# Generate a curve.cert in the present working directory\ndocker run -it --user root -v $PWD/:/home/fluxuser fluxrm/flux-sched:jammy flux keygen /home/fluxuser/curve.cert\n\n# Create a curve.cert config map\nkubectl create configmap curve-cert --from-file=curve-cert=curve.cert\n```\n\n### 2. Create a workload\n\nEach workload installs and configures flux, and then uses it to run your application. The alternative is to use MPI, and (personally speaking) I find flux much easier to orchestrate. We first need to install JobSet:\n\n```bash\nkubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.0/manifests.yaml\n```\n\nTry creating an interactive flux cluster:\n\n```bash\nkubectl apply -f flux/job.yaml\n```\nYou'll have an interactive flux cluster! You can check the logs of the first worker pod:\n\n```bash\nkubectl logs flux-worker-0-0-xxx\n```\n```console\n...\n/opt/conda/bin/flux proxy local:///var/run/flux/local bash\n🌀 flux broker --config-path /opt/conda/etc/flux/system/conf.d/broker.toml -Scron.directory=/opt/conda/etc/flux/system/cron.d -Stbon.fanout=256 -Srundir=/var/run/flux -Sbroker.rc2_none -Sstatedir=/opt/conda/etc/flux/system -Slocal-uri=local:///var/run/flux/local -Slog-stderr-level=6 -Slog-stderr-mode=local\nThis is flux-worker-0-0\n# Kubernetes-managed hosts file.\n127.0.0.1\tlocalhost\n::1\tlocalhost ip6-localhost ip6-loopback\nfe00::0\tip6-localnet\nfe00::0\tip6-mcastprefix\nfe00::1\tip6-allnodes\nfe00::2\tip6-allrouters\n10.244.3.16\tflux-worker-0-0.m.default.svc.cluster.local\tflux-worker-0-0\nbroker.info[0]: start: none-\u003ejoin 0.310499ms\nbroker.info[0]: parent-none: join-\u003einit 0.008727ms\ncron.info[0]: synchronizing cron tasks to event 
heartbeat.pulse\njob-manager.info[0]: restart: 0 jobs\njob-manager.info[0]: restart: 0 running jobs\njob-manager.info[0]: restart: checkpoint.job-manager not found\nbroker.info[0]: rc1.0: running /opt/conda/etc/flux/rc1.d/01-sched-fluxion\nsched-fluxion-resource.info[0]: version 0.38.0\nsched-fluxion-resource.warning[0]: create_reader: allowlist unsupported\nsched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire\nsched-fluxion-qmanager.info[0]: version 0.38.0\nbroker.info[0]: rc1.0: running /opt/conda/etc/flux/rc1.d/02-cron\nbroker.info[0]: rc1.0: /opt/conda/etc/flux/rc1 Exited (rc=0) 0.4s\nbroker.info[0]: rc1-success: init-\u003equorum 0.432494s\nbroker.info[0]: online: flux-worker-0-0 (ranks 0)\nbroker.info[0]: online: flux-worker-0-[0-3] (ranks 0-3)\nbroker.info[0]: quorum-full: quorum-\u003erun 0.591215s\n```\n\nThat is done with just a Python base container! Note that if you want to interactively connect to a broker, a script\nis provided:\n\n```bash\n/flux-connect.sh\n# try \"flux resource list\"\n```\n\nYou can also run an app, which will complete:\n\n```bash\nkubectl apply -f lammps/job.yaml\n```\n```console\nSetting up Verlet run ...\n  Unit style    : real\n  Current step  : 0\n  Time step     : 0.1\nPer MPI rank memory allocation (min/avg/max) = 215.0 | 215.0 | 215.0 Mbytes\nStep Temp PotEng Press E_vdwl E_coul Volume \n       0          300   -113.27833    437.52122   -111.57687   -1.7014647    27418.867 \n      10    299.38517   -113.27631    1439.2857   -111.57492   -1.7013813    27418.867 \n      20    300.27107   -113.27884    3764.3739   -111.57762   -1.7012246    27418.867 \n      30    302.21063   -113.28428    7007.6914   -111.58335   -1.7009363    27418.867 \n      40    303.52265   -113.28799      9844.84   -111.58747   -1.7005186    27418.867 \n      50    301.87059   -113.28324    9663.0443   -111.58318   -1.7000524    27418.867 \n      60    296.67807   -113.26777    7273.7928   -111.56815   
-1.6996137    27418.867 \n      70    292.19999   -113.25435    5533.6428   -111.55514   -1.6992157    27418.867 \n      80    293.58677   -113.25831    5993.4151   -111.55946   -1.6988533    27418.867 \n      90    300.62636   -113.27925    7202.8651   -111.58069   -1.6985591    27418.867 \n     100    305.38276   -113.29357    10085.748   -111.59518   -1.6983875    27418.867 \nLoop time of 17.2465 on 1 procs for 100 steps with 2432 atoms\n\nPerformance: 0.050 ns/day, 479.070 hours/ns, 5.798 timesteps/s\n99.9% CPU use with 1 MPI tasks x 1 OpenMP threads\n\nMPI task timing breakdown:\nSection |  min time  |  avg time  |  max time  |%varavg| %total\n---------------------------------------------------------------\nPair    | 12.723     | 12.723     | 12.723     |   0.0 | 73.77\nNeigh   | 0.21998    | 0.21998    | 0.21998    |   0.0 |  1.28\nComm    | 0.0074962  | 0.0074962  | 0.0074962  |   0.0 |  0.04\nOutput  | 0.00055908 | 0.00055908 | 0.00055908 |   0.0 |  0.00\nModify  | 4.2943     | 4.2943     | 4.2943     |   0.0 | 24.90\nOther   |            | 0.0008124  |            |       |  0.00\n\nNlocal:        2432.00 ave        2432 max        2432 min\nHistogram: 1 0 0 0 0 0 0 0 0 0\nNghost:        10685.0 ave       10685 max       10685 min\nHistogram: 1 0 0 0 0 0 0 0 0 0\nNeighs:        823958.0 ave      823958 max      823958 min\nHistogram: 1 0 0 0 0 0 0 0 0 0\n\nTotal # of neighbors = 823958\nAve neighs/atom = 338.79852\nNeighbor list builds = 5\nDangerous builds not checked\nTotal wall time: 0:00:17\nbroker.info[0]: rc2.0: /opt/conda/bin/flux submit --quiet --watch lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite Exited (rc=0) 18.3s\nbroker.info[0]: rc2-success: run-\u003ecleanup 18.2654s\nbroker.info[0]: cleanup.0: flux queue stop --quiet --all --nocheckpoint Exited (rc=0) 0.1s\nbroker.info[0]: cleanup.1: flux resource acquire-mute Exited (rc=0) 0.1s\nbroker.info[0]: cleanup.2: flux cancel --user=all --quiet --states RUN Exited (rc=0) 0.1s\nbroker.info[0]: 
cleanup.3: flux queue idle --quiet Exited (rc=0) 0.1s\nbroker.info[0]: cleanup-success: cleanup-\u003eshutdown 0.322195s\nbroker.info[0]: children-complete: shutdown-\u003efinalize 82.5735ms\nbroker.info[0]: rc3.0: running /opt/conda/etc/flux/rc3.d/01-sched-fluxion\nbroker.info[0]: rc3.0: /opt/conda/etc/flux/rc3 Exited (rc=0) 0.2s\nbroker.info[0]: rc3-success: finalize-\u003egoodbye 0.194591s\nbroker.info[0]: goodbye: goodbye-\u003eexit 0.030218ms\nReturn value for follower worker is 0\n🤓 Success! Cleaning up\n```\n```console\n$ kubectl get pods\nNAME                      READY   STATUS      RESTARTS   AGE\nlammps-worker-0-0-xgl82   0/1     Completed   0          102s\nlammps-worker-0-1-5hjmf   0/1     Completed   0          102s\nlammps-worker-0-2-6smhp   0/1     Completed   0          102s\nlammps-worker-0-3-c2bft   0/1     Completed   0          102s\n```\n\nMore HPC apps coming soon!\n\n## License\n\nHPCIC DevTools is distributed under the terms of the MIT license.\nAll new contributions must be made under this license.\n\nSee [LICENSE](https://github.com/converged-computing/cloud-select/blob/main/LICENSE),\n[COPYRIGHT](https://github.com/converged-computing/cloud-select/blob/main/COPYRIGHT), and\n[NOTICE](https://github.com/converged-computing/cloud-select/blob/main/NOTICE) for details.\n\nSPDX-License-Identifier: (MIT)\n\nLLNL-CODE- 842614\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconverged-computing%2Fflux-jobset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconverged-computing%2Fflux-jobset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconverged-computing%2Fflux-jobset/lists"}