{"id":36688411,"url":"https://github.com/converged-computing/flux-distribute","last_synced_at":"2026-01-12T11:17:01.864Z","repository":{"id":259910280,"uuid":"879799844","full_name":"converged-computing/flux-distribute","owner":"converged-computing","description":"Install flux directly to kubelets to (TBA) help the kubelet, pulling containers, the sky is the limit!","archived":false,"fork":false,"pushed_at":"2024-12-29T17:42:01.000Z","size":7972,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-10T05:37:02.158Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/converged-computing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-28T15:11:40.000Z","updated_at":"2024-12-29T17:42:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"b71a7e19-b374-4d13-bb8f-2c65a7b93adb","html_url":"https://github.com/converged-computing/flux-distribute","commit_stats":null,"previous_names":["converged-computing/flux-distribute"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/converged-computing/flux-distribute","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-distribute","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-distribute/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-distribute
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-distribute/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/converged-computing","download_url":"https://codeload.github.com/converged-computing/flux-distribute/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/converged-computing%2Fflux-distribute/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28338970,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-12T10:58:46.209Z","status":"ssl_error","status_checked_at":"2026-01-12T10:58:42.742Z","response_time":98,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-12T11:17:01.776Z","updated_at":"2026-01-12T11:17:01.851Z","avatar_url":"https://github.com/converged-computing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flux Distribute\n\nThis is an experiment to install Flux to Kubernetes nodes, where we could try:\n\n- using it as a mechanism to distribute content (e.g., containers with some cool tree algorithms)\n- orchestrate some part of the kubelet logic\n- interact with a custom scheduler\n\nWe will first start by having a daemonset that installs it, and if that is successful, we can 
next try to use flux archive and flux exec to share content. If that is successful, then we can try to implement something else. For example, a tool in Go could distribute containers across the nodes, likely by doing a pull first to one node, and then having the distribution done by flux.\n\n## Usage\n\nTo install to your cluster, you should create it first! There is an [example](example) provided using kind:\n\n```bash\nkind create cluster --config ./example/kind-config.yaml\n```\n\nThen install the daemonset. \n\n```bash\nkubectl apply -f ./daemonset-installer.yaml\n```\n\nYou can then look at the first node in your set (the lead broker) to see all workers join:\n\n```console\n/opt/conda/bin/flux proxy local:///var/run/flux/local bash\n🌀 flux broker --config-path /opt/conda/etc/flux/system/conf.d/broker.toml -Scron.directory=/opt/conda/etc/flux/system/cron.d -Stbon.fanout=256 -Srundir=/var/run/flux -Sbroker.rc2_none -Sstatedir=/opt/conda/etc/flux/system -Slocal-uri=local:///var/run/flux/local -Slog-stderr-level=6 -Slog-stderr-mode=local\nbroker.info[0]: start: none-\u003ejoin 0.359796ms\nbroker.info[0]: parent-none: join-\u003einit 0.009167ms\nkvs.info[0]: restored KVS from checkpoint on 2024-10-28T15:01:55Z\ncron.info[0]: synchronizing cron tasks to event heartbeat.pulse\njob-manager.info[0]: restart: 0 jobs\njob-manager.info[0]: restart: 0 running jobs\njob-manager.info[0]: restart: checkpoint.job-manager not found\nbroker.info[0]: rc1.0: running /opt/conda/etc/flux/rc1.d/01-sched-fluxion\nsched-fluxion-resource.info[0]: version 0.38.0\nsched-fluxion-resource.warning[0]: create_reader: allowlist unsupported\nsched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire\nsched-fluxion-qmanager.info[0]: version 0.38.0\nbroker.info[0]: rc1.0: running /opt/conda/etc/flux/rc1.d/02-cron\nbroker.info[0]: rc1.0: /opt/conda/etc/flux/rc1 Exited (rc=0) 0.6s\nbroker.info[0]: rc1-success: init-\u003equorum 0.554957s\nbroker.info[0]: 
online: kind-worker (ranks 0)\nbroker.info[0]: online: kind-worker,kind-worker[2-4] (ranks 0-3)\nbroker.info[0]: quorum-full: quorum-\u003erun 1.46333s\n```\n\nAn individual worker will show similar output:\n\n```console\n🌀 flux broker --config-path /opt/conda/etc/flux/system/conf.d/broker.toml -Scron.directory=/opt/conda/etc/flux/system/cron.d -Stbon.fanout=256 -Srundir=/var/run/flux -Sbroker.rc2_none -Sstatedir=/opt/conda/etc/flux/system -Slocal-uri=local:///var/run/flux/local -Slog-stderr-level=6 -Slog-stderr-mode=local\nbroker.info[3]: start: none-\u003ejoin 0.312617ms\nbroker.info[3]: parent-ready: join-\u003einit 1.68782s\nbroker.info[3]: configuration updated\nbroker.info[3]: rc1.0: running /opt/conda/etc/flux/rc1.d/01-sched-fluxion\nbroker.info[3]: rc1.0: running /opt/conda/etc/flux/rc1.d/02-cron\nbroker.info[3]: rc1.0: /opt/conda/etc/flux/rc1 Exited (rc=0) 0.3s\nbroker.info[3]: rc1-success: init-\u003equorum 0.303304s\nbroker.info[3]: quorum-full: quorum-\u003erun 1.16005s\n```\n\nWe can't use systemd because the conda packages don't support it, but that should be OK for now.\n\n## How does this work?\n\n1. We use a daemonset and nsenter to enter the init process of the node\n2. We install flux core and sched from conda-forge\n3. The RBAC roles and service account given to the daemonset give it permission to list nodes\n4. With the node addresses, we can prepare a broker configuration\n5. The daemonset launches a script that installs and configures flux\n6. The brokers start on each node!\n\nThe scripts are built into the container, so if you need to update or change something, just do it there.\nNote that a systemd example install is included in [docker](docker) but we cannot use it yet because the conda installs don't support systemd.\n\n\n## Debugging\n\nI've added a script that makes it easy to shell in and debug. You can look at the logs to see the first host in the host list - this is the lead broker. In the log above, it's `kind-worker`. 
Get the pod associated with it:\n\n```bash\n$ kubectl get pods -o wide\nNAME                 READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES\ninstall-flux-266zk   1/1     Running   0          4m43s   172.18.0.3   kind-worker4   \u003cnone\u003e           \u003cnone\u003e\ninstall-flux-2sqq6   1/1     Running   0          4m43s   172.18.0.5   kind-worker    \u003cnone\u003e           \u003cnone\u003e\ninstall-flux-7d5ps   1/1     Running   0          4m43s   172.18.0.4   kind-worker2   \u003cnone\u003e           \u003cnone\u003e\ninstall-flux-ql9w9   1/1     Running   0          4m43s   172.18.0.6   kind-worker3   \u003cnone\u003e           \u003cnone\u003e\n```\n\nYou can either shell into the associated daemonset pod and run nsenter:\n\n```bash\nkubectl exec -it install-flux-2sqq6 -- bash\nnsenter -t 1 -m bash\n```\n\nOr use the kubectl node-shell plugin (which does the same):\n\n```bash\nkubectl node-shell kind-worker\n```\n\nFlux lives with conda, and it's not on the path. But I will tell you it's in `/opt/conda/bin` and you can just run the script that the daemonset prepares to connect to the broker:\n\n```bash\n ./flux-connect.sh \nroot@kind-worker:/# flux resource list\n     STATE NNODES   NCORES    NGPUS NODELIST\n      free      4       32        0 kind-worker,kind-worker[2-4]\n allocated      0        0        0 \n      down      0        0        0 \n```\n\nBoum! Bing badda... boom! 💥\n\n## Limitations\n\nWe currently just install flux core and sched from conda, meaning the latest versions. An improvement would be to allow customization, or for these builds, to enable systemd support (not currently there for the conda binaries). We also don't do any kind of restart if nodes are added, meaning that autoscaling won't work. 
You would need to delete the daemonset and then recreate it, and likely we would want to do some kind of cleanup step.\n\n## Development\n\nNote that the daemonset has image pull policy \"Always\", which requires a push, but you can load the image into kind or your cluster and then set the policy to \"Never\". To build the base image, just do:\n\n```bash\nmake build\n```\n\nTo build and then apply the daemonset:\n\n```bash\nmake test\n```\n\nAnd then see flux running! This is from the node, actually:\n\n```bash\nkubectl logs install-flux-xxxx -f\n```\n\n## Experiments\n\n - [topology](topology): testing bringing up Flux miniclusters with different topologies, the idea being we can eventually extend to a distribution strategy (and test how the topology influences distribution performance).\n\n\n## License\n\nHPCIC DevTools is distributed under the terms of the MIT license.\nAll new contributions must be made under this license.\n\nSee [LICENSE](https://github.com/converged-computing/cloud-select/blob/main/LICENSE),\n[COPYRIGHT](https://github.com/converged-computing/cloud-select/blob/main/COPYRIGHT), and\n[NOTICE](https://github.com/converged-computing/cloud-select/blob/main/NOTICE) for details.\n\nSPDX-License-Identifier: (MIT)\n\nLLNL-CODE- 842614\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconverged-computing%2Fflux-distribute","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconverged-computing%2Fflux-distribute","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconverged-computing%2Fflux-distribute/lists"}