# kokkos-docker-cluster

Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.

Repository: https://github.com/marcorentap/kokkos-docker-cluster

# Prerequisites
Ensure you have [HPCCM](https://github.com/NVIDIA/hpc-container-maker) installed.


## CUDA
If you need CUDA, install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html).

Then modify `/etc/docker/daemon.json` to set the NVIDIA runtime as the default:

```
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "/usr/bin/nvidia-container-runtime"
        }
    },
    "default-runtime": "nvidia"
}
```
Then restart Docker:
```
sudo systemctl restart docker
```

Verify that containers have GPU access with the default runtime:
```
docker run -it nvcr.io/nvidia/cuda:12.3.1-devel-ubuntu22.04 nvidia-smi
```

---
# Usage

## Building Images

Dockerfiles are generated with HPCCM from the recipes in `recipes/`. The images are based on `nvcr.io/nvidia/cuda:12.3.1-devel-ubuntu22.04`.

There are two images, `kokkos-compute` and `kokkos-sherlock`, which share the SSH keys in `ssh/`. To choose the architecture Kokkos is built for, set the environment variable `KOKKOS_CLUSTER_ARCH=<TARGET_ARCH>`; the scripts then enable the `Kokkos_ARCH_<TARGET_ARCH>` option when building Kokkos. For example, to generate the SSH keys and build both images with `Kokkos_ARCH_VOLTA70`, run:
```
export KOKKOS_CLUSTER_ARCH=VOLTA70
./make_keys && ./make_compute.sh && ./make_sherlock.sh
```

Both images contain two users: `root` (password `kokkosroot`) and `compute` (password `kokkoscompute`).

## Starting Containers

`kokkos-compute` containers are intended to run continuously in the background, while `kokkos-sherlock` containers are started on demand to launch jobs. In both images, the host directory `shared/` is mounted at `/shared`.

Start by creating the `kokkos-overlay` network:
```
# Initialize Docker swarm if not already done
docker swarm init
./make_network.sh
```

`compose.yaml` is configured to launch 100 `kokkos-compute` containers. To deploy it, run:
```
docker stack deploy --compose-file=compose.yaml kokkos
```
Use `docker service ls` to check that all 100 containers have started (the `REPLICAS` column should read `100/100`). Then start a `kokkos-sherlock` container:
```
./start_sherlock.sh
```

## Try mpirun
`/shared/hostfile` is an `mpirun` hostfile listing 100 hosts with 4 slots each, and `/shared/hello.sh` is a test program that prints the hostname of each process. To run it on all 400 slots (100 hosts × 4 slots):
```
mpirun --np 400 --hostfile /shared/hostfile /shared/hello.sh
```

## Scaling
To run more than 100 containers, for example 150, scale the service and regenerate the hostfile:
```
docker service scale kokkos_compute=150
./gen_hostfile.py 4 > shared/hostfile
```
`./gen_hostfile.py <MPI_SLOTS>` generates a hostfile from the currently running `kokkos-compute` containers, with `<MPI_SLOTS>` slots per host. With 150 hosts and 4 slots each, pass `--np 600` to `mpirun` to use every slot.

## Stopping
To stop the setup, run:
```
docker stack rm kokkos
```
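The hostfile used in the *Try mpirun* and *Scaling* steps lists each `kokkos-compute` host together with its slot count. The sketch below shows what such a generator might look like, assuming Open MPI's `<host> slots=<N>` hostfile syntax. It is not the repository's `gen_hostfile.py`: the function name `make_hostfile` and the convention of reading one host name or IP per line from standard input are hypothetical.

```python
import sys


def make_hostfile(hosts, slots):
    """Return Open MPI hostfile text with one '<host> slots=<slots>' line per host.

    Hypothetical helper; the repository's gen_hostfile.py may discover hosts
    and format its output differently.
    """
    if slots < 1:
        raise ValueError("slots must be a positive integer")
    return "".join(f"{host} slots={slots}\n" for host in hosts)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Read one host name or IP per line from stdin and skip blank lines.
        hosts = [line.strip() for line in sys.stdin if line.strip()]
        sys.stdout.write(make_hostfile(hosts, int(sys.argv[1])))
    else:
        print(f"usage: {sys.argv[0]} <MPI_SLOTS> < hosts.txt", file=sys.stderr)
```

With 100 hosts and `slots=4`, the file provides 400 slots, which matches `mpirun --np 400`. Giving every host the same slot count mirrors the single `<MPI_SLOTS>` argument that `gen_hostfile.py` takes.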