https://github.com/JuliaParallel/SlurmClusterManager.jl
Julia package for running code on Slurm clusters
- Host: GitHub
- URL: https://github.com/JuliaParallel/SlurmClusterManager.jl
- Owner: JuliaParallel
- Created: 2020-05-20T23:29:12.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2025-03-24T01:15:20.000Z (2 months ago)
- Last Synced: 2025-03-24T01:32:28.328Z (2 months ago)
- Topics: distributed-computing, julia, slurm
- Language: Julia
- Homepage:
- Size: 67.4 KB
- Stars: 55
- Watchers: 3
- Forks: 7
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
# SlurmClusterManager.jl

This package provides support for using Julia within the Slurm cluster environment.
The code is adapted from [ClusterManagers.jl](https://github.com/JuliaParallel/ClusterManagers.jl) with some modifications.

## Usage

The following script uses all resources from a Slurm allocation as Julia workers and prints the worker id and hostname on each one.
```jl
#!/usr/bin/env julia

using Distributed, SlurmClusterManager
addprocs(SlurmManager())
@everywhere println("hello from $(myid()):$(gethostname())")
```

If the code is saved in `script.jl`, it can be queued and executed on two nodes using 64 workers per node by running
```
sbatch -N 2 --ntasks-per-node=64 script.jl
```

## Differences from `ClusterManagers.jl`
* Only supports Slurm (see this [issue](https://github.com/JuliaParallel/ClusterManagers.jl/issues/58) for some background).
* Requires that `SlurmManager` be created inside a Slurm allocation created by sbatch/salloc.
Specifically, `SLURM_JOBID` and `SLURM_NTASKS` must be defined in order to construct a `SlurmManager`.
This matches typical HPC workflows where resources are requested using sbatch and then used by the application code.
In contrast `ClusterManagers.jl` will *dynamically* request resources when run outside of an existing Slurm allocation.
I found that this was basically never what I wanted, since it leaves the manager process running on a login node
and makes the script wait until resources are granted, which is better handled by the Slurm queueing system itself.
* Does not take any Slurm arguments. All Slurm arguments are inherited from the external Slurm allocation created by sbatch/salloc.
* Output from workers is redirected to the manager process instead of requiring a separate output file for every task.
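
Since the workers added by `SlurmManager` behave like ordinary `Distributed.jl` workers, the standard parallel primitives work as usual once `addprocs` returns. The sketch below (a hypothetical example, not from the package's documentation; it assumes it is launched under sbatch/salloc so that `SLURM_JOBID` and `SLURM_NTASKS` are set) distributes a trivial computation over the allocation with `pmap`:

```jl
#!/usr/bin/env julia

using Distributed, SlurmClusterManager

# One worker is started per Slurm task in the surrounding allocation.
addprocs(SlurmManager())

# pmap schedules work items across all workers; worker output
# (e.g. from println) is forwarded back to this manager process.
results = pmap(1:2 * nworkers()) do x
    x^2
end

println("computed $(length(results)) results on $(nworkers()) workers")
```

Submitted with e.g. `sbatch -N 2 --ntasks-per-node=64`, this would run the `pmap` body across all 128 workers without any Slurm-specific code beyond the `addprocs(SlurmManager())` call.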