Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/giovtorres/slurm-docker-cluster
A Slurm cluster using docker-compose
https://github.com/giovtorres/slurm-docker-cluster
docker-compose hpc slurm slurm-cluster
Last synced: about 2 months ago
JSON representation
A Slurm cluster using docker-compose
- Host: GitHub
- URL: https://github.com/giovtorres/slurm-docker-cluster
- Owner: giovtorres
- License: mit
- Created: 2017-09-11T19:21:32.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-06-11T22:52:58.000Z (3 months ago)
- Last Synced: 2024-06-12T06:22:47.987Z (3 months ago)
- Topics: docker-compose, hpc, slurm, slurm-cluster
- Language: Dockerfile
- Homepage:
- Size: 24.4 KB
- Stars: 264
- Watchers: 13
- Forks: 160
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-high-performance-computing - slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing. (Software / Trends)
README
# Slurm Docker Cluster
This is a multi-container Slurm cluster using docker-compose. The compose file
creates named volumes for persistent storage of MySQL data files as well as
Slurm state and log directories.## Containers and Volumes
The compose file will run the following containers:
* mysql
* slurmdbd
* slurmctld
* c1 (slurmd)
* c2 (slurmd)The compose file will create the following named volumes:
* etc_munge ( -> /etc/munge )
* etc_slurm ( -> /etc/slurm )
* slurm_jobdir ( -> /data )
* var_lib_mysql ( -> /var/lib/mysql )
* var_log_slurm ( -> /var/log/slurm )## Building the Docker Image
Build the image locally:
```console
docker build -t slurm-docker-cluster:21.08.6 .
```Build a different version of Slurm using Docker build args and the Slurm Git
tag:```console
docker build --build-arg SLURM_TAG="slurm-19-05-2-1" -t slurm-docker-cluster:19.05.2 .
```Or equivalently using `docker-compose`:
```console
SLURM_TAG=slurm-19-05-2-1 IMAGE_TAG=19.05.2 docker-compose build
```## Starting the Cluster
Run `docker-compose` to instantiate the cluster:
```console
IMAGE_TAG=19.05.2 docker-compose up -d
```## Register the Cluster with SlurmDBD
To register the cluster to the slurmdbd daemon, run the `register_cluster.sh`
script:```console
./register_cluster.sh
```> Note: You may have to wait a few seconds for the cluster daemons to become
> ready before registering the cluster. Otherwise, you may get an error such
> as **sacctmgr: error: Problem talking to the database: Connection refused**.
>
> You can check the status of the cluster by viewing the logs: `docker-compose
> logs -f`## Accessing the Cluster
Use `docker exec` to run a bash shell on the controller container:
```console
docker exec -it slurmctld bash
```From the shell, execute slurm commands, for example:
```console
[root@slurmctld /]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 5-00:00:00 2 idle c[1-2]
```## Submitting Jobs
The `slurm_jobdir` named volume is mounted on each Slurm container as `/data`.
Therefore, in order to see job output files while on the controller, change to
the `/data` directory when on the **slurmctld** container and then submit a job:```console
[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="hostname"
Submitted batch job 2
[root@slurmctld data]# ls
slurm-2.out
[root@slurmctld data]# cat slurm-2.out
c1
```## Stopping and Restarting the Cluster
```console
docker-compose stop
docker-compose start
```## Deleting the Cluster
To remove all containers and volumes, run:
```console
docker-compose stop
docker-compose rm -f
docker volume rm slurm-docker-cluster_etc_munge slurm-docker-cluster_etc_slurm slurm-docker-cluster_slurm_jobdir slurm-docker-cluster_var_lib_mysql slurm-docker-cluster_var_log_slurm
```
## Updating the ClusterIf you want to change the `slurm.conf` or `slurmdbd.conf` file without a rebuilding you can do so by calling
```console
./update_slurmfiles.sh slurm.conf slurmdbd.conf
```
(or just one of the files).
The Cluster will automatically be restarted afterwards with
```console
docker-compose restart
```
This might come in handy if you add or remove a node to your cluster or want to test a new setting.