https://github.com/natefoo/slurm-drmaa-old

Old working repository for my slurm-drmaa fork; the new one can be found at:
https://github.com/natefoo/slurm-drmaa-old

Documentation
=============

The manual for DRMAA for Simple Linux Utility for Resource Management
(SLURM) is available on the project wiki:
http://apps.man.poznan.pl/trac/slurm-drmaa/wiki

Branches
========

- [master](../../tree/master) - slurm-drmaa 1.0.7 (from source tarball, not
SVN) with mostly complete Slurm multicluster `-M`/`--clusters` support (the
option accepts one cluster, but not several; see **Limitations** below).
- [slurm-drmaa-1.2.0](../../tree/slurm-drmaa-1.2.0) - slurm-drmaa 1.2.0 from
SVN, no modifications.
- [slurm-drmaa-1.2.0-clusters](../../tree/slurm-drmaa-1.2.0-clusters) -
slurm-drmaa 1.2.0 with the same cluster modifications as in the master
branch.
- [slurm-drmaa-1.2.0-multisubmit](../../tree/slurm-drmaa-1.2.0-multisubmit) -
slurm-drmaa 1.2.0 with full multicluster submission (i.e. `sbatch
--clusters=cluster1,cluster2`) functionality (see **Limitations** below).
- [slurm-drmaa-1.2.0-multisubmit-15.08](../../tree/slurm-drmaa-1.2.0-multisubmit-15.08) -
slurm-drmaa 1.2.0 with full multicluster submission (i.e. `sbatch
--clusters=cluster1,cluster2`) functionality; requires Slurm 15.08 or later.

TODO
====

- Get the features in the various branches enabled by libslurm version (at runtime?) and merged back to a single branch.
- Reimport the upstream repository from PSNC with full attributed SVN history

RPM
===

I added a `slurm-drmaa.spec` file in the master branch (but it can be used with any branch) for use with `rpmbuild` to create an RPM. My process (on CentOS 6) is:

```console
% ./autogen.sh
% make distclean
% find . -type d -name autom4te.cache -print0 | xargs -0 rm -rf
% ragel -o drmaa_utils/drmaa_utils/timedelta.c drmaa_utils/drmaa_utils/timedelta.rl
% cp slurm-drmaa.spec ~/rpmbuild/SPECS
% cd ..
% tar zcf slurm-drmaa-1.2.0.tar.gz --exclude=.git\* slurm-drmaa-1.2.0
% cp slurm-drmaa-1.2.0.tar.gz ~/rpmbuild/SOURCES
% rpmbuild -bb ~/rpmbuild/SPECS/slurm-drmaa.spec
```

Limitations
===========

libslurmdb incompatibility
--------------------------

Multicluster support will not work on a standard Slurm installation prior to
Slurm 14.11. slurm-drmaa needs access to Slurm's `working_cluster_rec` global
in `libslurmdb`, which was not extern/public in older versions. However, older
versions can be used if you compile a new `libslurmdb.so`. To do so, apply the
following patch to the Slurm source, reconfigure, and recompile:

```diff
--- src/db_api/Makefile.am.orig 2014-05-06 11:24:19.000000000 -0500
+++ src/db_api/Makefile.am 2014-10-10 11:48:12.730845279 -0500
@@ -95,6 +95,7 @@
(echo "{ global:"; \
echo " slurm_*;"; \
echo " slurmdb_*;"; \
+ echo " working_cluster_rec;"; \
echo " local: *;"; \
echo "};") > $(VERSION_SCRIPT)

--- src/db_api/Makefile.in.orig 2014-05-06 11:24:19.000000000 -0500
+++ src/db_api/Makefile.in 2014-10-10 11:48:22.765938016 -0500
@@ -915,6 +915,7 @@
(echo "{ global:"; \
echo " slurm_*;"; \
echo " slurmdb_*;"; \
+ echo " working_cluster_rec;"; \
echo " local: *;"; \
echo "};") > $(VERSION_SCRIPT)
```

In my case, I then copied the resulting `.libs/libslurmdb*.so*` to
slurm-drmaa's lib directory and configured slurm-drmaa with
`LDFLAGS='-Wl,-rpath=/path/to/slurm-drmaa/lib' ./configure`, but putting
`libslurmdb` elsewhere and/or simply setting `$LD_LIBRARY_PATH` at application
runtime also works.

Note that root privileges are not required for this to work. The canonical copy
of `libslurmdb.so` does not need to be modified, only the one that libdrmaa
links to at runtime.
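
One way to confirm that a rebuilt library actually exports the symbol is to inspect its dynamic symbol table. The following is a minimal sketch, assuming GNU `nm` (binutils) is installed; `exports_symbol` is a hypothetical helper written for this example, and `/path/to/slurm-drmaa/lib` is the same placeholder path used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of slurm-drmaa or Slurm): succeeds if the
# shared library $1 defines the dynamic symbol $2.
exports_symbol() {
    nm -D --defined-only "$1" 2>/dev/null | grep -qw -- "$2"
}

# Placeholder path; point this at the libslurmdb that libdrmaa loads.
lib=/path/to/slurm-drmaa/lib/libslurmdb.so

if exports_symbol "$lib" working_cluster_rec; then
    echo "working_cluster_rec is exported by $lib"
else
    echo "working_cluster_rec is NOT exported by $lib"
fi
```

If the check fails on the patched build, the version script was probably not regenerated, so rerunning the reconfigure and recompile steps is a reasonable first thing to try.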

Unimplemented features
----------------------

Multiple clusters specified in a single `-M`/`--clusters` option (e.g. `-M
cluster1,cluster2`) are not supported in the `master` and
`slurm-drmaa-1.2.0-clusters` branches.

However, support for it has been hacked in to the
`slurm-drmaa-1.2.0-multisubmit` branch. From the commit message:

This is fairly hacky because the functionality to do this properly is not
available via Slurm's public API. And:

1. I wasn't going to take the time to figure out the mess of private headers I
had to extract from the slurm source and include here, so a copy of the
Slurm source is required at compile time.
2. I have about 1 hours' worth of experience with autoconf/automake at this
point, so the changes I made there are crude.
3. I'm not going to put a lot of effort into making this nicer since the Slurm
development team has put implementing the features necessary for doing this
properly on their roadmap for Slurm 15.08:
http://bugs.schedmd.com/show_bug.cgi?id=1234

That said, once compiled, it will work with a standard Slurm 14.11 (or earlier
versions with the `working_cluster_rec` patch).

Proper support for multiple cluster submission with Slurm 15.08 and later is
available in the `slurm-drmaa-1.2.0-multisubmit-15.08` branch.

Potential Job ID incompatibilities
----------------------------------

Because DRMAA does not provide a means of reporting which cluster was
selected, I've chosen to modify the format of the Job ID returned by the submit
function. If the native specification does not contain `-M`/`--clusters`, Job
IDs are numeric as before. If `-M`/`--clusters` is passed, a `.` and the name
of the cluster to which the job was submitted are appended to the Job ID (e.g.
`42.cluster1`). The functions that check job state and perform job control
also accept Job IDs in this format.
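
Callers that need the numeric ID and the cluster name separately have to split the returned string themselves. A minimal sketch in POSIX shell follows; `split_job_id` and the variables it sets are hypothetical names for illustration, and it assumes the numeric part never contains a `.`, so everything after the first `.` is treated as the cluster name.

```shell
#!/bin/sh
# Hypothetical helper (not part of slurm-drmaa): split a Job ID of the form
# "42" or "42.cluster1" into $job_num and $job_cluster.
split_job_id() {
    job_id=$1
    case $job_id in
        *.*)
            job_num=${job_id%%.*}     # text before the first '.'
            job_cluster=${job_id#*.}  # text after the first '.'
            ;;
        *)
            job_num=$job_id           # plain numeric ID, no cluster suffix
            job_cluster=
            ;;
    esac
}

split_job_id 42.cluster1
echo "id=$job_num cluster=$job_cluster"
```

For a plain ID such as `42`, `job_cluster` is left empty, which matches the case where `-M`/`--clusters` was not passed.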