https://github.com/hpc/Spindle

Scalable dynamic library and python loading in HPC environments

Host: GitHub
URL: https://github.com/hpc/Spindle
Owner: hpc
License: other
Created: 2013-05-21T18:08:33.000Z (over 11 years ago)
Default Branch: devel
Last Pushed: 2024-10-24T21:32:16.000Z (19 days ago)
Last Synced: 2024-10-26T09:13:58.079Z (18 days ago)
Topics: performance, radiuss
Language: Makefile
Size: 4.77 MB
Stars: 96
Watchers: 19
Forks: 23
Open Issues: 14
Metadata Files:
- Readme: README
- Changelog: CHANGELOG

README

=============================================================================
== SPINDLE: Scalable Parallel Input Network for Dynamic Load Environments  ==
=============================================================================

Authors:    SPINDLE:              Matthew LeGendre (legendre1 at llnl dot gov)
                                  W. Frings
            COBO:                 Adam Moody

Version:    0.13 (Aug 2020)

Summary:
===========
Spindle is a tool for improving the performance of dynamic library
and Python loading in HPC environments.

Documentation:
============
https://computing.llnl.gov/projects/spindle/software

Overview:
============
Using dynamically-linked libraries is common in most computational
environments, but they can cause serious problems when used on large
clusters and supercomputers.  Shared libraries are frequently stored
on shared file systems, such as NFS.  When thousands of processes
simultaneously start and attempt to search for and load libraries, it
resembles a denial-of-service attack against the shared file system.
This "attack" doesn't just slow down the application, but impacts
every user on the system.  We encountered cases where it took over ten
hours for a dynamically-linked MPI application running on 16K
processes to reach main.
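
To get a feel for the scale of the problem, the sketch below counts
worst-case file lookups at startup.  It is a rough back-of-the-envelope
model, not a measurement: the function name, the library count, and the
search-path length are hypothetical values chosen for illustration, and
only the 16K process count comes from the text above.

```python
def startup_file_ops(num_procs, num_libs, search_dirs):
    """Worst-case lookups against the shared file system at startup.

    Assumes every process probes every search directory for every
    library, i.e. each library is found in the last directory tried.
    """
    return num_procs * num_libs * search_dirs


procs = 16 * 1024   # "16K processes", from the text above
libs = 500          # hypothetical library count
dirs = 10           # hypothetical search-path length

uncoordinated = startup_file_ops(procs, libs, dirs)
single_reader = startup_file_ops(1, libs, dirs)

print(uncoordinated)                   # 81920000
print(uncoordinated // single_reader)  # 16384
```

Under these assumptions the load on the shared file system grows
linearly with the number of processes, which is why a job that starts
quickly at small scale can stall at large scale.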

Spindle presents a novel solution to this problem.  It transparently
runs alongside your distributed application and takes over its library
loading mechanism.  When processes start to load a new library,
Spindle intercepts the operation, designates one process to read the
file from the shared file system, then distributes the library's
contents to every process with a scalable broadcast operation.
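
The "one reader, then broadcast" idea can be illustrated with a small
simulation.  This is only a sketch: it assumes a binomial-tree
broadcast, which is one common scalable pattern and is not necessarily
the topology Spindle uses, and the function name is hypothetical.  It
shows that a single read from the shared file system can reach every
process in a number of forwarding rounds that grows logarithmically
with the process count.

```python
def broadcast_rounds(num_procs):
    """Simulate a binomial-tree broadcast from rank 0.

    Returns (rounds, shared_fs_reads).
    """
    has_data = [False] * num_procs
    has_data[0] = True   # rank 0 is the designated reader
    fs_reads = 1         # the only read that hits the shared file system
    rounds = 0
    have = 1             # ranks 0 .. have-1 currently hold the library

    while have < num_procs:
        # Every rank that holds the data forwards it to one new rank.
        for src in range(have):
            dst = src + have
            if dst < num_procs:
                has_data[dst] = True
        have = min(2 * have, num_procs)
        rounds += 1

    assert all(has_data)
    return rounds, fs_reads


print(broadcast_rounds(1280))    # (11, 1)
print(broadcast_rounds(16384))   # (14, 1)
```

In this model the shared file system sees one read per library no
matter how many processes take part; the remaining traffic moves
between the processes themselves.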

Spindle is very scalable.  On a cluster at LLNL the Pynamic benchmark
(which measures library loading performance) was unable to scale much
past 100 nodes.  Even at that small scale it was causing significant
performance problems that were impacting everyone on the cluster.
When running Pynamic under Spindle, we were able to scale up to the
max job size at 1,280 nodes without showing any signs of file-system
stress or library-related slowdowns.

Unlike competing solutions, Spindle does not require any special
hardware, and libraries do not have to be staged into any special
locations.  Applications work out-of-the-box and do not need any
special compile or link flags.  Spindle is completely userspace and
does not require kernel patches or root privileges.

Spindle can trigger scalable loading of dlopened libraries, dependent
libraries, executables, Python modules and specified application data
files.

Compilation:
============
Please see the INSTALL file in the Spindle source tree.

Usage:
======
Put 'spindle' before your job launch command.  E.g.:

  spindle mpirun -n 128 mpi_hello_world