https://github.com/dstdev/awesome-hpc

A collection of Awesome HPC software and tools

# Awesome HPC [![Awesome](https://awesome.re/badge-flat.svg)](https://awesome.re)

High Performance Computing tools and resources for engineers and administrators.

[High Performance Computing (HPC)](https://en.wikipedia.org/wiki/Supercomputer) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

## Contents

- [Provisioning](#provisioning)
- [Workload Managers](#workload-managers)
- [Pipelines](#pipelines)
- [Applications](#applications)
- [Compilers](#compilers)
- [MPI](#mpi)
- [Parallel Computing](#parallel-computing)
- [Benchmarking](#benchmarking)
- [Miscellaneous](#miscellaneous)
- [Performance](#performance)
- [Parallel Shells](#parallel-shells)
- [Containers](#containers)
- [Environment Management](#environment-management)
- [Visualization](#visualization)
- [Parallel Filesystems](#parallel-filesystems)
- [Programming Languages](#programming-languages)
- [Monitoring](#monitoring)
- [Journals](#journals)
- [Podcasts](#podcasts)
- [Blogs](#blogs)
- [Conferences](#conferences)
- [Websites](#websites)
- [User Groups](#user-groups)

## Provisioning
- [Grendel](https://grendel.readthedocs.io/) - Bare Metal Provisioning system for HPC Linux clusters ([Source Code](https://github.com/ubccr/grendel)) `GPL-3`.
- [xCAT](https://xcat.org/) - xCAT is a toolkit for deployment and administration of clusters of all sizes ([Source Code](https://github.com/xcat2/xcat-core)) `EPL-1.0`.
- [Warewulf](https://warewulf.hpcng.org/) - Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems ([Source Code](https://github.com/hpcng/warewulf)) `BSD-3`.
- [Rocks](http://www.rocksclusters.org/) - A Linux distribution for developing Linux clusters `other`.
- [Cobbler](https://cobbler.github.io/) - Cobbler is a Linux installation server that allows for rapid setup of network installation environments ([Source Code](https://github.com/cobbler/cobbler)) `GPL-2.0`.
- [Base Command Manager](https://docs.nvidia.com/base-command-manager/index.html) - Base Command Manager allows administrators to quickly build and manage heterogeneous clusters `Proprietary`.
- [Scyld](https://www.penguinsolutions.com/computing/products/software/scyld-clusterware/) - Scyld ClusterWare builds on the continuing evolution of the Beowulf clusters first developed at NASA in the 1990s `Proprietary`.
- [BlueBanquise](https://bluebanquise.com) - BlueBanquise is an open source cluster deployment and management stack built on Python and Ansible ([Source Code](https://github.com/bluebanquise/bluebanquise)) `MIT`.

## Workload Managers
- [Slurm](https://slurm.schedmd.com/documentation.html) - A free and open source job scheduler ([Source Code](https://github.com/SchedMD/slurm)) `OSS`.
- [LSF](https://www.ibm.com/products/hpc-workload-management) - A job scheduler and workload management software developed by IBM `Proprietary`.
- [Moab](https://adaptivecomputing.com/moab-hpc-suite/) - Moab is a workload manager and job scheduler `other`.
- [Torque](https://en.wikipedia.org/wiki/TORQUE) - Torque is a workload manager and job scheduler `other`.
- [OpenLava](https://en.wikipedia.org/wiki/OpenLava) - OpenLava is a workload manager and job scheduler `other`.
- [UGE/SGE](https://en.wikipedia.org/wiki/Univa_Grid_Engine) - Univa Grid Engine is a workload management engine for HPC `Proprietary`.
- [Volcano](https://volcano.sh/) - Volcano is a batch system built on Kubernetes `Apache-2.0`.
- [Maui](https://www.mhpcc.hpc.mil/) - Maui is a workload manager and job scheduler `other`.
- [Kube Batch](https://github.com/kubernetes-sigs/kube-batch) - A batch scheduler for Kubernetes aimed at high-performance workloads, e.g. AI/ML, big data, HPC `Apache-2.0`.
- [OpenPBS](https://www.openpbs.org/) - OpenPBS® software optimizes job scheduling and workload management in high-performance computing (HPC) environments ([Source Code](https://github.com/openpbs/openpbs)) `other`.

## Pipelines
- [Nextflow](https://nextflow.io) - Data-driven computational pipelines `Apache-2.0`.
- [Cromwell](https://cromwell.readthedocs.io/en/stable/) - Scientific workflow engine designed for simplicity & scalability ([Source Code](https://github.com/broadinstitute/cromwell)) `BSD-3`.
- [Pegasus](https://pegasus.isi.edu/) - A configurable system for mapping and executing scientific workflows over a wide range of computational infrastructure ([Source Code](https://github.com/pegasus-isi/pegasus)) `Apache-2.0`.

## Applications
- [Spack](https://spack.io) - A flexible package manager that supports multiple versions, configurations, platforms, and compilers ([Source Code](https://github.com/spack/spack)) `other`.
- [EasyBuild](https://easybuild.io/) - EasyBuild - building software with ease ([Source Code](https://github.com/easybuilders/easybuild)) `GPL-2`.

## Compilers
- [Nvidia](https://developer.nvidia.com/hpc-compilers) - NVIDIA HPC compiler suite for Fortran, C/C++ with OpenACC `Proprietary`.
- [Portland Group](https://www.pgroup.com/index.htm) - The Portland Group compilers were Fortran, C/C++ compilers now integrated into NVIDIA HPC SDK `Proprietary`.
- [Intel](https://software.intel.com/content/www/us/en/develop/tools/oneapi/all-toolkits.html#hpc-kit) - The Intel compiler suite offers many language compilers for use in the HPC space `Proprietary`.
- [Cray](https://bluewaters.ncsa.illinois.edu/cray-compiler) - A suite of compilers designed and optimized to target the AMD Interlagos instruction set `Proprietary`.
- [GNU](https://gcc.gnu.org/) - The GNU Compiler Collection is a suite of compilers targeting many languages ([Source Code](https://gcc.gnu.org/git.html)) `GPL-3`.
- [LLVM](https://llvm.org/) - The LLVM project is a collection of modular compilers and toolchains ([Source Code](https://github.com/llvm/llvm-project)) `OSS`.

## MPI
- [Open MPI](https://www.open-mpi.org/) - Open MPI is an open source implementation of the MPI-3.1 standard ([Source Code](https://github.com/open-mpi/ompi)) `BSD`.
- [MPICH](https://www.mpich.org/) - MPICH is a high-performance and widely portable implementation of the MPI-3.1 standard ([Source Code](https://github.com/pmodels/mpich)) `other`.
- [MVAPICH](https://mvapich.cse.ohio-state.edu/) - MVAPICH is an open source implementation of the MPI-3.1 standard developed by Ohio State University `BSD`.
- [Intel-MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) - Intel-MPI is Intel's MPI-3.1 implementation included in their compiler suite `other`.

## Parallel Computing
- [ArrayFire](https://arrayfire.org/docs/index.htm) - A general purpose tensor library that simplifies the process of software development for parallel architectures `other`.
- [OpenMP](https://www.openmp.org/) - OpenMP is an application programming interface that supports multi-platform shared-memory multiprocessing programming `other`.

## Benchmarking
- [OSU Benchmarks](https://mvapich.cse.ohio-state.edu/benchmarks/) - A collection of benchmarking tools for MPI developed by Ohio State University `other`.
- [Intel MPI Benchmarks](https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-benchmarks.html) - A set of benchmarks developed by Intel for use with their Intel MPI `other`.
- [HPCC Systems](https://hpccsystems.com/) - HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processing and analytics ([Source Code](https://github.com/hpcc-systems/HPCC-Platform)) `other`.
- [LINPACK](https://www.netlib.org/linpack/) - LINPACK is a set of efficient Fortran subroutines for solving linear systems, whose benchmarks are useful for HPC `other`.
- [IOzone](https://www.iozone.org/) - IOzone is a filesystem benchmark tool `OSS`.
- [IOR](https://www.vi4io.org/tools/benchmarks/ior) - Interleaved or Random is a useful benchmarking tool for testing parallel filesystems `other`.
- [MDtest](https://www.vi4io.org/tools/benchmarks/mdtest) - MDtest is an MPI-based application for evaluating the metadata performance of a file system `other`.
- [FIO](https://fio.readthedocs.io/en/latest/fio_doc.html) - Flexible I/O is an advanced disk benchmark that depends upon the kernel's AIO access library ([Source Code](https://git.kernel.dk/cgit/fio/)) `GPL-2`.
- [elbencho](https://github.com/breuner/elbencho) - A distributed storage benchmark for files, objects & blocks with support for GPUs `GPL-3`.
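
As a rough illustration of what a LINPACK-style benchmark measures, the sketch below solves a dense linear system by Gaussian elimination with partial pivoting and converts the elapsed time into a floating-point rate, assuming the commonly quoted operation count of about 2/3·n³ + 2·n². It is a pure-Python toy, not LINPACK or HPL itself; the function names `solve` and `linpack_style_rate` and the problem sizes are illustrative assumptions.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_rate(n, seed=0):
    """Time one solve of a random n x n system; return (flop/s, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Build b so that the exact solution is a vector of ones.
    b = [sum(row) for row in a]
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)
    # Assumed operation count for the factorization plus the solve.
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return flops / elapsed, residual
```

Calling `linpack_style_rate(200)` returns a rate that mostly reflects interpreter overhead, which is why real benchmarks rely on tuned Fortran or C kernels; the residual is reported alongside it because a benchmark result is only meaningful when the computed solution is verified.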

## Miscellaneous
- [OpenOnDemand](https://openondemand.org/) - Open OnDemand helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device ([Source Code](https://github.com/OSC/openondemand.org)) `MIT`.
- [Open XDMoD](https://open.xdmod.org) - Open XDMoD is an open source tool to facilitate the management of high performance computing resources ([Source Code](https://github.com/ubccr/xdmod/)) `LGPL-3`.
- [ColdFront](https://coldfront.readthedocs.io/en/latest/) - ColdFront is an open source resource allocation system designed to provide a central portal for administration, reporting, and measuring scientific impact of HPC resources ([Source Code](https://github.com/ubccr/coldfront)) `GPL-3`.
- [Pavilion2](https://pavilion2.readthedocs.io/) - Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems ([Source Code](https://github.com/hpc/pavilion2)) `other`.
- [ReFrame](https://reframe-hpc.readthedocs.io/en/stable/) - A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems ([Source Code](https://github.com/reframe-hpc/reframe)) `BSD-3`.
- [OLCF Test Harness](https://olcf.github.io/olcf-test-harness/) - The OLCF Test Harness (OTH) helps automate the testing of applications, tools, and other system software ([Source Code](https://github.com/olcf/olcf-test-harness)) `other`.
- [GoSlmailer](https://github.com/CLIP-HPC/goslmailer) - Goslmailer is a drop-in notification delivery solution for Slurm that can deliver to Slack, Mattermost, Teams, and more.

## Performance
- [TotalView](https://totalview.io/products/totalview) - TotalView is a debugging tool for HPC applications `Proprietary`.
- [Tau](https://www.cs.uoregon.edu/research/tau/home.php) - TAU Performance System® is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python `other`.
- [Valgrind](https://www.valgrind.org/) - Valgrind is a tool designed to profile programs and detect memory leaks ([Source Code](https://sourceware.org/git/?p=valgrind.git)) `GPL-2`.
- [Paraver](https://tools.bsc.es/paraver) - Paraver is a very flexible data browser that is part of the CEPBA-Tools toolkit `other`.
- [PAPI](http://icl.cs.utk.edu/papi) - Performance Application Programming Interface (PAPI) is a performance analysis tool ([Source Code](https://bitbucket.org/icl/papi/src/master/)) `other`.

## Parallel Shells
- [pdsh](https://linux.die.net/man/1/pdsh) - pdsh runs terminal commands across multiple hosts in parallel ([Source Code](https://github.com/chaos/pdsh)) `GPL-2`.
- [ClusterShell](https://clustershell.readthedocs.io/en/latest/intro.html) - Scalable cluster administration Python framework ([Source Code](https://github.com/cea-hpc/clustershell)) `LGPL-2.1`.
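
The idea behind a parallel shell, running one command across many hosts at once and collecting the output per host, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not how pdsh or ClusterShell are implemented: the function `run_on_hosts`, its parameters, and the host names in the usage note are hypothetical, and the default `ssh` launcher assumes non-interactive (key-based) access to each host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, command, launcher=("ssh",), max_workers=32, timeout=30):
    """Run `command` on every host concurrently.

    Returns a dict mapping each host to (returncode, stdout), with a
    returncode of -1 when the launcher could not be started or timed out.
    """

    def run_one(host):
        # Launcher prefix + host + command, e.g. ["ssh", "node01", "uptime"].
        argv = [*launcher, host, *command]
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            return host, (result.returncode, result.stdout.strip())
        except (OSError, subprocess.TimeoutExpired) as exc:
            return host, (-1, str(exc))

    # Threads are sufficient here because each worker just waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))
```

For example, `run_on_hosts(["node01", "node02"], ["uptime"])` would attempt `ssh node01 uptime` and `ssh node02 uptime` concurrently. The `max_workers` cap plays the role of a fan-out limit, so a large host list does not open every connection at once.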

## Containers
- [Apptainer](https://apptainer.org) - Apptainer is an open source container system ([Source Code](https://github.com/apptainer/apptainer)) `BSD`.
- [Charliecloud](https://hpc.github.io/charliecloud/) - Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers ([Source Code](https://github.com/hpc/charliecloud)) `Apache-2.0`.
- [Docker](https://www.docker.com/) - Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers `other`.
- [uDocker](https://indigo-dc.github.io/udocker/) - A basic user tool to execute simple docker containers in batch or interactive systems without root privileges ([Source Code](https://github.com/indigo-dc/udocker)) `Apache-2.0`.
- [Shifter](https://www.nersc.gov/research-and-development/user-defined-images/) - Shifter is Linux containers for HPC ([Source Code](https://github.com/NERSC/shifter)) `other`.
- [HPC Container Maker](https://github.com/NVIDIA/hpc-container-maker) - HPC Container Maker is an open source tool to make it easier to generate container specification files `Apache-2.0`.
- [Sarus](https://github.com/eth-cscs/sarus) - An OCI-compatible container engine for HPC `BSD`.
- [Singularity HPC](https://singularity-hpc.readthedocs.io) - Singularity Registry HPC (shpc) allows you to install containers as modules ([Source Code](https://github.com/singularityhub/singularity-hpc)) `MPL 2.0`.

## Environment Management
- [Lmod](https://lmod.readthedocs.io/en/latest/) - Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy ([Source Code](https://github.com/TACC/Lmod)) `other`.
- [Environment Modules](https://modules.readthedocs.io/en/latest/) - Environment Modules: provides dynamic modification of a user's environment ([Source Code](https://github.com/cea-hpc/modules)) `GPL-2`.
- [Anaconda](https://www.anaconda.com/) - Anaconda is a Python and R distribution for use in computational science `other`.
- [Mamba](https://mamba.readthedocs.io/en/latest/) - Mamba is a reimplementation of the conda package manager in C++ ([Source Code](https://github.com/mamba-org/mamba)) `BSD`.

## Visualization
- [VisIt](https://visit-dav.github.io/visit-website/) - VisIt - Visualization and Data Analysis for Mesh-based Scientific Data ([Source Code](https://github.com/visit-dav/visit)) `BSD-3`.
- [ParaView](https://www.paraview.org/) - ParaView is an open-source, multi-platform data analysis and visualization application based on Visualization Toolkit (VTK) ([Source Code](https://github.com/Kitware/ParaView)) `BSD-3`.

## Parallel Filesystems
- [GPFS](https://www.ibm.com/docs/en/gpfs/4.1.0.4?topic=guide-introducing-general-parallel-file-system) - GPFS is a high-performance clustered file system software developed by IBM `Proprietary`.
- [Quobyte](https://www.quobyte.com/storage-for/high-performance-computing-hpc) - A high performance filesystem `Proprietary`.
- [Ceph](https://ceph.io/en/) - Ceph is a distributed object, block, and file storage platform ([Source Code](https://github.com/ceph/ceph)) `other`.
- [Weka](https://www.weka.io/) - A file system designed for HPC `Proprietary`.
- [Lustre/Exascaler](https://www.lustre.org/) - Lustre is an open-source, distributed parallel file system software platform designed for scalability, high-performance, and high-availability ([Source Code](https://git.whamcloud.com/fs/lustre-release.git)) `other`.
- [BeeGFS](https://www.beegfs.io/c/) - BeeGFS is a hardware-independent POSIX parallel file system developed with a strong focus on performance and designed for ease of use, simple installation, and management `Proprietary`.
- [OrangeFS](http://www.orangefs.org/) - OrangeFS is a next generation parallel file system for Linux clusters ([Source Code](https://github.com/waltligon/orangefs)) `other`.
- [MooseFS](https://moosefs.com/) - Moose File System is an open-source, POSIX-compliant distributed file system developed by Core Technology ([Source Code](https://github.com/moosefs/moosefs)) `GPL-2.0`.

## Programming Languages
- [Julia](https://julialang.org/) - Julia is a high-level, high-performance dynamic language for technical computing `MIT`.
- [Futhark](https://futhark-lang.org/) - Futhark is a purely functional data-parallel programming language in the ML family `ISC`.
- [Chapel](https://chapel-lang.org/) - Chapel is a programming language designed for productive parallel computing at scale `Apache-2.0`.

## Monitoring
### Prometheus Based
- [Slurm Exporter](https://github.com/treydock/prometheus-slurm-exporter) - Prometheus exporter for performance metrics from Slurm `GPL-3.0`.
- [Slurm Exporter](https://github.com/ubccr/slurm-exporter) - Slurm Exporter for Prometheus using Rest API `GPL-3.0`.
- [Infiniband Exporter](https://github.com/treydock/infiniband_exporter) - The InfiniBand exporter collects counters from InfiniBand switches and HCAs `Apache-2.0`.
- [Cgroup Exporter](https://github.com/treydock/cgroup_exporter) - Produces metrics from cgroups `Apache-2.0`.
- [Cgroup Exporter](https://github.com/phpHavok/cgroups_exporter) - A Prometheus exporter for cgroup-level metrics `unknown`.
- [GPFS Exporter](https://github.com/treydock/gpfs_exporter) - The GPFS exporter collects metrics from the GPFS filesystem `Apache-2.0`.
- [Lustre Exporter](https://github.com/GSI-HPC/lustre_exporter) - Prometheus exporter for use with the Lustre parallel filesystem `GPL-3.0`.
- [DCGM Exporter](https://github.com/NVIDIA/dcgm-exporter) - NVIDIA GPU metrics exporter for Prometheus leveraging DCGM `Apache-2.0`.

## Journals
- [The Journal of Supercomputing](https://www.springer.com/journal/11227) - An International Journal of High-Performance Computer Design, Analysis, and Use.

## Podcasts
- [This Week in HPC](https://www.intersect360.com/media/podcasts/) - Each week, Intersect360 Research CEO Addison Snell and HPCwire editor Tiffany Trader dissect the week's top HPC stories.
- [Exascale Project](https://www.exascaleproject.org/podcast/) - ECP's Let's Talk Exascale podcast goes behind the scenes to chat with some of the people who are bringing a capable and sustainable exascale computing ecosystem to fruition.
- [@HPCpodcast](https://insidehpc.com/category/resources/hpc-podcast/) - Join Shahin Khan and Doug Black as they discuss Supercomputing technologies and the applications, markets, and policies that shape them.

## Blogs
- [HPCWire](https://www.hpcwire.com/) - Since 1987 covering the fastest computers in the world and the people who run them.
- [InsideHPC](https://insidehpc.com/) - insideHPC is a global publication recognized for its comprehensive and insightful coverage of the HPC-AI community, linking vendors, end-users and HPC strategists.
- [The Next Platform](https://www.nextplatform.com/category/hpc/) - Offers in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
- [The Register HPC](http://www.theregister.co.uk/data_centre/hpc/) - The Register is a leading and trusted global online enterprise technology news publication, reaching roughly 40 million readers worldwide.
- [HPC at Dell](http://hpcatdell.com) - High-Performance Computing knowledge base articles from Dell.

## Conferences

- [PEARC](https://pearc.acm.org/) - Practice & Experience in Advanced Research Computing.
- [Supercomputing (SC)](https://supercomputing.org/) - The International Conference for High Performance Computing, Networking, Storage, and Analysis.
- [ISC High Performance (ISC)](https://www.isc-hpc.com/) - Annual international conference on high performance computing held in Germany, formerly known as the International Supercomputing Conference.
- [CCGrid](https://dl.acm.org/conference/ccgrid) - IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.
- [IEEE-HPEC](https://ieee-hpec.org/) - IEEE High Performance Embedded Computing.
- [Hot Chips](https://hotchips.org) - Semiconductor industry's leading conference on high-performance microprocessors and related circuits.
- [Hot Interconnects](https://hoti.org) - IEEE conference on software architectures and implementations for interconnection networks of all scales.
- [ESSA](https://sites.google.com/view/essa-2024/) - Workshop on Extreme-Scale Storage and Analysis.
- [IEEE-IPDPS](https://www.ipdps.org/) - IEEE International Parallel & Distributed Processing Symposium.
- [ESPM2 Workshop](http://nowlab.cse.ohio-state.edu/espm2/) - International Workshop on Extreme Scale Programming Models and Middleware.
- [LCI Workshops](https://linuxclustersinstitute.org/workshops/) - The Linux Clusters Institute (LCI) provides education and advanced technical training on the deployment and use of computing clusters to the high performance computing community worldwide.
- [HPC Carpentry](https://www.hpc-carpentry.org/) - Teaching basic skills for high-performance computing.

## Websites

- [Top500](https://top500.org) - The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world.

## User Groups
- [MVAPICH](https://mug.mvapich.cse.ohio-state.edu/) - The MUG conference provides an open forum for all attendees (users, system administrators, researchers, engineers, and students) to discuss and share their knowledge on using MVAPICH libraries.
- [Slurm](https://slurm.schedmd.com/slurm_ug_agenda.html) - The annual Slurm user group meeting.

## Contributing

Contributing guidelines can be found in [contributing.md](contributing.md).