Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-cloud-hpc

A curated list of Cloud HPC.
https://github.com/kjrstory/awesome-cloud-hpc

Last synced: 4 days ago

  • Management Tool

    • Azure HPC OnDemand Platform - Azure-based HPC cluster solution with features like Terraform, Ansible, Packer integration, job scheduling, autoscaling, and monitoring ([Repository](https://github.com/Azure/az-hop), [Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/azhpc.azhop)).
    • HPC-NOW - Platform that aims to simplify starting and managing HPC workloads in the cloud.
    • Alibaba E-HPC - Alibaba Cloud's computing service for resource management, job submission, performance analysis, and VNC in E-HPC console.
    • AWS ParallelCluster - Open source cluster management tool for deploying and managing HPC clusters ([Repository](https://github.com/aws/aws-parallelcluster)).
    • Azure CycleCloud - Secure and flexible cloud HPC and Big Compute environments.
    • CloudyCluster - Turn-Key Cloud HPC elastic orchestration with a familiar HPC look and feel.
    • KT Cloud HPC - KT Cloud's HPC management product integrating Altair's solutions.
    • OCI HPC Cluster - Automated HPC cluster deployment on OCI.
    • OCI HPC File System (HFS) - Solution for deploying various HPC file servers on OCI.
    • SCP HPC Cluster - HPC cluster environment on SCP.
    • JedAI Cloud - Optimized HPC stacks enable easy cluster management and on-demand HPC through pre-integrated solutions, delivering bare metal infrastructure, virtualized services, and containerized apps via a single management interface by Define Tech.
    • TrinityX - Next-gen open-source HPC, AI, and cloud platform offering customizable installations with efficient provisioning, SLURM/OpenPBS, OpenHPC, and more for modern cluster management.
    • AWS ParallelCluster UI - Front-end for AWS ParallelCluster.
    • Flight Environment - The Flight User Suite for improved HPC access through CLI tools, the Flight Web Suite as a web interface for HPC end-users, and the Flight Admin Tools for administrative HPC environment configuration.
    • Cluster in the Cloud - Multi-cloud solution that uses Terraform for infrastructure setup, Ansible for software configuration, and Slurm with custom Python scripts for dynamic node management in cloud-based HPC environments.
    • Magic Castle - Multi-cloud HPC cluster solution that leverages Terraform and Puppet for deployment, featuring job scheduling with Slurm and over 3000 research software applications.
    • GCP HPC Toolkit - Google Cloud's open-source software for deploying high-performance computing environments on GCP, featuring customizable Terraform modules and Packer integration ([Repository](https://github.com/GoogleCloudPlatform/hpc-toolkit)).
  • IaaS-Image

    • Azhpc-images - Installation scripts for HPC images in Azure Marketplace, specifically CentOS-HPC, Ubuntu-HPC, and AlmaLinux-HPC.
    • Flight Solo - HPC-ready, platform-agnostic image approach to deploying HPC resources powered by alcesflight.
    • GCP HPC-ready VM - CentOS 7.9 or Rocky Linux 8 based VM image that is optimized for tightly coupled HPC workloads ([Marketplace CentOS 7](https://console.cloud.google.com/marketplace/product/click-to-deploy-images/hpc-vm-image-centos-7), [Marketplace Rocky Linux 8](https://console.cloud.google.com/marketplace/product/click-to-deploy-images/hpc-vm-image-rocky-linux-8?q=search&referrer=search)).
    • HPC Pack 2019 - Microsoft HPC Pack 2019 image powered by Cloud Infrastructure Services ([Marketplace(Azure)](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/cloud-infrastructure-services.hpc2019-windows-server-2019), [Marketplace(AWS)](https://aws.amazon.com/marketplace/pp/prodview-hxo3dtqd4srdk), [Marketplace(GCP)](https://console.cloud.google.com/marketplace/product/cloud-infrastructure-services/hpc2019-windows-2019)).
    • HPCBOX - Desktop-centric, intelligent workflow cloud HPC platform for automating and executing your application pipelines ([Marketplace](https://azuremarketplace.microsoft.com/en-us/marketplace/apps?search=hpcbox)).
    • NVIDIA Virtual Machine Images - Operating system environment for running NVIDIA GPU accelerated software in the cloud.
  • Job Scheduler

    • Slurm on Google Cloud Platform - Open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease.
    • Altair Access - HPC Job Submission Portal for Researchers and Engineers.
    • Altair NavOps - Cloud Migration, Automation, and Spend Management for HPC.
    • Altair Grid Engine - Distributed Resource Management and Optimization.
    • Altair PBS-Professional - Industry-leading Workload Manager and Job Scheduler for HPC and High-throughput Computing.
    • MS HPC Pack
    • Altair Control - HPC Administrator's Control Center for Managing, Optimizing, and Forecasting Resources with seamless cloud bursting capabilities.
    • Altair HPCWorks - High-Performance Computing (HPC) and Cloud Platform by Altair.
    • IBM Spectrum LSF Suites - Workload management platform and job scheduler for HPC with dynamic HPC cloud support for all major cloud providers ([Repository](https://github.com/IBM/ibm-spectrum-scale-cloud-install)).
    • Slurm Power Saving Guide - Guide to suspending and resuming nodes as needed, with support for cloud integration with providers like AWS, GCP, and Azure for workload management and cloud bursting.
  • Recipes

  • Solution

  • IaaS-Server

    • Amazon EC2 Hpc7g - HPC-optimized instances powered by AWS Graviton3E processors.
    • Amazon EC2 Hpc7a - HPC-optimized instances powered by 4th Generation AMD EPYC processors.
    • Amazon EC2 Hpc6id - HPC-optimized instances powered by 3rd Generation Intel Xeon Scalable processors.
    • Amazon EC2 P5 - GPU instances powered by NVIDIA H100 GPUs.
    • Amazon EC2 P4 - GPU instances powered by NVIDIA A100 (80 GB, 40 GB) GPUs.
    • Amazon EC2 P3 - GPU instances powered by NVIDIA V100 GPUs.
    • Amazon EC2 G5 - GPU instances powered by NVIDIA A10G GPUs and 2nd Gen AMD EPYC processors.
    • Azure HBv4-series - HPC-optimized instances powered by 4th Generation AMD EPYC processors.
    • Azure HBv3-series - HPC-optimized instances powered by 3rd Generation AMD EPYC processors.
    • Azure HBv2-series - HPC-optimized instances powered by 2nd Generation AMD EPYC processors.
    • Azure HB-series - HPC-optimized instances powered by 1st Generation AMD EPYC processors.
    • Azure HC-series - HPC-optimized instances powered by 1st Generation Intel Xeon Scalable processors.
    • Azure HX-series - Instances optimized for workloads that require significant memory capacity, with twice the memory capacity of HBv4.
    • Azure NDm H100 v5-series - GPU instances powered by NVIDIA H100 GPUs.
    • Azure NDm A100 v4-series - GPU instances powered by NVIDIA A100 (80 GB) GPUs and 3rd Generation AMD EPYC processors.
    • Azure NC A100 v4-series - GPU instances powered by NVIDIA A100 (40 GB) GPUs and 3rd Generation AMD EPYC processors.
    • Azure NCv3-series - GPU instances powered by NVIDIA V100 GPUs.
    • Azure NCasT4_v3-series - GPU instances powered by NVIDIA T4 GPUs and 2nd Gen AMD EPYC CPUs.
    • Super Computing Cluster - Based on ECS Bare Metal Instance powered by Alibaba Cloud, utilizes high-speed RDMA-based connections to enhance network performance and acceleration ratio in large-scale clusters, providing high-bandwidth and low-latency networks.
    • GCP G2 machine-series - GPU instances powered by NVIDIA L4 GPUs.
  • IaaS-Network

    • Azure InfiniBand - RDMA capable HB-series and N-series VMs communicate over the InfiniBand network.
    • Elastic Fabric Adapter - Network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale.
    • Compute Clusters - Group of high performance computing (HPC), GPU, or optimized instances on OCI that are connected with a high-bandwidth, ultra low-latency network.
  • IaaS-Storage

    • Amazon FSx for Lustre - Fully managed shared storage with the scalability and performance of the popular Lustre file system.
    • Amazon FSx for OpenZFS - Fully managed shared storage built on the popular OpenZFS file system.
    • Azure HPC Cache
    • Azure Managed Lustre - Managed, pay-as-you-go file system for high-performance computing (HPC) and AI workloads.
    • Azure NetApp Files - Enterprise-grade Azure file shares, powered by NetApp.
    • GCP File Store - High-performance, fully managed file storage.
    • GCP Parallel Store - Based on Intel DAOS and delivers up to 6.3x greater read throughput performance compared to competitive Lustre scratch offerings.
  • PaaS

    • AWS Batch - Fully managed batch computing service.
    • Azure Batch - Cloud-scale job scheduling and compute management.
    • GCP Batch - Fully managed batch service to schedule, queue, and execute batch jobs on Google's infrastructure.
    • NICE DCV - High-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming.
    • NICE EnginFrame - Unified interface to submit jobs for both on-premises and cloud workflow.
    • Research and Engineering Studio - Open source, easy-to-use web-based portal for administrators to create and manage secure cloud-based research and engineering environments on AWS.
    • Rntier Cloud - R&D cloud platform enabling easy and quick access to complex HPC simulations, vGPU-based remote 3D design, and multi-GPU deep learning environments via a web browser.
    • Scyld Cloud Central™ - Fully managed, cloud-based, end-to-end solution for high performance computing that makes it easier and faster for end-users, developers, and data scientists to deploy pure HPC, pure AI, and converged HPC/AI workloads on high-performance clusters.
    • Scyld ClusterWare - Intelligent suite of management functionality, including node provisioning, image customization, and cluster monitoring, while serving as a platform for additional software and schedulers.
    • Scyld Cloud Workstation - Unparalleled performance and a breadth of features that allow it to stand out as a solution for remote access.
    • AWS Parallel Computing Service - Managed service for HPC cluster deployment and scaling on AWS using Slurm.
    • Batch Compute - Cloud service for massive simultaneous batch processing on Alibaba Cloud.
    • Amazon DCV - High-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming.
    • NI SP EF Portal - Unified interface to submit jobs for both on-premises and cloud workflow.
  • SaaS

    • CloudHPC - On-demand cloud computing for CAE engineering simulations powered by CFD FEA SERVICE.
    • Nimbix - A comprehensive cloud computing solution powered by Atos, offering access to the HyperHub Application Marketplace with over 1,000 high-performance applications and workflows for diverse industries ([Repository](https://github.com/nimbix)).
    • Sabalcore - User-friendly, pay-as-you-go high performance computing cloud service with a full-featured, light-weight client that doesn't require a browser.
    • Scala Computing - Optimized, automated cloud-based HPC resource management platform with integrated network simulation and EDA tools, offering flexible, on-demand computing, secure workflows, and global infrastructure access.
    • TAESUNG Cloud - Offering Ansys applications as a service in a cloud-based SaaS.
    • dicehub - Real-time collaborative CFD (Computational Fluid Dynamics) simulation platform that simplifies your engineering workflow, offers massive parallel scaling, and runs in a web browser.
    • Uber Cloud - A platform featuring HD 3D graphics desktop GUI, BYOL simulation software support, scalable container-based architecture, and automated cloud computing on AWS, Azure, Google, and HPE.
    • Kaleidosim - Browser-based access to HPC software through advanced cloud orchestration technology.
    • OnScale Solve - The cloud engineering simulation platform built by engineers for engineers.
    • SyncHPC - Powerful and flexible hybrid HPC and VDI management platform that provides a comprehensive solution for managing high-performance computing (HPC) and Virtual Desktop Infrastructure (VDI) resources.
    • EPIC - Primarily for CFD applications, available on the web and created by Zenotech, which also includes Zenotech's ZCFD.
    • Luminary Cloud - A cloud-based, pay-per-use SaaS simulation platform with a fast, GPU-powered, cloud-native CFD solver and comprehensive high-fidelity capabilities.
  • CAE and EDA ISV

    • Altair One - Cloud Gateway offering dynamic and collaborative access to simulation and data analytics technology, along with scalable HPC and cloud resources.
    • Altair Unlimited - A turnkey, state-of-the-art private appliance available in both on-premises and cloud-based formats, offering unlimited access to a wide range of Altair HyperWorks solver software.
    • Ansys Cloud Direct - Cloud-based interactive workstations and HPC clusters, with flexible licensing that can be accessed from desktop.
    • Ansys Gateway by AWS - Cloud-based solution for managing Ansys Simulation & CAD/CAE developments via a web browser.
    • Cadence OnCloud Platform - SaaS software platform for all your system design and simulation needs that can operate on any hardware, removing the requirement to run and maintain expensive infrastructure hardware.
    • Simulia Cloud
    • Synopsys Cloud - Platform that enables delivery of EDA tools, IP and infrastructure for end-to-end chip design through a browser.
    • Managed Cloud Service - EDA-optimized platform powered by Cadence that provides a fully integrated and proven cloud environment to jump-start product design, verification, and implementation.
    • Palladium and Protium Cloud - Emulation and prototyping offering provides pre-silicon hardware system verification and debug powered by Cadence.
    • 3DEXPERIENCE platform on the cloud - Complete suite of industry-leading apps and software (CATIA, SIMULIA, DELMIA, 3DEXCITE, etc.) powered by Dassault Systèmes.
    • Cloud Passport - Cloud-ready tools powered by Cadence that have been optimized for use in customers' own cloud environment.
    • Ansys Access on Microsoft Azure - Cloud-based simulation solution available on the Azure Marketplace, offering fast, scalable access to Ansys applications ([Marketplace](https://azuremarketplace.microsoft.com/marketplace/apps/ansys.ansysaccessonmicrosoftazure?tab=overview)).
    • Simcenter Cloud HPC - Part of the Xcelerator as a Service (XaaS) offering powered by Siemens, offers increased flexibility and scalability for CFD simulations with no additional setup needed.
  • Resource

  • Blog Documentation YouTube

    • Day 1 HPC - AWS engineering's HPC community site.