Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/argonne-lcf/ALCFBeginnersGuide
https://github.com/argonne-lcf/ALCFBeginnersGuide
Last synced: 14 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/argonne-lcf/ALCFBeginnersGuide
- Owner: argonne-lcf
- Created: 2022-12-21T00:57:09.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-07-17T16:56:29.000Z (4 months ago)
- Last Synced: 2024-08-01T16:38:35.820Z (3 months ago)
- Language: Cuda
- Size: 140 MB
- Stars: 23
- Watchers: 5
- Forks: 10
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ALCF Beginners Guide
If you are new to using supercomputers and/or ALCF systems, this is the starting place for you. This guide will teach you the following:
* how to login to ALCF systems
* how to setup a usable environment on a login node
* how to query the job scheduler
* how to submit an interactive job to execute examples on a worker node
* how to submit a job script to the schedulerWe have versions of these steps laid out for each of our systems so please pick the system you wish to begin with and start there:
* [Polaris](polaris/)
* [Crux](crux/) [Coming Soon]
* [Sophia](sophia/) [Coming Soon]# Our Assumptions
These guides presume the reader knows:
* how to use GitHub and git repositories.
* how to use SSH on their computer in order to remotely login to another machine. If you are not familiar with SSH please search for "how to use ssh on _linux or windows or mac_".
* how to navigate a linux terminal and shell environment. Your default login shell (BASH, ZSH, etc.) can be selected via [ALCF Accounts Page](accounts.alcf.anl.gov).
* how to compile software in a linux environment using `gcc`/`g++`/`clang++`.# ALCF Systems Overview
![ALCFSystems](media/alcf_system_overview.png)
# What is a Supercomputer?
Argonne hosts DOE supercomputers for use by research scientists in need of large computational resources. Supercomputers are composed of many computing _nodes_ (1 _node_ = 1 physical computer) that are connected by a high-speed communications network so that groups of nodes can share information quickly, effectively operating together as a larger computer.
# A Compute Node
If you look inside your Desktop or Laptop you'll find these parts:
![parts](media/computer-parts-diagram.png)
A computing node of a supercomputer is very similar, each has simliar parts, but it is designed as a single unit that can be inserted and removed from large closet-sized racks with many others:
![blade](media/computer_blade.jpg)
In large supercomputers multiple computer processors (CPUs) and/or graphics processors (GPUs) are combined into a single node. It has a CPU on which the local operating system runs. It has local memory for running software. It may have GPUs for doing intensive calculations. Each node has a high-speed network connection that allows it to communicate with other nodes and to a large shared filesystem.
# Cluster/HPC Computing Hardware Setup
![Hardware](media/supercomputer_diagram.png)
Large computer systems typically have _worker_ nodes and _login_ nodes. _login_ nodes are the nodes on which every user arrives when they login to the system. _login_ nodes should not be used for computation, but for compiling code, writing/editing code, and launching _jobs_ on the system. A _job_ is the application that will be launched on the _worker_ nodes of the supercomputer.
# Supercomputers are Big!
These supercomputers occupy a lot of space in the ACLF data center. Here is our staff at the time (2019) in front of [Mira](https://en.wikipedia.org/wiki/Mira_(supercomputer)), an IBM supercomputer, that debuted as the third fastest supercomputer in the world in 2012.
![Staff-Mira](media/mira_staff.jpg)