{"id":15356399,"url":"https://github.com/saravanabalagi/hpc_scripts","last_synced_at":"2025-10-31T19:30:28.683Z","repository":{"id":84130055,"uuid":"247709258","full_name":"saravanabalagi/hpc_scripts","owner":"saravanabalagi","description":"Contains scripts I use for scheduling jobs in Irish Centre for High-End Computing (ICHEC). Can be adapted to use on other HPCs with SLURM.","archived":false,"fork":false,"pushed_at":"2020-04-01T12:39:29.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-25T20:27:09.461Z","etag":null,"topics":["hpc","icheck","sbatch","scripts","slurm","slurm-job-scheduler"],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saravanabalagi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-16T13:27:37.000Z","updated_at":"2020-04-01T12:39:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"e937ee14-88ec-4a6b-85e1-23599d0719a7","html_url":"https://github.com/saravanabalagi/hpc_scripts","commit_stats":{"total_commits":8,"total_committers":1,"mean_commits":8.0,"dds":0.0,"last_synced_commit":"300a596a96ec843bd0a325bd27f77236e95f6f9f"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fhpc_scripts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fhpc_scripts/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fhpc_scripts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saravanabalagi%2Fhpc_scripts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saravanabalagi","download_url":"https://codeload.github.com/saravanabalagi/hpc_scripts/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239225766,"owners_count":19603162,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hpc","icheck","sbatch","scripts","slurm","slurm-job-scheduler"],"created_at":"2024-10-01T12:28:36.275Z","updated_at":"2025-10-31T19:30:28.611Z","avatar_url":"https://github.com/saravanabalagi.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"## SLURM srun\r\n\r\nOn a SLURM-managed cluster, we create an sbatch script like the one below and schedule the job with `sbatch sbatch_script.sh`\r\n\r\n```shell\r\n#!/bin/sh\r\n\r\n#SBATCH --time=00:20:00\r\n#SBATCH --nodes=2\r\n#SBATCH -A nuim01\r\n#SBATCH -p GpuQ\r\n\r\n#SBATCH -o outfile  # send stdout to outfile\r\n#SBATCH -e outfile  # send stderr to outfile as well\r\n#SBATCH --job-name python_test_run\r\n\r\n# Your commands go here\r\n\u003ccommand_prefix\u003e /bin/sh ~/script_test.sh\r\n```\r\n\r\n```\r\n# script_test.sh\r\necho \"Node $SLURM_NODEID here: $(hostname)\"\r\n```\r\n\r\nNote: If we directly use `echo \"Node $SLURM_NODEID here: $(hostname)\"` in the sbatch_script, then all nodes will print the same $SLURM_NODEID on which the 
sbatch_script is being processed.\r\n\r\n| Command Prefix | outfile |\r\n|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|\r\n|  | Node 0 here: n360 |\r\n| srun | Node 0 here: n360\u003cbr\u003eNode 1 here: n361 |\r\n| srun -n1 | Node 0 here: n360\u003csup\u003e#\u003c/sup\u003e |\r\n| srun -n2 | Node 0 here: n360\u003cbr\u003eNode 1 here: n361 |\r\n| srun -n3 | Node 0 here: n360\u003cbr\u003eNode 0 here: n360\u003cbr\u003eNode 1 here: n361 |\r\n| srun -n8 | Node 0 here: n360\u003cbr\u003eNode 0 here: n360\u003cbr\u003eNode 0 here: n360\u003cbr\u003eNode 0 here: n360\u003cbr\u003eNode 1 here: n361\u003cbr\u003eNode 1 here: n361\u003cbr\u003eNode 1 here: n361\u003cbr\u003eNode 1 here: n361 |\r\n\r\n\u003csup\u003e#\u003c/sup\u003esrun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1\r\n\r\n## View Allocated Resources\r\n\r\n```\r\nscontrol show job -d {{SLURM_JOB_ID}}\r\n\r\n# To avoid looking up the job ID by hand, show the first job that squeue lists for your user\r\n# (if you run only one job, that is the job you want):\r\n# scontrol show job -d $(squeue -u $(whoami) | awk 'NR==2 {print $1}')\r\n```\r\n```\r\nUserId=USER(UID) GroupId=GROUP(GID) MCS_label=N/A\r\nPriority=1 Nice=0 Account=nuim01 QOS=nuim01\r\nJobState=RUNNING Reason=None Dependency=(null)\r\nRequeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\r\nDerivedExitCode=0:0\r\nRunTime=00:00:28 TimeLimit=00:20:00 TimeMin=N/A\r\nSubmitTime=2020-03-16T12:36:16 EligibleTime=2020-03-16T12:36:16\r\nStartTime=2020-03-16T12:36:17 EndTime=2020-03-16T12:56:17 Deadline=N/A\r\nPreemptTime=None SuspendTime=None SecsPreSuspend=0\r\nLastSchedEval=2020-03-16T12:36:17\r\nPartition=GpuQ AllocNode:Sid=login2:23007\r\nReqNodeList=(null) ExcNodeList=(null)\r\nNodeList=n[360-361]\r\nBatchHost=n360\r\nNumNodes=2 NumCPUs=80 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*\r\nTRES=cpu=80,node=2,billing=184\r\nSocks/Node=* NtasksPerN:B:S:C=0:0:*:* 
CoreSpec=*\r\n  Nodes=n[360-361] CPU_IDs=0-39 Mem=0 GRES_IDX=\r\nMinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0\r\nFeatures=(null) DelayBoot=00:00:00\r\nGres=(null) Reservation=(null)\r\nOverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)\r\nCommand=/path/to/workdir/scripts/sbatch_test.sh\r\nWorkDir=/path/to/workdir\r\nStdErr=/path/to/stderr\r\nStdIn=/dev/null\r\nStdOut=/path/to/stdout\r\nPower=\r\n```\r\n\r\n## Useful SLURM Environment variables\r\n\r\n| Slurm Variable Name | Description | Example values |\r\n|----------------------|-----------------------------------------------------------|-------------------------------------------|\r\n| SLURM_JOB_ID | Job ID | 5741192 |\r\n| SLURM_JOBID | Deprecated. Same as SLURM_JOB_ID | 5741192 |\r\n| SLURM_JOB_NAME | Job Name | myjob |\r\n| SLURM_SUBMIT_DIR | Submit Directory | /lustre/payerle/work |\r\n| SLURM_JOB_NODELIST | Nodes assigned to job | compute-b24-[1-3,5-9],compute-b25-[1,4,8] |\r\n| SLURM_SUBMIT_HOST | Host submitted from | login-1.deepthought2.umd.edu |\r\n| SLURM_JOB_NUM_NODES | Number of nodes allocated to job | 2 |\r\n| SLURM_CPUS_ON_NODE | Number of cores per node | 8,3 |\r\n| SLURM_NTASKS | Total number of tasks for the job (same as -n, --ntasks) | 11 |\r\n| SLURM_NODEID | Index of the node running on, relative to the nodes assigned to the job | 0 |\r\n| SLURM_LOCALID | Index of the task (core) running on within the node | 4 |\r\n| SLURM_PROCID | Index of the task relative to the job | 0 |\r\n\r\n## Additional References and FAQs\r\n\r\n1. [Submitting a basic Job](https://www.ichec.ie/academic/national-hpc/FAQ#2-how-do-i-submit-a-job-to-kay)\r\n1. [Selecting the number of CPUs and threads in SLURM sbatch](https://stackoverflow.com/a/51141287/3125070)\r\n1. [What does the --ntasks or -n tasks does in SLURM?](https://stackoverflow.com/a/53759961/3125070)\r\n1. [CÉCI FAQ](https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmFAQ.html#Q05)\r\n1. [Running Interactive Jobs](http://geco.mines.edu/files/userguides/slurm/interactive.html)\r\n1. 
[Check node usages](https://www.ichec.ie/academic/national-hpc/documentation/check-node-utilization)\r\n1. [Task Farming](https://www.ichec.ie/academic/national-hpc/documentation/task-farming)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaravanabalagi%2Fhpc_scripts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaravanabalagi%2Fhpc_scripts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaravanabalagi%2Fhpc_scripts/lists"}