https://github.com/ultimatile/hpc
An automation CLI tool for HPC workflow: source code/data sync and scheduler job management (Slurm/PJM)
- Host: GitHub
- URL: https://github.com/ultimatile/hpc
- Owner: ultimatile
- Created: 2025-12-26T09:25:29.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-05-08T19:18:24.000Z (21 days ago)
- Last Synced: 2026-05-10T22:54:43.976Z (19 days ago)
- Topics: rsync, slurm, ssh
- Language: Python
- Homepage:
- Size: 216 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 9
Metadata Files:
- Readme: README.md
# hpc
An automation CLI tool for HPC workflow: source code/data sync and scheduler job management (Slurm/PJM).
## Installation
One-shot execution (no install):
```bash
uvx --from git+https://github.com/ultimatile/hpc hpc
```
Permanent install:
```bash
uv tool install git+https://github.com/ultimatile/hpc
```
## Quick Start
```bash
# 1. Initialize project
hpc init
# 2. Edit configuration
vim hpc.toml
# 3. Sync files to cluster
hpc sync
hpc sync --dry-run # preview only
# 4. Submit job
hpc submit "python train.py"
# 5. Check status
hpc status 12345678
# 6. View job output
hpc job-output 12345678
```
## Commands
### `hpc init`
Creates an `hpc.toml` configuration file in the current directory.
```bash
hpc init
```
### `hpc sync`
Syncs local files to the remote HPC cluster using rsync.
Always syncs the entire project root (where `hpc.toml` is located), regardless of which subdirectory you run from.
```bash
hpc sync # sync files
hpc sync --dry-run # preview without syncing (-n for short)
hpc sync --workdir /scratch/user/other # override remote workdir
hpc sync --push # push only (local → remote)
hpc sync --pull # pull only (remote → local)
```
### `hpc exec`
Executes a command directly on the login node (not via the scheduler). Useful for setup tasks that need internet access, such as package installs and dependency downloads.
```bash
hpc exec "julia -e 'using Pkg; Pkg.instantiate()'"
hpc exec --script setup.sh
hpc exec --workdir /scratch/user/other "cmake .."
```
Environment setup (`[env]` section) is applied automatically. The working directory follows the same CWD-relative logic as `hpc submit`.
### `hpc submit`
Submits a job to the configured scheduler.
Returns both a `run_id` (hpc's local tracking ID, e.g., `20260109_1234`) and a `job_id` (the scheduler's job ID, e.g., `12345678`).
The job's working directory is set based on your current position relative to the project root (see [Multi-Setup Runs](#multi-setup-runs)).
```bash
hpc submit "python train.py"
hpc submit --script run.sh
hpc submit -s run.sh --wait
hpc submit --workdir /scratch/user/other "python train.py" # override remote workdir
```
`#SBATCH` (Slurm) and `#PJM` (PJM) directives written at the top of a script passed via `--script` are honored: hpc hoists them into the prologue of the rendered job script, so they are scanned by `sbatch` / `pjsub` instead of being silently treated as comments.
Only column-zero directive lines that appear before the first executable line in the user script are hoisted, matching the schedulers' own prologue-scan rule. Directives after an executable line, or inside heredocs, are left in the body as-is.
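The hoisting rule above can be sketched in a few lines of Python. This is an illustration of the rule as described, not hpc's actual implementation; the function name `split_directives` is hypothetical.

```python
def split_directives(script: str, prefix: str = "#SBATCH") -> tuple[list[str], list[str]]:
    """Split a script into (hoisted directive lines, remaining body lines).

    Hypothetical helper: only column-zero directive lines that appear before
    the first executable line are hoisted; everything else stays in the body.
    """
    directives: list[str] = []
    body: list[str] = []
    in_prologue = True
    for line in script.splitlines():
        if in_prologue:
            # Column zero: the line itself (not a stripped copy) starts with the prefix.
            if line.startswith(prefix) and line.split(None, 1)[0] == prefix:
                directives.append(line)
                continue
            stripped = line.strip()
            # Blank lines and comments (including the shebang) keep the prologue open;
            # the first executable line closes it.
            if stripped and not stripped.startswith("#"):
                in_prologue = False
        body.append(line)
    return directives, body
```

Because the prologue closes at the first executable line, a directive inside a heredoc further down is never inspected and stays in the body. Pass `prefix="#PJM"` for PJM scripts.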
When the same option is set both via config (`[slurm.options]` for Slurm, the `pjm.options` array for PJM) and via a `#SBATCH` / `#PJM` line in the script, the script's value wins, because the scheduler applies last-occurrence-wins semantics to duplicate directives. The `submit_options` list is passed as command-line flags to `sbatch` / `pjsub` and, per the scheduler specifications, overrides script directives unconditionally.
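The three precedence layers can be modeled as a simple dictionary merge. This is a simplified sketch: hpc passes `submit_options` as a list of flags rather than a mapping, and the function name `effective_options` is hypothetical.

```python
def effective_options(
    config: dict[str, str],
    script: dict[str, str],
    submit: dict[str, str],
) -> dict[str, str]:
    """Resolve the value the scheduler would act on for each option.

    Hypothetical model of the precedence described above, from lowest to highest:
    config options < script directives < command-line submit options.
    """
    merged = dict(config)   # rendered first, so any later duplicate replaces it
    merged.update(script)   # script directives come later and win over config
    merged.update(submit)   # command-line flags override directives
    return merged
```

For example, with `time = "02:00:00"` in the config and `#SBATCH --time=04:00:00` in the script, the merge yields `04:00:00` unless a submit option sets `time` as well.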
### `hpc status`
Checks the status of a submitted job.
Accepts either run_id or job_id.
```bash
hpc status 12345678
```
### `hpc job-output`
Shows the output of a submitted job.
Accepts either run_id or job_id.
```bash
hpc job-output 12345678
```
Pass `--follow` / `-f` to stream the output of a running job in real time
(equivalent to `tail -F` on the remote output file). Combine with `--error` /
`-e` to follow stderr instead of stdout. For terminal-state jobs the command
prints the final output and exits.
```bash
hpc job-output -f 12345678
hpc job-output -f -e 12345678
```
### `hpc wait`
Waits for a run to complete.
Accepts either run_id or job_id.
```bash
hpc wait 12345678
```
## Project Root and Config Discovery
hpc walks up from the current directory to find `hpc.toml`, similar to how git finds `.git`. This means you can run hpc commands from any subdirectory within your project.
Resolution order: `--config` / `-c` > `$HPC_CONFIG` > walk-up discovery > `./hpc.toml`.
The directory containing `hpc.toml` is the **project root**. This affects:
- **`hpc sync`**: always syncs the entire project root to `workdir`, regardless of CWD
- **`hpc submit`**: sets the job's `cd` to `workdir` + (CWD relative to project root)
- **`.hpc/runs/`**: run metadata is always stored at the project root
`hpc init` does not walk up — it always creates `hpc.toml` in the current directory.
## Multi-Setup Runs
When running multiple benchmarks or parameter sets from a single project, use subdirectories. hpc automatically maps your local directory structure to the remote.
```
myproject/
  hpc.toml              # workdir = "/remote/myproject"
  src/main.py
  runs/
    setup-a/
      input.dat
    setup-b/
      input.dat
```
```bash
# Sync the entire project (same result from any subdirectory)
hpc sync
# Submit from a subdirectory — job runs in the matching remote path
cd runs/setup-a
hpc submit "python ../../src/main.py"   # path is relative to the job's working directory
# → job cd's to /remote/myproject/runs/setup-a
cd ../setup-b
hpc submit "python ../../src/main.py"
# → job cd's to /remote/myproject/runs/setup-b
```
Key points:
- **sync** is always project-wide. The remote mirrors your local project structure exactly.
- **submit** uses your CWD to determine the job's working directory on the remote.
- **`--workdir`** overrides `cluster.workdir` for one-off use without editing `hpc.toml`.
- Large artifacts that shouldn't be synced are managed via `[sync] ignore`.
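The mapping from local CWD to the job's remote working directory is `workdir` plus the CWD's path relative to the project root. A minimal sketch, assuming a hypothetical helper named `remote_job_dir` (not part of hpc's API):

```python
from pathlib import Path, PurePosixPath


def remote_job_dir(project_root: Path, cwd: Path, workdir: str) -> PurePosixPath:
    """Map a local directory inside the project to its remote counterpart (hypothetical helper)."""
    # Raises ValueError if cwd is not inside project_root.
    relative = cwd.resolve().relative_to(project_root.resolve())
    # The remote side is always POSIX, whatever the local platform is.
    return PurePosixPath(workdir).joinpath(*relative.parts)
```

With the layout above, a CWD of `myproject/runs/setup-a` and `workdir = "/remote/myproject"` give `/remote/myproject/runs/setup-a`. Running from the project root gives `workdir` unchanged.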
## Configuration
Edit `hpc.toml`:
```toml
[cluster]
host = "myhpc" # SSH host (from ~/.ssh/config)
workdir = "/scratch/user/proj" # Remote working directory; all code and data will be synced here
scheduler = "slurm" # "slurm" (default) or "pjm"
[env]
modules = ["gcc/12.2.0", "cuda/12.2"] # Modules to load (shorthand for module load)
spack = ["python@3.11"] # Spack packages to load (shorthand for spack load)
setup = [ # Additional setup commands
    {source = "/path/to/venv/bin/activate"},
    {export = ["VAR=value"]}, # {command = [args...]} format
    "some_cmd", # String: command without args
]
[sync]
ignore = ["hpc.toml", ".git"] # Patterns to exclude from sync
compare = "checksum" # File comparison: "checksum" (content-based, default) or "timestamp"
pull_dir = "~/data/myproj" # Pull destination (default: project root). Useful for keeping git repo clean
[slurm.options]
partition = "gpu" # Example (Slurm): partition
time = "02:00:00" # Example (Slurm): time limit
mem = "32G" # Example (Slurm): memory
gpus = 1 # Example (Slurm): number of GPUs
```
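To make the `[sync]` settings concrete, here is a sketch of how they could translate into an rsync invocation. The flags hpc actually passes are not documented here, so treat this as an assumption; `build_rsync_command` is a hypothetical name. The rsync options used (`-a`, `--checksum`, `--dry-run`, `--exclude`) are standard.

```python
def build_rsync_command(
    source: str,
    destination: str,
    ignore: list[str],
    compare: str = "checksum",
    dry_run: bool = False,
) -> list[str]:
    """Build an rsync argument list from [sync]-style settings (hypothetical helper)."""
    cmd = ["rsync", "-a"]
    if compare == "checksum":
        # Content-based comparison; without it rsync compares size and mtime.
        cmd.append("--checksum")
    if dry_run:
        cmd.append("--dry-run")
    # One --exclude per ignore pattern.
    cmd += [f"--exclude={pattern}" for pattern in ignore]
    cmd += [source, destination]
    return cmd
```

For the configuration above, a push would look roughly like `rsync -a --checksum --exclude=hpc.toml --exclude=.git ./ myhpc:/scratch/user/proj/`.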
### Environment Setup
Commands are executed in this order: `modules` → `spack` → `setup`.
`modules` and `spack` are shorthand syntax:
- `modules = ["gcc/12.2.0"]` expands to `module load gcc/12.2.0`
- `spack = ["python@3.11"]` expands to `spack load python@3.11`
`setup` accepts:
- String: command without args (e.g., `"some_cmd"`)
- Dict: `{command = args}` format (e.g., `{export = ["VAR=value"]}` → `export VAR=value`)
- Special commands `module` and `spack` in dict format expand to `module load` / `spack load`
If you need a different execution order, put everything in `setup`:
```toml
[env]
setup = [
    {spack = "python@3.11"},
    {module = "gcc/12.2.0"},
    {source = "/path/to/venv/bin/activate"},
]
```
Shell special characters (`` ;|&`$<>\'"\n `` and space) are prohibited in arguments for security.
### PJM Configuration
For the PJM scheduler, use the array format for options:
```toml
[cluster]
scheduler = "pjm"
[pjm]
options = [
    ["-L", "node=12"],
    ["-L", "rscgrp=small"],
    ["-L", "elapse=00:30:00"],
    ["--mpi", "max-proc-per-node=4"],
    ["-g", "laa4Hoo5"],
    ["-s"]
]
```
If `$XDG_CONFIG_HOME/hpc/config.toml` (default: `~/.config/hpc/config.toml`) exists, `hpc init` copies it to the current directory as `hpc.toml`.
## Requirements
- Python 3.11+
- SSH access to HPC cluster (key-based authentication recommended)
- rsync
- Slurm or PJM on the remote cluster
### rsync Note
Installing rsync separately is recommended rather than relying on macOS's built-in openrsync. With checksum-based comparison (`compare = "checksum"`, the default), openrsync has a bug: files whose size is an exact multiple of 64 bytes are always detected as changed, even when identical. This is due to a protocol 29 checksum boundary issue, confirmed with macOS 15.7's openrsync (protocol version 29, rsync version 2.6.9 compatible). If this is a concern, use `[sync] compare = "timestamp"` instead.
On macOS, install rsync via Homebrew:
```bash
brew install rsync
```
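To check which implementation is on your `PATH`, you can inspect the output of `rsync --version`. The sketch below assumes openrsync identifies itself by name in that output, which matches the version string quoted above but is not guaranteed for every release. The function names are hypothetical.

```python
import subprocess


def is_openrsync(version_output: str) -> bool:
    """Return True if the `rsync --version` text looks like openrsync (assumed marker)."""
    return "openrsync" in version_output.lower()


def local_rsync_is_openrsync(rsync: str = "rsync") -> bool:
    """Run `rsync --version` locally and classify the result."""
    result = subprocess.run(
        [rsync, "--version"], capture_output=True, text=True, check=False
    )
    # Check both streams, since implementations differ in where they print.
    return is_openrsync(result.stdout + result.stderr)
```

If this reports openrsync after `brew install rsync`, the Homebrew binary is probably not first on your `PATH`.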
## Claude Code Integration
This project includes a [Claude Code skill](https://docs.anthropic.com/en/docs/claude-code/skills) (`.claude/skills/hpc/SKILL.md`) that teaches Claude how to use the hpc CLI. The CLI reference in the skill is dynamically generated via `hpc --skill` to stay in sync with the code.
## Development
```bash
make test # run tests
make lint # run linter
make check # run all checks
```