https://github.com/datavorous/cluster-setup
A clean, minimal boilerplate for setting up a test bed for experiments.
- Host: GitHub
- URL: https://github.com/datavorous/cluster-setup
- Owner: datavorous
- Created: 2026-05-07T17:59:12.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-08T12:28:42.000Z (about 2 months ago)
- Last Synced: 2026-05-08T12:32:48.978Z (about 2 months ago)
- Topics: boilerplate-template, gpu-cluster
- Language: Shell
- Homepage:
- Size: 2.93 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cluster Setup
## Setup (one time)
```bash
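# Clone into the home directory so the ~/cluster-setup paths below resolve
git clone https://github.com/datavorous/cluster-setup.git ~/cluster-setup
cd ~/cluster-setup
# Replace SLURM_ACCOUNT with your Slurm account name
srun -p u22 -A SLURM_ACCOUNT --gres=gpu:1 -c 4 --time=00:30:00 --pty bash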
bash setup/setup.sh
exit
```
## Before running anything
```bash
cd ~/cluster-setup
source setup/env.sh
pixi shell
```
## Directory structure
- `src/`: your training code
- `configs/`: your config files
- `scripts/`: utility scripts (`demo.py` tests the setup)
- `setup/`: initialization scripts
## Environment variables
Available after `source setup/env.sh`:
```bash
SCRATCH=/scratch/$USER # ephemeral node-local storage
HF_HOME=/scratch/$USER/hf_cache # Hugging Face models
CHECKPOINTS_DIR=/scratch/$USER/checkpoints # your checkpoints
EXPERIMENTS_DIR=/scratch/$USER/experiments # your logs/outputs
```
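Use these in your code instead of hard-coding paths. `/scratch` is fast, but it is deleted when the job ends: write caches, checkpoints, and logs there during a run, and copy anything you need to keep to persistent storage before the job exits.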
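One way to pick these variables up from Python is sketched below. It is a minimal example, not part of the repository: `resolve_dirs` and `checkpoint_path` are hypothetical helper names, the fallbacks simply mirror the defaults listed above, and the `step_000042.pt` naming scheme is only illustrative.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return the storage directories exported by setup/env.sh as Path objects."""
    env = os.environ if env is None else env
    # Fallback mirrors the README default; the "unknown" user name is an assumption.
    scratch = Path(env.get("SCRATCH", f"/scratch/{env.get('USER', 'unknown')}"))
    return {
        "scratch": scratch,
        "hf_home": Path(env.get("HF_HOME", scratch / "hf_cache")),
        "checkpoints": Path(env.get("CHECKPOINTS_DIR", scratch / "checkpoints")),
        "experiments": Path(env.get("EXPERIMENTS_DIR", scratch / "experiments")),
    }


def checkpoint_path(run_name, step, env=None):
    """Build a checkpoint file path (hypothetical naming scheme)."""
    return resolve_dirs(env)["checkpoints"] / run_name / f"step_{step:06d}.pt"
```

Passing a plain dict as `env` makes the helpers easy to exercise outside the cluster; inside a job, calling them with no arguments reads the real environment.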
## Run your script
```bash
python src/script.py
```
## Test setup
```bash
python scripts/demo.py
```
## Submit batch job
Create `job.sh`:
```bash
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --partition=u22
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
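# Run from the repository root so setup/env.sh and the pixi manifest are found
cd ~/cluster-setup
source setup/env.sh
# pixi shell opens an interactive shell, so batch jobs use pixi run instead
pixi run python src/script.py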
```
Then: `sbatch job.sh`
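Inside a batch job, it can help to give each run its own output directory so that repeated submissions do not overwrite each other. The sketch below is one possible approach, not something the repository ships: `job_output_dir` is a hypothetical helper, and it assumes `EXPERIMENTS_DIR` has been exported by `setup/env.sh`. `SLURM_JOB_ID` and `SLURM_JOB_NAME` are set by Slurm inside a job; the `local` and `run` fallbacks are assumptions for running outside Slurm.

```python
import os
from pathlib import Path


def job_output_dir(env=None, create=True):
    """Return a per-job output directory under EXPERIMENTS_DIR."""
    env = os.environ if env is None else env
    # Falls back to a relative "experiments" folder if env.sh was not sourced.
    base = Path(env.get("EXPERIMENTS_DIR", "experiments"))
    job_id = env.get("SLURM_JOB_ID", "local")    # set by Slurm inside a job
    job_name = env.get("SLURM_JOB_NAME", "run")  # matches --job-name
    out = base / f"{job_name}-{job_id}"
    if create:
        out.mkdir(parents=True, exist_ok=True)
    return out
```

With the `job.sh` above, a job with ID 123 would write to `$EXPERIMENTS_DIR/train-123`.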