https://github.com/cellgeni/farm-course

Materials for farm-course organised by cellgenIT
# FARM Course - LSF Job Scheduling and HPC Tutorial

This repository contains tutorial materials and example scripts for learning how to use the FARM cluster with the LSF (Load Sharing Facility) job scheduling system.

## Overview

This course covers:
- Basic LSF job submission with `bsub`
- Array job processing
- Working with modules and Singularity containers
- Python script execution in HPC environments
- iRODS data management

## Repository Structure

```
farm-course/
├── README.md                 # This file
├── slides.pdf                # Presentation slides
├── data/
│   ├── sample_table.csv      # Sample data for array jobs
│   └── sample.list           # Sample list file
├── scripts/
│   ├── script.sh             # Basic bash script example
│   ├── array_script.bsub     # LSF array job script
│   ├── python_script.bsub    # Python execution with modules/Singularity
│   ├── add_values.py         # Simple Python calculator
│   └── command_list.sh       # Collection of useful commands
├── logs/                     # Job log files (stdout/stderr)
│   ├── arrayOutput*.log      # Array job outputs
│   ├── arrayError*.log       # Array job errors
│   ├── output*.log           # Standard job outputs
│   └── error*.log            # Standard job errors
└── results/                  # Output files from job executions
    ├── lolkek.txt
    └── sample*.txt           # Generated by array jobs
```

## Getting Started

### Prerequisites
- Access to FARM cluster
- Basic knowledge of bash scripting
- Familiarity with Python (optional)

### Basic Job Submission

1. **Simple job submission:**
```bash
bsub -G farm-course -q normal -n 1 -M "2G" -R "select[mem>2G] rusage[mem=2G]" -o "output%J.log" -e "error%J.log" ./scripts/script.sh
```

2. **Array job submission:**
```bash
bsub -J "[1-6]" < scripts/array_script.bsub
```

3. **Python script with modules:**
```bash
bsub < scripts/python_script.bsub
```

## Script Examples

### 1. Basic Bash Script (`script.sh`)
Demonstrates:
- File output redirection
- stdout vs stderr output
- Basic job execution
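
The three behaviours listed above can be sketched as follows. This is not the content of `script.sh` itself: the function name `demo_job`, the temporary directory and the file names are hypothetical stand-ins chosen so the sketch runs anywhere. On the cluster, the `-o` and `-e` options of `bsub` would capture the two streams in the same way the redirections do here.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a basic job script; not the course's script.sh.
workdir="$(mktemp -d)"   # scratch directory so the sketch leaves the repository untouched

demo_job() {
    echo "progress message"                      # written to stdout
    echo "warning message" >&2                   # written to stderr
    echo "result value" > "$workdir/result.txt"  # written to a file, so it appears in neither stream
}

# Locally the streams are redirected by hand; `bsub -o ... -e ...` plays this role under LSF.
demo_job > "$workdir/output.log" 2> "$workdir/error.log"

cat "$workdir/output.log"
cat "$workdir/error.log"
cat "$workdir/result.txt"
```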

### 2. Array Job Script (`array_script.bsub`)
Demonstrates:
- LSF array job directives
- Reading sample data from CSV
- Job indexing with `LSB_JOBINDEX`
- Dynamic file creation based on job index

### 3. Python Calculator (`add_values.py`)
A simple Python script that:
- Accepts command line arguments
- Performs basic arithmetic
- Includes error handling

### 4. Module and Singularity Usage (`python_script.bsub`)
Shows how to:
- Load Python modules
- Execute Python scripts with different environments
- Use Singularity containers for reproducible environments

## Key LSF Directives

| Directive | Description |
|-----------|-------------|
| `#BSUB -G` | Specify user group |
| `#BSUB -q` | Queue selection |
| `#BSUB -n` | Number of cores |
| `#BSUB -M` | Memory limit |
| `#BSUB -R` | Resource requirements |
| `#BSUB -o` | Standard output file |
| `#BSUB -e` | Standard error file |
| `#BSUB -J` | Job name; with an index range such as `[1-6]`, defines a job array |
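
As an illustration, the directives in the table might be combined in a batch script like the sketch below. The resource values mirror the `bsub` command under Basic Job Submission; the job name `demo` and the log file names are assumptions, not the contents of the course scripts. In the log names, LSF replaces `%J` with the job ID and `%I` with the array index.

```shell
#!/usr/bin/env bash
# Hypothetical batch script: the job name "demo" and the log paths are assumptions.
# Lines starting with #BSUB are read by LSF and ignored by bash.
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M "2G"
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -o "logs/arrayOutput%J_%I.log"
#BSUB -e "logs/arrayError%J_%I.log"
#BSUB -J "demo[1-6]"

# LSF sets LSB_JOBINDEX for each array element; default to 1 so the script also runs locally.
job_index="${LSB_JOBINDEX:-1}"
echo "Running array element ${job_index} on $(uname -n)"
```

Saved to a file, a script of this shape would be submitted by redirecting it into `bsub`, following the pattern used under Basic Job Submission.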

## Working with Data

### Sample Data Format
The file `data/sample_table.csv` contains a simple list of sample names, one per line:
```
sample1
sample2
sample3
sample4
sample5
sample6
```

### Array Job Processing
Array jobs automatically process each sample using the `LSB_JOBINDEX` variable:
- Job index 1 processes `sample1`
- Job index 2 processes `sample2`
- And so on...
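
The mapping from job index to sample can be sketched as below. This is one possible approach, not necessarily what `array_script.bsub` does: the helper `sample_for_index` is hypothetical, and a temporary file stands in for `data/sample_table.csv` so the sketch runs outside the cluster.

```shell
#!/usr/bin/env bash
# Stand-in for data/sample_table.csv so the sketch runs anywhere.
sample_table="$(mktemp)"
printf '%s\n' sample1 sample2 sample3 sample4 sample5 sample6 > "$sample_table"

# Hypothetical helper: print the line of the table that matches a 1-based job index.
# $1 = path to the table, $2 = job index
sample_for_index() {
    sed -n "${2}p" "$1"
}

# LSF sets LSB_JOBINDEX inside an array job; default to 1 for a local dry run.
job_index="${LSB_JOBINDEX:-1}"
sample="$(sample_for_index "$sample_table" "$job_index")"
echo "Job index ${job_index} processes ${sample}"
```

Because `sed` counts lines from 1, the job index can be used directly as the line number, with no offset.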

## Module System

List the available Python modules, then load the ones you need:
```bash
module avail -C python
module load ISG/python/3.12.3
module load cellgen/singularity
```

## iRODS Commands

Basic iRODS operations covered:
```bash
# List the contents of a collection
ils /Sanger1/training

# Check metadata
imeta ls -d /path/to/file

# Download data
iget -Kv /path/to/remote/file

# Query by metadata
imeta qu -z /seq -d sample = "sample_id"
```

## Output Files

Files produced by the jobs are stored in the `results/` directory:
- `lolkek.txt` - Output from basic job
- `sample*.txt` - Outputs from array jobs

Log files (kept under `logs/` in this repository) are named `output*.log` and `error*.log`, where `*` is the job ID that LSF substitutes for `%J`.

## Useful Commands

Monitor your jobs:
```bash
bjobs # List your jobs
bqueues # Show available queues
bhist # Job history
bkill # Kill a job
```

## Troubleshooting

### Common Issues

1. **"command not found" errors**: Give an explicit path to the script (relative or absolute) rather than a bare file name
```bash
# Instead of: script.sh
# Use: ./scripts/script.sh
```

2. **Permission denied**: Make scripts executable
```bash
chmod +x scripts/*.sh
```

3. **Array indexing**: Remember that `LSB_JOBINDEX` starts from 1, but bash array indices start from 0, so subtract 1 before indexing into an array
```bash
sample_index=$((LSB_JOBINDEX - 1))
```
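
A minimal sketch of that offset in use, assuming the sample names are held in a bash array (the array `samples` and its contents are hypothetical, chosen to match the sample names used elsewhere in this README):

```shell
#!/usr/bin/env bash
# Hypothetical sample list held in a bash array (indices 0 to 5).
samples=(sample1 sample2 sample3 sample4 sample5 sample6)

# LSF sets LSB_JOBINDEX (1-based) in array jobs; default to 1 for a local dry run.
job_index="${LSB_JOBINDEX:-1}"
sample_index=$((job_index - 1))   # shift to the 0-based array index

echo "Job index ${job_index} -> samples[${sample_index}] = ${samples[$sample_index]}"
```

Without the subtraction, job index 1 would pick the second element and job index 6 would read past the end of the array.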

## Learning Objectives

By the end of this course, you should be able to:
- Submit basic and array jobs to LSF
- Understand job resource requirements and queues
- Use modules and Singularity for software management
- Handle file I/O and error logging
- Work with iRODS for data management
- Debug common job submission issues

## Additional Resources

- LSF documentation: IBM's [`bsub` options reference](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options); also check any cluster-specific documentation
- Singularity documentation: [user guide](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) for containerized applications
- iRODS documentation: [docs.irods.org](https://docs.irods.org) for data management workflows

---

*This tutorial is designed for the FARM cluster environment and may need adaptation for other HPC systems.*