https://github.com/cellgeni/farm-course
Materials for farm-course organised by cellgenIT
- Host: GitHub
- URL: https://github.com/cellgeni/farm-course
- Owner: cellgeni
- Created: 2025-07-11T12:52:19.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2025-07-11T15:25:57.000Z (11 months ago)
- Last Synced: 2025-09-09T23:50:45.108Z (9 months ago)
- Language: Shell
- Size: 2.1 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
# FARM Course - LSF Job Scheduling and HPC Tutorial
This repository contains tutorial materials and example scripts for learning how to use the FARM cluster with the LSF (Load Sharing Facility) job scheduling system.
## Overview
This course covers:
- Basic LSF job submission with `bsub`
- Array job processing
- Working with modules and Singularity containers
- Python script execution in HPC environments
- iRODS data management
## Repository Structure
```
farm-course/
├── README.md # This file
├── slides.pdf # Presentation slides
├── data/
│ ├── sample_table.csv # Sample data for array jobs
│ └── sample.list # Sample list file
├── scripts/
│ ├── script.sh # Basic bash script example
│ ├── array_script.bsub # LSF array job script
│ ├── python_script.bsub # Python execution with modules/Singularity
│ ├── add_values.py # Simple Python calculator
│ └── command_list.sh # Collection of useful commands
├── logs/ # Job log files (stdout/stderr)
│ ├── arrayOutput*.log # Array job outputs
│ ├── arrayError*.log # Array job errors
│ ├── output*.log # Standard job outputs
│ └── error*.log # Standard job errors
└── results/ # Output files from job executions
├── lolkek.txt
└── sample*.txt # Generated by array jobs
```
## Getting Started
### Prerequisites
- Access to FARM cluster
- Basic knowledge of bash scripting
- Familiarity with Python (optional)
### Basic Job Submission
1. **Simple job submission:**
```bash
bsub -G farm-course -q normal -n 1 -M "2G" -R "select[mem>2G] rusage[mem=2G]" -o "output%J.log" -e "error%J.log" ./scripts/script.sh
```
2. **Array job submission:**
```bash
bsub -J "[1-6]" < scripts/array_script.bsub
```
3. **Python script with modules:**
```bash
bsub < scripts/python_script.bsub
```
## Script Examples
### 1. Basic Bash Script (`script.sh`)
Demonstrates:
- File output redirection
- stdout vs stderr output
- Basic job execution
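The following is a hypothetical sketch of a script covering these three points; the repository's actual `scripts/script.sh` may differ. Under LSF, anything the script prints to stdout ends up in the `-o` file and anything printed to stderr ends up in the `-e` file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch in the spirit of scripts/script.sh (not the repository's file).
set -euo pipefail

mkdir -p results

# File output redirection: '>' creates or truncates the file, '>>' appends to it
echo "first line written by the job" > results/lolkek.txt
echo "second line appended" >> results/lolkek.txt

# stdout: under LSF this goes to the -o file (e.g. output%J.log)
echo "This message goes to stdout"

# stderr: under LSF this goes to the -e file (e.g. error%J.log)
echo "This message goes to stderr" >&2
```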
### 2. Array Job Script (`array_script.bsub`)
Demonstrates:
- LSF array job directives
- Reading sample data from CSV
- Job indexing with `LSB_JOBINDEX`
- Dynamic file creation based on job index
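A minimal sketch of such a script, assuming one sample per line in `data/sample_table.csv` and `#BSUB` values mirroring the basic submission example. The job name `samples`, the function names and the `results/<sample>.txt` output are illustrative assumptions, not the contents of the real `array_script.bsub`. In `-o`/`-e`, `%J` expands to the job ID and `%I` to the array index.

```bash
#!/usr/bin/env bash
# Hypothetical sketch in the spirit of scripts/array_script.bsub.
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M 2G
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -J "samples[1-6]"
#BSUB -o logs/arrayOutput%J_%I.log
#BSUB -e logs/arrayError%J_%I.log

set -euo pipefail

# Print line number $2 of file $1; line 1 is the first sample,
# matching the 1-based numbering of LSB_JOBINDEX
sample_for_index() {
    sed -n "${2}p" "$1"
}

# Look up the sample for this array element and write one result file for it
process_sample() {
    local table=$1 index=$2 sample
    sample=$(sample_for_index "$table" "$index")
    if [[ -z "$sample" ]]; then
        echo "No sample on line $index of $table" >&2
        return 1
    fi
    mkdir -p results
    echo "Processed $sample (array index $index)" > "results/${sample}.txt"
}

# Only act when LSB_JOBINDEX is a positive array index; when it is unset
# or 0 (e.g. when sourced outside LSF) nothing runs
if [[ "${LSB_JOBINDEX:-0}" -gt 0 ]]; then
    process_sample data/sample_table.csv "$LSB_JOBINDEX"
fi
```

Keeping the logic in functions, separate from the final `if`, means the file can be sourced and tried locally without LSF.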
### 3. Python Calculator (`add_values.py`)
A simple Python script that:
- Accepts command line arguments
- Performs basic arithmetic
- Includes error handling
### 4. Module and Singularity Usage (`python_script.bsub`)
Shows how to:
- Load Python modules
- Execute Python scripts with different environments
- Use Singularity containers for reproducible environments
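The sketch below shows both approaches in one hypothetical job script. The module names come from the Module System section; the image path `/path/to/python.sif`, the arguments `2 3` passed to `add_values.py`, and the assumption that both the module and the image provide `python3` are placeholders to adapt.

```bash
#!/usr/bin/env bash
# Hypothetical sketch in the spirit of scripts/python_script.bsub (not the repository's file).
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M 2G
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -o output%J.log
#BSUB -e error%J.log

# No 'set -u': some module system scripts reference unset variables
set -eo pipefail

# ASSUMPTIONS: placeholder image path and example arguments for add_values.py
IMAGE=${IMAGE:-/path/to/python.sif}
ARGS=(2 3)

# Option 1: Python provided by the cluster's module system
run_with_module() {
    module load ISG/python/3.12.3
    python3 scripts/add_values.py "${ARGS[@]}"
}

# Option 2: Python provided by a Singularity image
run_in_container() {
    module load cellgen/singularity
    # Singularity normally bind-mounts the current directory, so the
    # relative script path should also resolve inside the container
    singularity exec "$IMAGE" python3 scripts/add_values.py "${ARGS[@]}"
}

# Run only inside an LSF job, where LSB_JOBID is set
if [[ -n "${LSB_JOBID:-}" ]]; then
    run_with_module
    run_in_container
fi
```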
## Key LSF Directives
| Directive | Description |
|-----------|-------------|
| `#BSUB -G` | Specify user group |
| `#BSUB -q` | Queue selection |
| `#BSUB -n` | Number of cores |
| `#BSUB -M` | Memory limit |
| `#BSUB -R` | Resource requirements |
| `#BSUB -o` | Standard output file |
| `#BSUB -e` | Standard error file |
| `#BSUB -J` | Job name; with an index range such as `name[1-6]` it defines a job array |
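To show how the table maps onto a job script, here is a hypothetical `scripts/simple_job.bsub` that carries the same options as the command-line example in Basic Job Submission as embedded directives. LSF reads `#BSUB` lines when the script is fed on standard input (`bsub < scripts/simple_job.bsub`). The job name `simple_job` is an assumption, and the body is a placeholder that just records where it ran.

```bash
#!/usr/bin/env bash
# Hypothetical scripts/simple_job.bsub; submit with: bsub < scripts/simple_job.bsub
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M 2G
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -J simple_job
#BSUB -o output%J.log
#BSUB -e error%J.log

set -euo pipefail

# LSF sets LSB_JOBID inside the job; fall back to "local" when run outside LSF
job_id=${LSB_JOBID:-local}
mkdir -p results
echo "Job ${job_id} ran on $(uname -n)" > "results/simple_job_${job_id}.txt"
```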
## Working with Data
### Sample Data Format
The `data/sample_table.csv` contains a simple list of sample names:
```
sample1
sample2
sample3
sample4
sample5
sample6
```
### Array Job Processing
Each element of an array job selects its sample via the `LSB_JOBINDEX` variable, which LSF sets to that element's index:
- Job index 1 processes `sample1`
- Job index 2 processes `sample2`
- And so on...
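Because element *N* reads line *N*, the array range should match the number of samples. A small helper (the name `array_spec` and the default job name `samples` are hypothetical) can derive the range from the table instead of hard-coding `[1-6]`, assuming the samples sit on consecutive lines with no blank lines in between:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build an LSF job array spec such as "samples[1-6]" from the number of
# non-empty lines in the sample table ($1); $2 optionally sets the job name
array_spec() {
    local table=$1 name=${2:-samples} n
    n=$(grep -c . "$table" || true)   # grep -c prints 0 but exits 1 on no match
    if (( n == 0 )); then
        echo "No samples found in $table" >&2
        return 1
    fi
    printf '%s[1-%d]\n' "$name" "$n"
}
```

For example, `bsub -J "$(array_spec data/sample_table.csv)" < scripts/array_script.bsub` requests `samples[1-6]` for the six-sample table above.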
## Module System
Search for and load modules:
```bash
module avail -C python
module load ISG/python/3.12.3
module load cellgen/singularity
```
## iRODS Commands
Basic iRODS operations covered:
```bash
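# List the contents of a collection
ils /Sanger1/training
# Check metadata on a data object
imeta ls -d /path/to/file
# Download data (-K: verify checksum after transfer, -v: verbose)
iget -Kv /path/to/remote/file
# Query data objects by metadata attribute and value
imeta qu -z /seq -d sample = "sample_id"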
```
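When several files are needed, the `iget -Kv` call can be looped over a list of paths. This is a sketch with a hypothetical `fetch_from_list` function and an assumed list-file format of one full iRODS path per line:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Download every iRODS path listed in $1 (one per line) into local directory $2.
# -K verifies the checksum after transfer, -v prints progress.
fetch_from_list() {
    local list=$1 dest=$2 path
    mkdir -p "$dest"
    # '|| [[ -n $path ]]' also handles a last line without a trailing newline
    while IFS= read -r path || [[ -n "$path" ]]; do
        if [[ -z "$path" ]]; then
            continue   # skip blank lines
        fi
        # </dev/null stops iget from consuming the list being read by the loop
        iget -Kv "$path" "$dest/" < /dev/null
    done < "$list"
}
```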
## Output Files
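Job output files are stored in the `results/` directory:
- `lolkek.txt` - Output from the basic job
- `sample*.txt` - Outputs from array jobs
- Log files follow the naming patterns `output<JobID>.log` and `error<JobID>.log` (LSF replaces `%J` in `-o`/`-e` with the job ID); in this repository they are kept under `logs/`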
## Useful Commands
Monitor your jobs:
```bash
bjobs            # List your jobs
bqueues          # Show available queues
bhist            # Job history
bkill <job_id>   # Kill a job
```
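`bjobs`, `bhist` and `bkill` all accept a job ID, which `bsub` prints at submission time in the form `Job <12345> is submitted to queue <normal>.`. A minimal sketch for capturing it; the helper name `job_id_from_bsub` is hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the numeric job ID from bsub's "Job <12345> is submitted ..." line;
# other lines (e.g. warnings) are ignored, and nothing is printed if no line matches
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9]*\)>.*/\1/p' <<< "$1"
}

# Usage on the cluster:
#   out=$(bsub < scripts/python_script.bsub)
#   jobid=$(job_id_from_bsub "$out")
#   bjobs "$jobid"    # status of this job only
#   bhist "$jobid"    # its history
#   bkill "$jobid"    # cancel it if needed
```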
## Troubleshooting
### Common Issues
1. **"command not found" errors**: Call scripts by an explicit path, relative or absolute; a bare name such as `script.sh` is only searched for in `$PATH`
```bash
# Instead of: script.sh
# Use: ./scripts/script.sh
```
2. **Permission denied**: Make scripts executable
```bash
chmod +x scripts/*.sh
```
3. **Array indexing**: `LSB_JOBINDEX` starts from 1 (for `-J "[1-6]"`), but bash arrays are 0-indexed, so subtract 1 before using it as an array index
```bash
sample_index=$((LSB_JOBINDEX - 1))
```
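A sketch of where this offset matters: the sample table is loaded into a 0-indexed bash array with `mapfile`, and the 1-based job index is shifted before the lookup. The function name is hypothetical, and `mapfile` needs bash 4 or later.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Return the sample for 1-based job index $2 from table $1 (one sample per line)
sample_from_array() {
    local table=$1 job_index=$2
    local -a samples
    mapfile -t samples < "$table"          # samples[0] is the first line
    local sample_index=$((job_index - 1))  # shift 1-based index to 0-based
    if (( sample_index < 0 || sample_index >= ${#samples[@]} )); then
        echo "Index $job_index is out of range for $table" >&2
        return 1
    fi
    printf '%s\n' "${samples[$sample_index]}"
}
```

Inside an array job this would be called as `sample_from_array data/sample_table.csv "$LSB_JOBINDEX"`.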
## Learning Objectives
By the end of this course, you should be able to:
- Submit basic and array jobs to LSF
- Understand job resource requirements and queues
- Use modules and Singularity for software management
- Handle file I/O and error logging
- Work with iRODS for data management
- Debug common job submission issues
## Additional Resources
- LSF Documentation: the IBM Spectrum LSF [`bsub` options reference](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options); also check your cluster-specific documentation
- Singularity User Guide: running [containerized applications](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html)
- iRODS Documentation: data [management workflows](https://docs.irods.org)
---
*This tutorial is designed for the FARM cluster environment and may need adaptation for other HPC systems.*