https://github.com/zincware/laufband

Parallel Iteration with File-Based Coordination

Host: GitHub
URL: https://github.com/zincware/laufband
Owner: zincware
License: mit
Created: 2025-04-26T17:38:39.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-07-21T19:41:14.000Z (12 months ago)
Last Synced: 2025-07-21T21:36:09.402Z (12 months ago)
Language: Python
Homepage:
Size: 249 KB
Stars: 7
Watchers: 1
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

![Logo](https://github.com/user-attachments/assets/c8d2a3f9-284b-474d-b46d-98612c9d266b)



# Laufband: Embarrassingly parallel, embarrassingly simple!

[![codecov](https://codecov.io/gh/zincware/laufband/graph/badge.svg?token=9DJ3YZGTBA)](https://codecov.io/gh/zincware/laufband)

Laufband enables parallel iteration over a dataset from multiple processes, utilizing file-based locking and communication to ensure each item is processed exactly once.

## Installation

Install Laufband using pip:

```bash
pip install laufband
```

## Usage

For sequential iteration, Laufband is used like the familiar `tqdm` progress bar: wrap the dataset and loop over it.

```python
from laufband import Laufband

data = list(range(100))

for item in Laufband(data):
    # Process each item in the dataset
    pass
```

The true power of Laufband emerges when you run your script in parallel. Multiple processes will coordinate using file-based locking to ensure that each item in the dataset is processed by only one process.
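
To make the idea of file-based coordination concrete, here is a minimal standard-library sketch of how several processes could claim items without ever processing the same one twice. It is not taken from Laufband's source: the helper names `try_claim` and `claimed_items` and the marker-file layout are assumptions for illustration only, and Laufband itself relies on its own lock and state files.

```python
import os
import tempfile
from pathlib import Path


def try_claim(claim_dir: Path, index: int) -> bool:
    """Atomically claim item `index`; return True if this caller won the claim."""
    marker = claim_dir / f"{index}.claimed"
    try:
        # O_CREAT | O_EXCL fails if the marker already exists, so only one
        # process can ever create the marker for a given index.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def claimed_items(data, claim_dir: Path):
    """Yield only the items this process managed to claim (hypothetical helper)."""
    claim_dir.mkdir(parents=True, exist_ok=True)
    for index, item in enumerate(data):
        if try_claim(claim_dir, index):
            yield item


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        claims = Path(tmp) / "claims"
        # The first pass claims everything; a second pass finds nothing left.
        print(list(claimed_items(range(5), claims)))
        print(list(claimed_items(range(5), claims)))
```

In this sketch the marker files double as a record of what has been claimed, so a second worker that starts later skips those items.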

Here's a typical example demonstrating parallel processing with Laufband and file-based locking for shared resource access:

```python
import json
import time
from pathlib import Path

from laufband import Laufband

output_file = Path("data.json")
data = list(range(100))

worker = Laufband(data, desc="using Laufband")

# Create the output file only if it does not exist yet, so that a process
# starting late does not overwrite results already written by other processes
with worker.lock:
    if not output_file.exists():
        output_file.write_text(json.dumps({"processed_data": []}))

for item in worker:
    # Simulate some computationally intensive task
    time.sleep(0.1)

    with worker.lock:
        # Access and modify a shared resource (e.g., a file) safely using the lock
        file_content = json.loads(output_file.read_text())
        file_content["processed_data"].append(item)
        output_file.write_text(json.dumps(file_content))
```

To execute this script (`main.py`) in parallel, you can use a command like the following in your terminal (this example launches 10 background processes):

```bash
for i in {1..10} ; do python main.py & done
```
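
If a shell loop is not convenient, the same launch can be sketched from Python with the standard library. The function name `launch_workers` is a hypothetical helper, not part of Laufband; it simply starts several copies of the script and waits for them.

```python
import subprocess
import sys


def launch_workers(script: str, n_workers: int = 10) -> list[int]:
    """Start `n_workers` copies of `script` and return their exit codes."""
    # Start all processes first so they run concurrently ...
    procs = [subprocess.Popen([sys.executable, script]) for _ in range(n_workers)]
    # ... then wait for each one to finish.
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    exit_codes = launch_workers("main.py", n_workers=10)
    print(exit_codes)
```

Unlike the background jobs in the shell loop, this version blocks until every worker has exited, which makes it easier to run a post-processing step afterwards.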

> [!IMPORTANT]
> The different processes may finish at different times. Therefore, the order of items in `file_content` is not guaranteed.
> If the order is important, you will need to implement sorting logic afterwards.
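
A minimal sketch of such a sorting step, assuming the `data.json` layout from the example above (a `processed_data` list of sortable values) and that all workers have already finished. The function name `sort_processed` is a hypothetical helper.

```python
import json
from pathlib import Path


def sort_processed(path: Path, key: str = "processed_data") -> list:
    """Sort the list stored under `key` in a JSON file and write it back."""
    content = json.loads(path.read_text())
    content[key] = sorted(content[key])
    path.write_text(json.dumps(content))
    return content[key]


if __name__ == "__main__":
    # Run once after every worker process has exited.
    sort_processed(Path("data.json"))
```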

### Failure Policy

In Laufband, a job will be automatically marked as failed if the iteration is interrupted by:

- an unhandled exception, or

- an explicit `break`.

```python
from laufband import Laufband

data = list(range(100))

# Example 1: break
for item in Laufband(data):
    if item == 50:
        break  # Job 50 will be marked as failed

# Example 2: Exception
for item in Laufband(data):
    if item == 70:
        raise ValueError("Something went wrong")  # Job 70 will be marked as failed
```

If you want to exit early but still mark the job as successfully completed, you should use `Laufband.close()` instead of `break`:

```python
from laufband import Laufband

data = list(range(100))

worker = Laufband(data)

for item in worker:
    if item == 50:
        worker.close()  # Job 50 will be marked as completed, and iteration will stop cleanly
```
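
As background for the failure policy, the following plain-Python sketch shows one way an iterator can notice that its consumer stopped early. It is an illustration only and makes no claim about how Laufband implements this: the generator `tracked` and the `status` dictionary are hypothetical names.

```python
def tracked(data, status: dict):
    """Yield items and record in `status` how each one ended."""
    for index, item in enumerate(data):
        status[index] = "running"
        try:
            yield item
        except GeneratorExit:
            # The generator was closed while the consumer was still working
            # on this item (for example after a break in the loop body).
            status[index] = "failed"
            raise
        # The consumer asked for the next item, so this one finished normally.
        status[index] = "completed"


if __name__ == "__main__":
    status = {}
    gen = tracked(range(5), status)
    for item in gen:
        if item == 2:
            break
    gen.close()  # explicit close so the outcome does not depend on garbage collection
    print(status)
```

In this sketch an item only counts as completed once the loop requests the next one, which is why leaving the loop early leaves the current item marked as failed.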

# Examples

## ASE Calculator

For atomistic data, the ASE package is widely used to calculate energies and forces of atomic configurations using either _ab initio_ methods or machine-learned interatomic potentials (MLIPs).

You can use Laufband to parallelize these calculations easily, without duplicated work or manual bookkeeping, and with automatic checkpointing.

The following example uses a MACE foundation model to compute energies and forces on the ASE S22 dataset.

> [!TIP]
> You can safely run this script multiple times, even across multiple SLURM jobs, without any modifications.
> Laufband will automatically coordinate which configurations are processed.
> For local parallelization, you can use bash: `for i in {1..10} ; do python main.py & done`

```python
import ase.io
from ase.collections import s22
from laufband import Laufband
from mace.calculators import mace_mp

# Initialize calculator
calc = mace_mp(model="medium", dispersion=False, default_dtype="float32")

worker = Laufband(list(s22))

for atoms in worker:
    atoms.calc = calc
    atoms.get_potential_energy()

    with worker.lock:
        ase.io.write("frames.xyz", atoms, append=True)
```
