

Fast sequential, threaded, and distributed for-loops for Julia—fold for humans™


# FLoops: `fold` for humans™


FLoops.jl provides the `@floop` macro, which generates fast, generic
sequential and parallel iteration over complex collections.

Furthermore, a loop written with `@floop` can be executed with any compatible
executor. See FoldsThreads.jl for various thread-based executors that are
optimized for different kinds of loops. FoldsCUDA.jl provides an executor for
GPU. FLoops.jl also provides a simple distributed executor.
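
As a minimal sketch of swapping executors, the example below passes the executor as the first argument to `@floop`. It assumes that `SequentialEx` and `ThreadedEx` are available after `using FLoops`; the function name `sum_of_squares` is hypothetical and used only for illustration.

```julia
using FLoops  # assumption: also makes SequentialEx and ThreadedEx available

# Hypothetical helper: the loop body is written once and the
# executor `ex` decides how it is run.
function sum_of_squares(xs, ex)
    @floop ex for x in xs
        @reduce s += x^2
    end
    return s
end

seq = sum_of_squares(1:10, SequentialEx())  # single-threaded
par = sum_of_squares(1:10, ThreadedEx())    # multi-threaded
@assert seq == par == 385
```

Because the executor is an ordinary value, the same loop can be tuned or moved to another backend without editing its body.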

## Update notes

FLoops.jl 0.2 defaults to a parallel loop; i.e., it uses a parallel executor
(e.g., `ThreadedEx`) when the executor is not specified and the explicit
sequential form `@floop begin ... end` is not used.

That is to say, `@floop` without `@reduce` such as

    @floop for i in eachindex(ys, xs)
        ys[i] = f(xs[i])
    end

is now executed in parallel by default.
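
A self-contained version of this loop might look like the sketch below. The function `f` and the arrays are hypothetical placeholders. The loop is safe to run in parallel because each iteration writes to a distinct index; passing `SequentialEx()` (assumed to be available after `using FLoops`) is one way to opt out of the parallel default.

```julia
using FLoops

f(x) = 2x + 1          # hypothetical element-wise function
xs = collect(1:8)
ys = similar(xs)

# No executor given: runs in parallel by default in FLoops.jl 0.2.
@floop for i in eachindex(ys, xs)
    ys[i] = f(xs[i])
end

# Explicit sequential executor: same loop body, single-threaded.
zs = similar(xs)
@floop SequentialEx() for i in eachindex(zs, xs)
    zs[i] = f(xs[i])
end

@assert ys == zs == f.(xs)
```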

## Usage

### Parallel loop

`@floop` is a superset of `Threads.@threads` (see below) and in particular
supports complex reduction with additional syntax `@reduce`:

    julia> using FLoops  # exports @floop macro

    julia> @floop for (x, y) in zip(1:3, 1:2:6)
               a = x + y
               b = x - y
               @reduce s += a
               @reduce t += b
           end
           (s, t)
    (15, -3)

For more examples, see the parallel loops tutorial.
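
For reductions that do not fit the `+=` form, `@reduce` also accepts a `do` block in which each `(accumulator = init; input)` pair declares one accumulator. The sketch below uses it to find a maximum together with its index; the function name `findmax_floop` is hypothetical, and this is an illustration rather than a replacement for `Base.findmax`.

```julia
using FLoops

# Hypothetical helper: returns (maximum value, index of the maximum).
function findmax_floop(xs)
    @floop for (i, x) in zip(eachindex(xs), xs)
        # Two accumulators are updated together, so the index always
        # belongs to the current maximum.
        @reduce() do (xmax = -Inf; x), (imax = 0; i)
            if isless(xmax, x)
                xmax = x
                imax = i
            end
        end
    end
    return (xmax, imax)
end

@assert findmax_floop([0, 1, 3, 2]) == (3, 3)
```

The same block is used both to fold in a new element and to merge partial results from different tasks, which is why it is written in terms of an accumulator and an input rather than a loop counter.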

### Sequential (single-thread) loop

Simply wrap a `for` loop and its initialization part with `@floop begin ... end`:

    julia> @floop begin
               s = 0
               for x in 1:3
                   s += x
               end
           end

For more examples, see the sequential loops tutorial.

## Advantages over `Threads.@threads`

`@floop` is a superset of `Threads.@threads` and has a couple of advantages:

* `@floop` supports various input collection types, including
  arrays, dicts, sets, strings, and many iterators from `Base.Iterators` such
  as `zip` and `product`. More precisely, `@floop` can generate high-performance
  parallel iterations for any collection that supports the SplittablesBase.jl
  interface.
  * With `FoldsThreads.NondeterministicEx`, `@floop` can even parallelize
    iterations over non-parallelizable input collections (although this is
    beneficial only for heavier workloads).
* FoldsThreads.jl provides multiple alternative thread-based executors (i.e.,
  loop execution backends) that can be used to tune performance without
  touching the loop itself.
* FoldsCUDA.jl provides a simple GPU executor.
* The `@reduce` syntax supports complex reductions in a forward-compatible
  manner.
  * Note: `threadid`-based reduction (commonly used in conjunction with
    `@threads`) may not be forward-compatible with Julia versions that support
    migrating tasks across threads.
* There is a trick for "changing" the effective number of threads without
  restarting `julia` using the `basesize` option.
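
As a rough sketch of two of the points above, the example below iterates over `Iterators.product` and passes a `basesize` to `ThreadedEx`. The function name `count_upper_pairs` and the chosen sizes are hypothetical. A larger `basesize` means fewer, larger chunks of work, which limits how many tasks can run concurrently.

```julia
using FLoops

# Hypothetical helper: counts pairs (i, j) with i < j.
function count_upper_pairs(ex)
    @floop ex for (i, j) in Iterators.product(1:4, 1:6)
        @reduce n += Int(i < j)
    end
    return n
end

# 24 elements in total: basesize = 24 keeps everything in one chunk,
# while basesize = 4 allows the work to be split across several tasks.
one_chunk   = count_upper_pairs(ThreadedEx(basesize = 24))
many_chunks = count_upper_pairs(ThreadedEx(basesize = 4))
@assert one_chunk == many_chunks == 14
```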

The relative disadvantages may be that `@floop` is much newer than
`Threads.@threads` and has much more flexible internals. These points can
contribute to undiscovered bugs.

## How it works

`@floop` works by converting the native Julia `for` loop syntax to
`foldl` defined by Transducers.jl. Unlike `foldl` defined in `Base`,
`foldl` defined by Transducers.jl is powerful enough to cover the `for`
loop semantics and more.
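
The general idea of expressing a loop as a fold can be illustrated with `Base.foldl` alone. The sketch below is not the code that `@floop` generates; the names `loop_version`, `loop_step`, and `fold_version` are hypothetical. It only shows how the variables updated by a loop body become the state threaded through a step function.

```julia
# Hand-written loop: the state is the pair of accumulators (s, t).
function loop_version(xys)
    s, t = 0, 0
    for (x, y) in xys
        s += x + y
        t += x - y
    end
    return (s, t)
end

# The same computation as a fold: the loop body becomes a step function
# that takes the current state and one element and returns the next state.
loop_step((s, t), (x, y)) = (s + x + y, t + x - y)
fold_version(xys) = foldl(loop_step, xys; init = (0, 0))

@assert loop_version(zip(1:3, 1:2:6)) == fold_version(zip(1:3, 1:2:6)) == (15, -3)
```

Once a loop is in this form, the choice of how to traverse the collection (sequentially, by threads, or across processes) can be separated from the body itself.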