https://github.com/juliaarrays/tilediteration.jl

Julia package to facilitate writing multithreaded, multidimensional, cache-efficient code
Host: GitHub
URL: https://github.com/juliaarrays/tilediteration.jl
Owner: JuliaArrays
License: other
Created: 2016-08-26T15:08:29.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2024-05-10T10:36:39.000Z (about 2 years ago)
Last Synced: 2025-02-21T07:17:39.787Z (over 1 year ago)
Language: Julia
Size: 92.8 KB
Stars: 81
Watchers: 6
Forks: 9
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# TiledIteration

[![CI](https://github.com/JuliaArrays/TiledIteration.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/JuliaArrays/TiledIteration.jl/actions/workflows/CI.yml)

[![codecov.io](http://codecov.io/github/JuliaArrays/TiledIteration.jl/coverage.svg?branch=master)](http://codecov.io/github/JuliaArrays/TiledIteration.jl?branch=master)

This Julia package handles some of the low-level details for writing
cache-efficient, possibly-multithreaded code for multidimensional
arrays. A "tile" corresponds to a chunk of a larger array, typically a
region that is large enough to encompass any "local" computations you
need to perform; some of these computations may require temporary storage.

A related package with different aims is [TiledViews.jl](https://github.com/bionanoimaging/TiledViews.jl).

## Usage

This package offers two basic kinds of functionality: the management
of temporary buffers for processing on tiles, and the iteration over
disjoint tiles of a larger array.

### Iteration

#### SplitAxis and SplitAxes

The main use for these simple types is in distributing work across
threads, usually in circumstances that do not require
multidimensional locality as provided by `TileIterator`. `SplitAxis`
splits a single array axis, and `SplitAxes` splits multidimensional
axes along the final axis. For example:

```julia

julia> using TiledIteration

julia> A = rand(3, 20);

julia> collect(SplitAxes(axes(A), 4))

4-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:

 (1:3, 1:5)

 (1:3, 6:10)

 (1:3, 11:15)

 (1:3, 16:20)

```

You can also reduce the amount of work assigned to thread 1 (often the
main thread is responsible for scheduling the other threads):

```julia

julia> collect(SplitAxes(axes(A), 3.5))

4-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:

 (1:3, 1:2)

 (1:3, 3:8)

 (1:3, 9:14)

 (1:3, 15:20)

```

Using "3.5 chunks" forces the later workers to perform 6 columns of
work (rounding 20/3.5 up to the next integer), leaving only two
columns remaining for the first thread.
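
The chunk boundaries shown above can be reproduced with a few lines of plain Julia. This is only a sketch of the arithmetic, not the package's implementation: `split_ranges` is a hypothetical helper, and it assumes a 1-based axis of length `n` and a chunk count for which the first chunk comes out non-empty.

```julia
# Hypothetical helper (not part of TiledIteration): split 1:n into
# ceil(nchunks) ranges, where every range after the first has length
# ceil(n / nchunks) and the first range takes whatever is left over.
function split_ranges(n::Integer, nchunks::Real)
    step = ceil(Int, n / nchunks)        # length of each later chunk
    nlater = ceil(Int, nchunks) - 1      # number of later (full-size) chunks
    firstlen = n - nlater * step         # remainder handled by the first chunk
    stops = [firstlen + k * step for k in 0:nlater]
    starts = [1; stops[1:end-1] .+ 1]
    return [a:b for (a, b) in zip(starts, stops)]
end

split_ranges(20, 4)     # four equal chunks of 5 columns
split_ranges(20, 3.5)   # first chunk has 2 columns, the other three have 6
```

With `n = 20`, these two calls give the same column ranges as the `SplitAxes` outputs above.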

#### TileIterator

More general iteration over disjoint tiles of a larger array can be done
with `TileIterator`:

```julia

using TiledIteration

A = rand(1000,1000);   # our big array

for tileaxs in TileIterator(axes(A), (128,8))

    @show tileaxs

end

```

This produces

```julia

tileaxs = (1:128,1:8)

tileaxs = (129:256,1:8)

tileaxs = (257:384,1:8)

tileaxs = (385:512,1:8)

tileaxs = (513:640,1:8)

tileaxs = (641:768,1:8)

tileaxs = (769:896,1:8)

tileaxs = (897:1000,1:8)

tileaxs = (1:128,9:16)

tileaxs = (129:256,9:16)

tileaxs = (257:384,9:16)

tileaxs = (385:512,9:16)

...

```

You can see that the total axes range is split up into chunks,
which are of size `(128,8)` except at the edges of `A`. Naturally,
these axes serve as the basis for processing individual chunks of
the array.
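
As an illustration of processing one chunk at a time, the following sketch sums an array tile by tile. `tiled_sum` is a hypothetical helper written for this example; it relies only on `TileIterator(axes(A), tilesz)` yielding a tuple of index ranges, as shown above.

```julia
using TiledIteration

# Hypothetical helper: accumulate the sum of A one tile at a time.
function tiled_sum(A, tilesz)
    total = zero(eltype(A))
    for tileaxs in TileIterator(axes(A), tilesz)
        # `view` avoids copying the tile; splatting turns the tuple of
        # ranges into one index range per dimension
        total += sum(view(A, tileaxs...))
    end
    return total
end

A = rand(1000, 1000)
tiled_sum(A, (128, 8)) ≈ sum(A)   # tiles are disjoint and cover A
```

Because the tiles are disjoint and together cover the whole array, each element contributes exactly once; the comparison uses `≈` because floating-point addition order differs from `sum(A)`.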

As a further example, suppose you've started Julia with `JULIA_NUM_THREADS=4`; then

```julia

function fillid!(A, tilesz)

    tileinds_all = collect(TileIterator(axes(A), tilesz))

    Threads.@threads for i = 1:length(tileinds_all)

        tileaxs = tileinds_all[i]

        A[tileaxs...] .= Threads.threadid()

    end

    A

end

A = zeros(Int, 8, 8)

fillid!(A, (2,2))

```

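would yield output like the following (which thread handles which tile depends on how `Threads.@threads` schedules the iterations, so the exact IDs may differ):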

```julia

8×8 Array{Int64,2}:

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

 1  1  2  2  3  3  4  4

```

See also "EdgeIterator" below.

### Determining the chunk size

[Stencil computations](https://en.wikipedia.org/wiki/Stencil_code)
typically require "padding" values, so the inputs to a computation may
be of a different size than the resulting outputs. Naturally, you can
set the tile size manually; a simple convenience function,
`padded_tilesize`, attempts to pick reasonable choices for you
depending on the size of your kernel (stencil) and element type you'll
be using:

```julia

julia> padded_tilesize(UInt8, (3,3))

(768,18)

julia> padded_tilesize(UInt8, (3,3), 4)  # we want 4 of these to fit in L1 cache at once

(512,12)

julia> padded_tilesize(Float64, (3,3))

(96,18)

julia> padded_tilesize(Float32, (3,3,3))

(64,6,6)

```

### Allocating and managing temporary storage

To allocate temporary storage while working with tiles, use `TileBuffer`:

```julia

julia> tileaxs = (-1:15, 0:7)  # really this might have come from TileIterator

julia> buf = TileBuffer(Float32, tileaxs)

TiledIteration.TileBuffer{Float32,2,2} with indices -1:15×0:7:

 0.0  0.0          2.38221f-44  0.0          0.0          0.0          9.3887f-44   0.0

 0.0  1.26117f-44  0.0          0.0          0.0          8.26766f-44  0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          6.02558f-44  0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          7.28675f-44  0.0          0.0          0.0

 0.0  1.54143f-44  0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          9.94922f-44  0.0

 0.0  0.0          0.0          0.0          0.0          8.82818f-44  0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          9.10844f-44  0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          1.03696f-43  0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

```

This returns an uninitialized buffer for use over the indicated domain. You can reuse this same storage for the next tile, even if the tile is smaller because it corresponds to the edge of the original array:

```julia

julia> pointer(buf)

Ptr{Float32} @0x00007f79131fd550

julia> buf = TileBuffer(buf, (16:20, 0:7))

TiledIteration.TileBuffer{Float32,2,2} with indices 16:20×0:7:

 0.0  0.0  0.0  0.0          0.0          0.0  0.0          0.0

 0.0  0.0  0.0  0.0          0.0          0.0  0.0          0.0

 0.0  0.0  0.0  0.0          1.54143f-44  0.0  0.0          0.0

 0.0  0.0  0.0  1.26117f-44  0.0          0.0  0.0          0.0

 0.0  0.0  0.0  0.0          0.0          0.0  2.38221f-44  0.0

julia> pointer(buf)

Ptr{Float32} @0x00007f79131fd550

```

When you use it again at the top of the next block of columns, it returns to its original size while still reusing the same memory:

```julia
julia> buf = TileBuffer(buf, (-1:15, 8:15))
TiledIteration.TileBuffer{Float32,2,2} with indices -1:15×8:15:
 0.0  0.0          2.38221f-44  0.0          0.0          0.0          9.3887f-44   0.0
 0.0  1.26117f-44  0.0          0.0          0.0          8.26766f-44  0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          6.02558f-44  0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          7.28675f-44  0.0          0.0          0.0
 0.0  1.54143f-44  0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          9.94922f-44  0.0
 0.0  0.0          0.0          0.0          0.0          8.82818f-44  0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          9.10844f-44  0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          1.03696f-43  0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0
 0.0  0.0          0.0          0.0          0.0          0.0          0.0          0.0

julia> pointer(buf)
Ptr{Float32} @0x00007f79131fd550
```
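
Putting the two pieces together, a tile loop might allocate one buffer up front and re-wrap it for each tile. The sketch below is an assumption-laden illustration rather than a recommended pattern: `scale_by_tiles!` is a hypothetical helper, and it assumes that a `TileBuffer` can be indexed with the tile's own indices (as the displayed axes suggest) and that the first tile produced by `TileIterator` is at least as large as every later one.

```julia
using TiledIteration

# Hypothetical helper: write 2 .* A into `out`, staging each tile in a
# single reusable Float32 buffer.
function scale_by_tiles!(out, A, tilesz)
    tiles = collect(TileIterator(axes(A), tilesz))
    buf = TileBuffer(Float32, first(tiles))   # allocate once, sized for a full tile
    for tileaxs in tiles
        buf = TileBuffer(buf, tileaxs)        # reuse the same memory for this tile
        for I in CartesianIndices(tileaxs)
            buf[I] = 2 * A[I]                 # the "local" computation
        end
        for I in CartesianIndices(tileaxs)
            out[I] = buf[I]                   # copy the staged result back
        end
    end
    return out
end

A = rand(Float32, 20, 20)
out = scale_by_tiles!(similar(A), A, (8, 8))
```

Here the staging step is redundant (the result could be written straight into `out`); it stands in for a computation that genuinely needs scratch space per tile.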

### EdgeIterator

When performing stencil operations, oftentimes the edge of the array
requires special treatment. Several approaches to handling the edges
(adding explicit padding, or executing special code just when on the
boundaries) can slow your algorithm down because of extra steps or
branches.

This package helps support implementations which first handle the
"interior" of an array (for example using `TileIterator` over just
the interior) using a "fast path," and then handle just the edges by a
(possibly) less carefully optimized algorithm. The key component of
this is `EdgeIterator`:

```julia

outerrange = CartesianIndices((-1:4, 0:3))

innerrange = CartesianIndices(( 1:3, 1:2))

julia> for I in EdgeIterator(outerrange, innerrange)

           @show I

       end

I = CartesianIndex(-1, 0)

I = CartesianIndex(0, 0)

I = CartesianIndex(1, 0)

I = CartesianIndex(2, 0)

I = CartesianIndex(3, 0)

I = CartesianIndex(4, 0)

I = CartesianIndex(-1, 1)

I = CartesianIndex(0, 1)

I = CartesianIndex(4, 1)

I = CartesianIndex(-1, 2)

I = CartesianIndex(0, 2)

I = CartesianIndex(4, 2)

I = CartesianIndex(-1, 3)

I = CartesianIndex(0, 3)

I = CartesianIndex(1, 3)

I = CartesianIndex(2, 3)

I = CartesianIndex(3, 3)

I = CartesianIndex(4, 3)

```

The time required to visit these edge sites is on the order of the
number of edge sites, not the order of the number of sites encompassed
by `outerrange`, and consequently is efficient.
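
For comparison, the same set of sites can be built by brute force using only Base Julia. This sketch is not how `EdgeIterator` works: it scans every site of `outerrange`, so its cost grows with the full outer region rather than with the number of edge sites. The name `edge_sites` is introduced here for illustration.

```julia
outerrange = CartesianIndices((-1:4, 0:3))
innerrange = CartesianIndices(( 1:3, 1:2))

# Brute force: visit every site of outerrange and keep those outside innerrange.
edge_sites = [I for I in outerrange if !(I in innerrange)]

# 24 outer sites minus 6 interior sites leaves the 18 edge sites listed above
length(edge_sites) == length(outerrange) - length(innerrange)
```

The comprehension walks `outerrange` in column-major order, so `edge_sites` lists the indices in the same order as the `EdgeIterator` output shown above.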