https://github.com/juliafolds/foldscuda.jl

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)
cuda gpu high-performance iterators julia map-reduce parallel transducers

Last synced: 3 months ago
Host: GitHub
URL: https://github.com/juliafolds/foldscuda.jl
Owner: JuliaFolds
License: mit
Created: 2020-10-11T22:53:02.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2023-05-30T21:44:34.000Z (about 2 years ago)
Last Synced: 2024-04-25T05:02:11.587Z (about 1 year ago)
Topics: cuda, gpu, high-performance, iterators, julia, map-reduce, parallel, transducers
Language: Julia
Homepage:
Size: 1.3 MB
Stars: 54
Watchers: 7
Forks: 4
Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE

# FoldsCUDA

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliafolds.github.io/FoldsCUDA.jl/dev)

[![Buildkite status](https://badge.buildkite.com/c4196ef2fa588454c146bab0001d0f8de876aa864ab7c5de80.svg?branch=master)](https://buildkite.com/julialang/foldscuda-dot-jl)

[![Run tests w/o GPU](https://github.com/JuliaFolds/FoldsCUDA.jl/workflows/Run%20tests%20w/o%20GPU/badge.svg)](https://github.com/JuliaFolds/FoldsCUDA.jl/actions?query=workflow%3A%22Run+tests+w%2Fo+GPU%22)

FoldsCUDA.jl provides a
[Transducers.jl](https://github.com/JuliaFolds/Transducers.jl)-compatible
fold (reduce) implemented using
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl). This brings the
transducers and reducing function combinators implemented in
Transducers.jl to the GPU. Furthermore, using
[FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), you can write
parallel `for` loops that run on the GPU.

## API

FoldsCUDA exports `CUDAEx`, a parallel loop
[executor](https://juliafolds.github.io/Transducers.jl/dev/explanation/glossary/#glossary-executor).
It can be used with parallel `for` loops created with
[`FLoops.@floop`](https://github.com/JuliaFolds/FLoops.jl), the
`Base`-like high-level parallel API in
[Folds.jl](https://github.com/JuliaFolds/Folds.jl), and the extensible
transducers provided by
[Transducers.jl](https://github.com/JuliaFolds/Transducers.jl).
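Because an executor is an ordinary value, code written against Folds.jl can take it as an argument and be checked on the CPU first. The following is a minimal sketch under that assumption; `sum_of_squares` is a hypothetical helper, not part of FoldsCUDA, and the GPU call is left as a comment because it needs a CUDA-capable device.

```julia
using Folds
using Transducers: SequentialEx, ThreadedEx

# Hypothetical helper (not part of FoldsCUDA): the executor is just an argument.
sum_of_squares(xs, ex) = Folds.sum(x -> x^2, xs, ex)

xs = Float32.(1:100)

s_seq = sum_of_squares(xs, SequentialEx())  # single-threaded fold
s_thr = sum_of_squares(xs, ThreadedEx())    # multi-threaded fold

# Assumed GPU usage (requires CUDA.jl, FoldsCUDA.jl, and a GPU):
#     using CUDA, FoldsCUDA
#     sum_of_squares(CuArray(xs), CUDAEx())
```

Keeping the executor as a parameter means the same reduction can be exercised in CI without a GPU and switched to `CUDAEx()` where one is available.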

## Examples

### `findmax` using FLoops.jl

You can pass the CUDA executor `FoldsCUDA.CUDAEx()` to `@floop` to run a
parallel `for` loop on the GPU:

```julia
julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar

julia> xs = CUDA.rand(10^8);

julia> @allowscalar xs[100] = 2;

julia> @allowscalar xs[200] = 2;

julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
           @reduce() do (imax = -1; i), (xmax = -Inf32; x)
               if xmax < x
                   xmax = x
                   imax = i
               end
           end
       end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100
```
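The tie-breaking behaviour of this loop can be checked without a GPU by running the same body with a CPU executor. This is a sketch, assuming FLoops.jl's `ThreadedEx`; `first_findmax` is a hypothetical wrapper introduced only for illustration.

```julia
using FLoops

# Hypothetical wrapper: same loop body as the GPU example, executor passed in.
function first_findmax(xs, ex)
    @floop ex for (x, i) in zip(xs, eachindex(xs))
        @reduce() do (imax = -1; i), (xmax = -Inf32; x)
            # Strict `<` keeps the earlier element when two values are equal.
            if xmax < x
                xmax = x
                imax = i
            end
        end
    end
    return (xmax, imax)
end

ys = Float32[0.1, 2, 0.3, 2]
result = first_findmax(ys, ThreadedEx())
```

The `@reduce() do` body is used both to fold in a new element and to merge two partial results, so it has to be written as an associative combination of two `(index, value)` pairs, as it is here.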

### `extrema` using `Transducers.TeeRF`

```julia
julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;

julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # iterator comprehension works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)
```
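`TeeRF(min, max)` combines two reducing functions so that each element is fed to both, and a single pass returns a tuple with one result per function. As a small CPU-only sketch of that behaviour (using Transducers.jl's sequential `foldxl` rather than `CUDAEx`, and a made-up input array):

```julia
using Transducers

zs = Float32[0.25, 2.0, -0.5, 1.0]  # illustrative data, not from the README

# One pass over `zs`; `min` and `max` each see every element.
lo_hi = foldxl(TeeRF(min, max), zs)

# Same fold with a transducer that doubles each element first.
lo_hi_doubled = foldxl(TeeRF(min, max), Map(x -> 2x), zs)
```

Here `lo_hi` holds the minimum and maximum of `zs`, and `lo_hi_doubled` holds the minimum and maximum of the doubled values.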

### More examples

For more examples, see the
[examples section in the documentation](https://juliafolds.github.io/FoldsCUDA.jl/dev/examples/).
