https://github.com/juliafolds/foldscuda.jl
Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)
- Host: GitHub
- URL: https://github.com/juliafolds/foldscuda.jl
- Owner: JuliaFolds
- License: MIT
- Created: 2020-10-11T22:53:02.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-05-30T21:44:34.000Z (almost 2 years ago)
- Last Synced: 2024-04-25T05:02:11.587Z (about 1 year ago)
- Topics: cuda, gpu, high-performance, iterators, julia, map-reduce, parallel, transducers
- Language: Julia
- Homepage:
- Size: 1.3 MB
- Stars: 54
- Watchers: 7
- Forks: 4
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
# FoldsCUDA
[Documentation](https://juliafolds.github.io/FoldsCUDA.jl/dev)
[Buildkite build status](https://buildkite.com/julialang/foldscuda-dot-jl)
[GitHub Actions: Run tests w/o GPU](https://github.com/JuliaFolds/FoldsCUDA.jl/actions?query=workflow%3A%22Run+tests+w%2Fo+GPU%22)

FoldsCUDA.jl provides a
[Transducers.jl](https://github.com/JuliaFolds/Transducers.jl)-compatible
fold (reduce) implementation using
[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl). This brings the
transducers and reducing function combinators implemented in
Transducers.jl to the GPU. Furthermore, using
[FLoops.jl](https://github.com/JuliaFolds/FLoops.jl), you can write
parallel `for` loops that run on the GPU.

## API
FoldsCUDA exports `CUDAEx`, a parallel loop
[executor](https://juliafolds.github.io/Transducers.jl/dev/explanation/glossary/#glossary-executor).
It can be used with the parallel `for` loop created with
[`FLoops.@floop`](https://github.com/JuliaFolds/FLoops.jl),
`Base`-like high-level parallel API in
[Folds.jl](https://github.com/JuliaFolds/Folds.jl), and extensible
transducers provided by
[Transducers.jl](https://github.com/JuliaFolds/Transducers.jl).

## Examples
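As a quick sketch of the `Base`-like Folds.jl API with the CUDA executor (untested here; assumes a CUDA-capable GPU, and relies on the Folds.jl convention that the executor is passed as the last positional argument):

```julia
using FoldsCUDA, Folds, CUDA

xs = CUDA.rand(10^6)

# Folds.jl functions take an executor as their last argument, so
# switching between CPU and GPU execution is a one-argument change:
s = Folds.sum(xs, CUDAEx())                  # parallel sum on the GPU
m = Folds.maximum(xs, CUDAEx())              # parallel maximum on the GPU
c = Folds.count(>(0.5f0), xs, CUDAEx())      # count elements above 0.5
```

Omitting the executor (or passing a CPU executor from Transducers.jl) runs the same code on the CPU, which is convenient for testing.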
### `findmax` using FLoops.jl
You can pass the CUDA executor `FoldsCUDA.CUDAEx()` to `@floop` to run a
parallel `for` loop on the GPU:

```julia
julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar
julia> xs = CUDA.rand(10^8);
julia> @allowscalar xs[100] = 2;
julia> @allowscalar xs[200] = 2;
julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
@reduce() do (imax = -1; i), (xmax = -Inf32; x)
if xmax < x
xmax = x
imax = i
end
end
end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100
```

### `extrema` using `Transducers.TeeRF`
```julia
julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;
julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # iterator comprehension works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)
```

### More examples
For more examples, see the
[examples section in the documentation](https://juliafolds.github.io/FoldsCUDA.jl/dev/examples/).