https://github.com/pxl-th/nnop.jl
Pure Julia NN kernels.
- Host: GitHub
- URL: https://github.com/pxl-th/nnop.jl
- Owner: pxl-th
- Created: 2025-02-05T23:34:10.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-03-03T22:53:10.000Z (over 1 year ago)
- Last Synced: 2025-03-03T23:29:13.028Z (over 1 year ago)
- Topics: gpgpu, gpu, julia
- Language: Julia
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# NNop.jl
Pure Julia NN kernels.
> [!WARNING]
> This package is at an early stage of development and is not yet fully ready.
## Ops
### Flash Attention
Implementation of [FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/abs/2205.14135).
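```julia
using AMDGPU                 # provides ROCArray (AMD GPU arrays)
using NNop: flash_attention

E = 64       # embedding (head) size
L = 4096     # sequence length
H, B = 4, 4  # number of heads, batch size
q = ROCArray(rand(Float32, E, L, H, B))
k = ROCArray(rand(Float32, E, L, H, B))
v = ROCArray(rand(Float32, E, L, H, B))
o = flash_attention(q, k, v)
```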
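The central idea of the FlashAttention paper linked above is to walk over the keys and values in tiles while keeping a running softmax maximum and normaliser per query, so the full `L × L` score matrix is never stored. The following is a minimal single-head CPU sketch of that idea, not the package's GPU kernel. The function name `tiled_attention`, the `block` keyword, the `1/sqrt(E)` scaling, and the absence of masking are all assumptions made for illustration.

```julia
using LinearAlgebra

# Blockwise attention for one head; q, k, v are E × L matrices.
# Assumptions: scores are scaled by 1/sqrt(E), no mask is applied,
# and `block` is a hypothetical tile size along the key dimension.
function tiled_attention(q::AbstractMatrix{T}, k::AbstractMatrix{T},
                         v::AbstractMatrix{T}; block::Int=2) where {T<:AbstractFloat}
    E, L = size(q)
    Lk = size(k, 2)
    scale = T(1 / sqrt(E))
    o = zeros(T, E, L)           # unnormalised output accumulator
    m = fill(typemin(T), 1, L)   # running maximum score per query
    d = zeros(T, 1, L)           # running softmax normaliser per query
    for start in 1:block:Lk
        idx = start:min(start + block - 1, Lk)
        s = (k[:, idx]' * q) .* scale          # tile of scores, length(idx) × L
        m_new = max.(m, maximum(s; dims=1))
        alpha = exp.(m .- m_new)               # rescales statistics from earlier tiles
        p = exp.(s .- m_new)
        d .= d .* alpha .+ sum(p; dims=1)
        o .= o .* alpha .+ v[:, idx] * p
        m .= m_new
    end
    return o ./ d                # normalise once at the end
end
```

Only one tile of scores is alive at a time, so the working memory grows with `block × L` rather than `L × L`.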
#### Benchmarks:
Measured for the problem size `(E=64, L=4096, H=4, B=4)`:
||Naïve Attention|Flash Attention|
|-|-|-|
|Execution time|55.034 ms|18.490 ms|
|Peak memory usage|4.044 GiB|16.500 MiB|
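For context on the memory gap, a naïve implementation materialises the full score matrix before applying softmax. The sketch below is a plain CPU reference under assumed conventions: the `(E, L, H, B)` layout from the example, `1/sqrt(E)` scaling, and no masking. The name `naive_attention` is hypothetical, and this is not necessarily the baseline that was benchmarked.

```julia
using LinearAlgebra

# Reference attention for arrays laid out as (E, L, H, B).
# Assumptions: scores are scaled by 1/sqrt(E) and no mask is applied.
function naive_attention(q::AbstractArray{T,4}, k::AbstractArray{T,4},
                         v::AbstractArray{T,4}) where {T<:AbstractFloat}
    E, L, H, B = size(q)
    o = similar(q)
    scale = T(1 / sqrt(E))
    for b in 1:B, h in 1:H
        qh = @view q[:, :, h, b]    # E × L
        kh = @view k[:, :, h, b]
        vh = @view v[:, :, h, b]
        s = (kh' * qh) .* scale     # full L × L score matrix, s[j, i] = k_j ⋅ q_i
        s .-= maximum(s; dims=1)    # subtract the column maximum for stability
        p = exp.(s)
        p ./= sum(p; dims=1)        # softmax over keys
        o[:, :, h, b] = vh * p
    end
    return o
end
```

This loop holds one `L × L` matrix at a time. A batched variant that keeps the scores for every head and batch entry at once would need `4096² · 16 · 4` bytes, i.e. 1 GiB, per `Float32` array of that shape at the benchmarked size.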
#### Features:
- Forward & backward passes.
- Arbitrary sequence length.
- Arbitrary head sizes.
- FP32, FP16, and BF16 support.
In progress:
- [ ] Causal masking.
- [ ] Variable sequence length.
### Fused (online) Softmax
Implementation of [Online normalizer calculation for softmax](https://arxiv.org/abs/1805.02867).
```julia
using AMDGPU                # provides ROCArray (AMD GPU arrays)
using NNop: online_softmax

x = ROCArray(ones(Float32, 8192, 1024))
y = online_softmax(x)
```
||Naïve Softmax|Online Softmax|
|-|-|-|
|Execution time|745.123 μs|61.600 μs|
|Peak memory usage|64.258 MiB|32.000 MiB|
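The recurrence from the online softmax paper linked above can be sketched on the CPU as follows. It keeps a running maximum `m` and normaliser `d` in one pass over the inputs, then normalises in a second pass. The name `online_softmax_cpu` is hypothetical, and normalising along the first dimension is an assumption rather than documented behaviour of `online_softmax`.

```julia
# Column-wise softmax using the online normaliser recurrence:
#   m_i = max(m_{i-1}, x_i)
#   d_i = d_{i-1} * exp(m_{i-1} - m_i) + exp(x_i - m_i)
# Assumptions: inputs are finite and softmax is taken over dimension 1.
function online_softmax_cpu(x::AbstractMatrix{T}) where {T<:AbstractFloat}
    y = similar(x)
    for j in axes(x, 2)
        m = typemin(T)   # running maximum, starts at -Inf
        d = zero(T)      # running normaliser
        for i in axes(x, 1)
            m_new = max(m, x[i, j])
            d = d * exp(m - m_new) + exp(x[i, j] - m_new)
            m = m_new
        end
        for i in axes(x, 1)
            y[i, j] = exp(x[i, j] - m) / d
        end
    end
    return y
end
```

Compared with the usual three passes (maximum, sum, normalise), the maximum and the sum are fused into a single pass over each column.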