https://github.com/fuzzypixelz/parallelfft



* Introduction

To quote the official website:

#+begin_quote
Futhark is a small programming language designed to be compiled to efficient
parallel code. It is a statically typed, data-parallel, and purely functional
array language in the ML family, and comes with a heavily optimizing
ahead-of-time compiler that presently generates either GPU code via CUDA and
OpenCL, or multi-threaded CPU code.
#+end_quote

I sought to write idiomatic Futhark code without digging too much into compiler
internals. This allowed me to see the performance one could expect from Futhark
without knowing exactly how code is executed.

Throughout the project, my model of Futhark semantics was that of
MapReduce. I therefore avoided algorithms with explicit memory operations and
looked for ways to compute the Fourier Transform using only parallel
array operations.

* Fourier Transform

(If you're reading this on GitHub, kindly refer to the PDF file instead; many
formulas are not rendered correctly.)

** Discrete Fourier Transform

For any array of complex numbers $\bf a$ of length $N$, its Discrete Fourier
Transform (DFT) is defined component-wise as:

$$\hat{a}_k = \sum_{i=0}^{N-1}{a_{i}{\zeta}^{ik}}$$

where

$$\hat{\bf a} = (\hat{a}_0, \dots, \hat{a}_{N-1})$$

and $\zeta$ is a principal $N$-th root of unity in the ring of complex numbers,
for example $\zeta = e^{-2\pi\sqrt{-1}/N}$.
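
As a concrete reading of this definition, here is a minimal sketch in plain
Python (not Futhark, so it can be run without the Futhark toolchain). It assumes
the common choice $\zeta = e^{-2\pi\sqrt{-1}/N}$; the name =dft_naive= is
illustrative and does not come from the project's code.

#+begin_src python
import cmath

def dft_naive(a):
    """Direct O(N^2) evaluation of hat{a}_k = sum_i a_i * zeta^(i*k)."""
    n = len(a)
    if n == 0:
        return []
    # Assumption: zeta = exp(-2*pi*sqrt(-1)/N), a principal N-th root of unity.
    zeta = cmath.exp(-2j * cmath.pi / n)
    return [sum(a[i] * zeta ** (i * k) for i in range(n)) for k in range(n)]
#+end_src

For instance, the unit impulse =[1, 0, 0, 0]= transforms to four ones (up to
floating-point rounding), since only the $i = 0$ term of each sum survives.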

** Parallel Fast Fourier Transform

The following algorithm is described in the [[https://dl.acm.org/doi/10.1145/2331684.2331693][MapReduce-SSA]] paper by Tsz-Wo Sze,
which needs a MapReduce-friendly, parallel and relatively simple FFT algorithm
to perform large integer multiplication. The one difference is that the
paper applies the FFT in the ring of integers modulo $2^n + 1$, whereas I work
with complex numbers. I will not go into the tedious proof.

If we write $N = PQ$ for some positive integers $P$ and $Q$ (in the code I
assume $N$ is a perfect square), we can compute the Fourier Transform of $\bf a$
in two steps:

1. $P$ DFTs of $Q$-point arrays, in parallel
2. then, $Q$ DFTs of $P$-point arrays, in parallel

More precisely, for all $0\le p < P$, define:

$${\bf{a}}^{(p)} = (a_p, \dots, a_{(Q-2)P+p}, a_{(Q-1)P+p})$$

In the code, these arrays are referred to as =aslices=. Their DFTs are the $P$
DFTs of $Q$ points needed in the first step; they can be computed in parallel
because they do not depend on one another.
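
In index terms, ${\bf a}^{(p)}$ takes every $P$-th element of $\bf a$, starting
at offset $p$. A minimal Python sketch of that indexing follows; =aslices= here
is a hypothetical helper named after the project's identifier, not its Futhark
definition.

#+begin_src python
def aslices(a, P):
    """Return [a^(0), ..., a^(P-1)], where a^(p)[q] = a[q*P + p]."""
    if P <= 0 or len(a) % P != 0:
        raise ValueError("P must be a positive divisor of len(a)")
    # A strided slice starting at p with step P picks a_p, a_{P+p}, a_{2P+p}, ...
    return [a[p::P] for p in range(P)]

# N = 6, P = 2, Q = 3
slices = aslices([0, 1, 2, 3, 4, 5], 2)
# slices == [[0, 2, 4], [1, 3, 5]]
#+end_src

Each slice reads only from the input array, which is why the $P$ transforms can
run independently.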

Next, for all $0\le q < Q$, we define:

$${\bf{z}}^{[q]} = (z_{qP}, \dots, z_{qP+(P-2)}, z_{qP+(P-1)})$$

where

$$z_{qP+p} = \zeta^{pq} \widehat{a^{(p)}}_q$$

In the code, these arrays are referred to as =zslices=. Their DFTs are the $Q$
DFTs of $P$ points needed in the second step.

Finally, for all $0\le p < P$ and $0\le q < Q$, we get:

$$\hat{a}_{pQ+q} = \widehat{z^{[q]}}_p$$
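
The three formulas above can be checked numerically with a small Python sketch.
It is an illustration under stated assumptions, not the project's Futhark code:
it takes $\zeta = e^{-2\pi\sqrt{-1}/N}$, uses $\zeta^P$ as the root of unity
for the $Q$-point DFTs and $\zeta^Q$ for the $P$-point DFTs (both follow from
$N = PQ$), and the names =dft= and =fft_two_step= are hypothetical.

#+begin_src python
import cmath

def dft(a, zeta):
    """Direct DFT of a with respect to the root of unity zeta."""
    n = len(a)
    return [sum(a[i] * zeta ** (i * k) for i in range(n)) for k in range(n)]

def fft_two_step(a, P, Q):
    """DFT of a (length N = P*Q): P DFTs of Q points, then Q DFTs of P points."""
    N = len(a)
    if P <= 0 or Q <= 0 or N != P * Q:
        raise ValueError("len(a) must equal P * Q with P, Q positive")
    zeta = cmath.exp(-2j * cmath.pi / N)  # assumed principal N-th root of unity
    # Step 1: Q-point DFT of each slice a^(p) = (a_p, a_{P+p}, ...).
    a_hat = [dft(a[p::P], zeta ** P) for p in range(P)]
    # Twiddle factors: z^[q][p] = zeta^(p*q) * hat{a^(p)}_q.
    z = [[zeta ** (p * q) * a_hat[p][q] for p in range(P)] for q in range(Q)]
    # Step 2: P-point DFT of each slice z^[q].
    z_hat = [dft(zq, zeta ** Q) for zq in z]
    # Reassemble: hat{a}_{pQ+q} = hat{z^[q]}_p.
    out = [0j] * N
    for q in range(Q):
        for p in range(P):
            out[p * Q + q] = z_hat[q][p]
    return out
#+end_src

Comparing =fft_two_step(a, P, Q)= with =dft(a, cmath.exp(-2j * cmath.pi / len(a)))=
on a small input such as $N = 6$, $P = 2$, $Q = 3$ should give the same values
up to floating-point rounding. The inner =dft= calls are still quadratic; the
sketch only demonstrates the decomposition, not its performance.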

** Implementation

At first, I naively wrote two separate functions to compute =aslices= and =zslices=
respectively:

#+begin_src ml
def aslice [n] (f: factorize) (a: [n]complex.complex) (p: i64) =
  let (p_max, q_max) = f n
  in map (\q -> a[q * p_max + p]) (0..<q_max)

[...] root' p q complex.* dft (aslice f a p) q) (0..