https://github.com/drilonaliu/parallel-fractal-tree

GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.

cuda fractal-tree fractals gpu


# Parallel-Fractal-Tree
GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.

# Fractal Tree

The Fractal Tree is a well-known fractal that visually represents the concept of recursion in nature. The construction of this fractal begins with a single trunk, which then splits into two branches at a certain angle. Each of these branches further splits into two smaller branches, continuing this process recursively. The initial iterations of this fractal are shown below.



Figures: Tree 1, Tree 2, Tree 3, Tree 4 (the first iterations of the fractal).


## Constructing branches

Let the points $A(x_1, y_1)$ and $B(x_2, y_2)$ be given. We have to find the point
$C(c_1, c_2)$ such that
$\overrightarrow{AB} = \lambda \overrightarrow{BC}$, and the points $C'$ and
$C''$ such that $\angle CBC' = \alpha = \angle CBC''$.



\
From the equation $\overrightarrow{AB} = \lambda \overrightarrow{BC}$ we get

$$(x_2 - x_1, y_2 - y_1) = \lambda (c_1 - x_2, c_2 - y_2)$$

\
By equating the corresponding coordinates of the pairs listed on both sides
of the equation we get

$$x_2 - x_1 = \lambda (c_1 - x_2)$$

$$y_2 - y_1 = \lambda (c_2 - y_2)$$

\
Solving for $c_1$, $c_2$ we get:

$$c_1 = \frac{x_2 (1 + \lambda) - x_1}{\lambda}$$

$$c_2 = \frac{y_2 (1 + \lambda) - y_1}{\lambda}$$

\
We rotate the point $C$ around the point $B$ by the angle $\alpha$ and get
point $C''$ with coordinates:

$$c_1'' = (c_1 - x_2) \cos(\alpha) - (c_2 - y_2) \sin(\alpha) + x_2$$

$$c_2'' = (c_1 - x_2) \sin(\alpha) + (c_2 - y_2) \cos(\alpha) + y_2$$

\
Similarly, point $C'$ is obtained by rotating point $C$ around point $B$
by the angle $-\alpha$.
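
The construction above can be sketched on the CPU as follows. This is a minimal illustration, not the repository's code: the `Point` and `Branch` structs and the `makeChild` function are hypothetical names, and the angle is assumed to be given in radians. The function first extends $AB$ past $B$ to $C$ using the formulas for $c_1$ and $c_2$, then rotates $C$ around $B$.

```cpp
#include <cmath>

// Hypothetical types for illustration; the project's own Branch layout may differ.
struct Point { float x, y; };
struct Branch { Point start, end; };

// Builds a child branch from the parent branch AB:
// 1) finds C on the line through A and B such that AB = lambda * BC,
// 2) rotates C around B by alpha (radians); pass -alpha for the other child.
Branch makeChild(const Branch& parent, float lambda, float alpha) {
    const Point a = parent.start;
    const Point b = parent.end;

    // c = (b * (1 + lambda) - a) / lambda, coordinate by coordinate.
    const float c1 = (b.x * (1.0f + lambda) - a.x) / lambda;
    const float c2 = (b.y * (1.0f + lambda) - a.y) / lambda;

    // Standard 2D rotation of C about B.
    const float dx = c1 - b.x;
    const float dy = c2 - b.y;
    const float cs = std::cos(alpha);
    const float sn = std::sin(alpha);
    const Point rotated{dx * cs - dy * sn + b.x, dx * sn + dy * cs + b.y};

    // The child starts where the parent ends.
    return Branch{b, rotated};
}
```

With $\lambda = 2$ each child is half as long as its parent, and calling the function with $\alpha$ and $-\alpha$ yields the two children of a branch.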

## Number of vertices
\
Let $f(n)$ be the number of edges of the fractal after $n$ iterations. Each
iteration adds twice as many edges as the previous one, so we have:

$$f(n) = 2^0 + 2^1 + 2^2 + \cdots + 2^n = 2^{n+1} - 1.$$

\
Each edge has two endpoints, so the number of points is $2 \cdot f(n)$.
We use this expression in the visualization part to draw the points as
line segments.

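```
// Draws the tree from the currently bound vertex buffer as line segments.
// `iterations` is assumed to be a global holding the current iteration count.
void renderTreeFromBuffer() {
    // Each vertex is two floats (x, y), tightly packed.
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, nullptr);
    glEnableVertexAttribArray(0);
    glColor3f(0.0f, 0.0f, 0.0f);
    // 2 * f(n) vertices, with f(n) = 2^(n+1) - 1 computed in integer arithmetic.
    int numberOfVertices = 2 * ((1 << (iterations + 1)) - 1);
    glDrawArrays(GL_LINES, 0, numberOfVertices);
    glutSwapBuffers();
}
```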
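
As a small sketch of the counting formulas, the edge and vertex counts can be computed with integer shifts, which avoids the floating-point rounding that `pow` can introduce. The function names `edgeCount` and `vertexCount` are hypothetical and not part of the repository.

```cpp
#include <cstdint>

// f(n) = 2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1 edges after n iterations.
constexpr std::uint64_t edgeCount(unsigned n) {
    return (std::uint64_t{1} << (n + 1)) - 1;
}

// Every edge contributes two endpoints, so there are 2 * f(n) vertices.
constexpr std::uint64_t vertexCount(unsigned n) {
    return 2 * edgeCount(n);
}
```

For example, after 3 iterations there are 15 edges and 30 vertices to draw.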

## Parallelization

Parallelization proceeds hierarchically: in each iteration a certain number of
threads is active. Each active thread $i$ takes one existing branch and, depending
on its index, constructs either the left or the right child of that branch. The
newly constructed branch is inserted into the array of all branches, which holds
$f(n)$ branches after $n$ iterations.

\
In iteration $n$ there are $2^n$ active threads, so the number of active
threads grows exponentially with the number of iterations:

1. Iteration 0: Only the thread with id 0 is active. This thread initializes the
initial branch.

2. Iteration 1: Threads 2 and 3 are active. Thread 2 constructs the left branch from the initial branch and inserts it into the branch array at position 2. Thread 3 constructs the right branch and inserts it into the branch array at position 3.

3. Iteration 2: Threads 4, 5, 6, and 7 are active. Threads 4 and 5 construct the left and right branches, respectively, of the branch at position 2 in the array. Threads 6 and 7 construct the left and right branches, respectively, of the branch at position 3 in the array. The created branches are inserted into the branch array for the next iteration.

4. Iteration $n$: Threads $2^n$ to $2^{n+1}-1$ will be active.
These threads take the respective branches and, depending on the index, construct the left or right branch.



\
Parallelization and data insertion are modeled as a binary tree. Each node of the tree represents a thread that constructs a branch, while the edges of the tree represent the branching in future iterations.
Each thread $i$ takes the branch at the parent position $\lfloor i/2 \rfloor$ in the branch array, constructs the left or the
right branch, and inserts the newly constructed branch at position $i$ in the array.
Each level of the tree contains the threads that work in parallel.

\
The leftmost index at level $n$ of the graph is:

$$\text{leftMost}(n) = 2^n$$

The rightmost index at level $n$ of the graph is:

$$\text{rightMost}(n) = \text{leftMost}(n+1) - 1 = 2^{n+1} - 1$$

The number of threads working in parallel at level $n$ is:

$$\text{leftMost}(n+1) - \text{leftMost}(n) = 2^{n+1} - 2^n = 2^n$$
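
The index arithmetic of this section can be written as a few small helpers. This is only a sketch: `leftMost` and `rightMost` follow the formulas above, while `threadsAtLevel`, `parentOf`, and `isLeftChild` are hypothetical names introduced here for illustration.

```cpp
// Leftmost index at level n: 2^n.
constexpr unsigned leftMost(unsigned n) { return 1u << n; }

// Rightmost index at level n: 2^(n+1) - 1.
constexpr unsigned rightMost(unsigned n) { return leftMost(n + 1) - 1; }

// Number of threads working in parallel at level n: 2^n.
constexpr unsigned threadsAtLevel(unsigned n) {
    return leftMost(n + 1) - leftMost(n);
}

// Thread i reads its parent branch from position i / 2 (integer division).
constexpr unsigned parentOf(unsigned i) { return i / 2; }

// Even indices build the left branch, odd indices the right branch.
constexpr bool isLeftChild(unsigned i) { return i % 2 == 0; }
```

At level 2, for instance, the active indices run from 4 to 7, and threads 4 and 5 both read the branch stored at position 2.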

## The kernel

The following kernel generates the fractal through parallel computation. Initially, the thread with index 0 handles the initial branch. In each iteration, the threads whose index lies in the interval from `start_at` to `end_at` are active and construct new branches. Threads with even indices construct the left branches, while threads with odd indices construct the right branches. Each constructed branch is stored in the branch array, where it serves as a parent in the next iteration, and its two endpoints are written to the points array. Grid-wide synchronization ensures that all threads complete their work before the next iteration begins.

```
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Branch and makeChildBranch are defined elsewhere in the project.
__global__ void branchDivide(float* points, Branch branch, Branch* branches, float angle_left, float angle_right, int start_iteration, int max_iterations, int threadShiftIndex) {

    // Global thread index, offset by threadShiftIndex.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    idx += threadShiftIndex;

    Branch childBranch, parentBranch;
    float angle;
    auto g = cg::this_grid();

    // Thread 0 stores the initial branch at position 1 and writes its endpoints.
    if (idx == 0) {
        points[0] = branch.start.x;
        points[1] = branch.start.y;
        points[2] = branch.end.x;
        points[3] = branch.end.y;
        branches[1] = branch;
    }

    for (int iteration = start_iteration; iteration <= max_iterations; iteration++) {
        // Active index range of this level: [2^iteration, 2^(iteration+1) - 1].
        int start_at = 1 << iteration;
        int end_at = (1 << (iteration + 1)) - 1;

        if (idx >= start_at && idx <= end_at) {
            int parentNode = idx / 2;
            parentBranch = branches[parentNode];

            // Even indices build the left branch, odd indices the right branch.
            angle = (idx % 2 == 0) ? angle_left : angle_right;

            childBranch = makeChildBranch(parentBranch, angle);
            branches[idx] = childBranch;

            // Each branch occupies 4 floats (two 2D endpoints) in the points array.
            int offset = 2 * 2 * (idx - 1);
            points[offset] = childBranch.start.x;
            points[offset + 1] = childBranch.start.y;
            points[offset + 2] = childBranch.end.x;
            points[offset + 3] = childBranch.end.y;
        }
        // Wait for the whole grid before the next level reads branches[].
        g.sync();
    }
}
```
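
For comparison, the same level-by-level indexing can be reproduced sequentially on the CPU. The sketch below is not the repository's sequential version: `buildTreeSequential` and `childOf` are hypothetical names, `childOf` is an assumed stand-in for `makeChildBranch` (whose body is not shown here), and the extra `lambda` parameter is an assumption. The inner loop plays the role of the threads of one level, and finishing it before moving to the next level plays the role of `g.sync()`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Branch { Point start, end; };

// Assumed stand-in for makeChildBranch: extend the parent past its end point
// by 1/lambda of its length, then rotate around that end point by `angle`.
Branch childOf(const Branch& p, float lambda, float angle) {
    const float dx = (p.end.x - p.start.x) / lambda;
    const float dy = (p.end.y - p.start.y) / lambda;
    const float cs = std::cos(angle), sn = std::sin(angle);
    return Branch{p.end, Point{dx * cs - dy * sn + p.end.x,
                               dx * sn + dy * cs + p.end.y}};
}

// Returns the points array: 4 floats per branch, branch i at offset 4 * (i - 1).
std::vector<float> buildTreeSequential(const Branch& trunk, int maxIterations,
                                       float lambda, float angleLeft,
                                       float angleRight) {
    const std::size_t lastIndex = (std::size_t{1} << (maxIterations + 1)) - 1;
    std::vector<Branch> branches(lastIndex + 1);  // position 0 stays unused
    std::vector<float> points(4 * lastIndex);

    branches[1] = trunk;
    points[0] = trunk.start.x;
    points[1] = trunk.start.y;
    points[2] = trunk.end.x;
    points[3] = trunk.end.y;

    for (int iteration = 1; iteration <= maxIterations; ++iteration) {
        const std::size_t startAt = std::size_t{1} << iteration;
        const std::size_t endAt = (std::size_t{1} << (iteration + 1)) - 1;
        // Each idx corresponds to one GPU thread of this level.
        for (std::size_t idx = startAt; idx <= endAt; ++idx) {
            const float angle = (idx % 2 == 0) ? angleLeft : angleRight;
            const Branch child = childOf(branches[idx / 2], lambda, angle);
            branches[idx] = child;
            const std::size_t offset = 4 * (idx - 1);
            points[offset] = child.start.x;
            points[offset + 1] = child.start.y;
            points[offset + 2] = child.end.x;
            points[offset + 3] = child.end.y;
        }
    }
    return points;
}
```

Because a level only reads branches written by the previous level, the iterations of the inner loop are independent of each other, which is what allows the kernel to assign one thread per index.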

## Comparisons

The table below compares the execution time, in microseconds, of generating the fractal with the sequential C++ version and with the parallel CUDA version. The CUDA implementation uses cooperative groups, with 23,040 threads per kernel call.

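Time comparison.

| Iteration | C++ (µs) | CUDA (µs) |
|-----------|----------|-----------|
| 0 | 23 | 27 |
| 1 | 40 | 25 |
| 2 | 47 | 23 |
| 3 | 39 | 28 |
| 4 | 40 | 27 |
| 5 | 58 | 27 |
| 6 | 332 | 29 |
| 7 | 288 | 28 |
| 8 | 105 | 26 |
| 9 | 155 | 26 |
| 10 | 436 | 27 |
| 11 | 541 | 24 |
| 12 | 733 | 29 |
| 13 | 1563 | 29 |
| 14 | 2576 | 43 |
| 15 | 4861 | 47 |
| 16 | 9027 | 89 |
| 17 | 22230 | 148 |
| 18 | 37435 | 253 |
| 19 | 75886 | 382 |
| 20 | 145146 | 745 |
| 21 | 294950 | 1268 |
| 22 | 584644 | 2560 |
| 23 | 1184101 | 5265 |
| 24 | 2347734 | 142678 |
| 25 | 4705941 | 439919 |
| 26 | 9416423 | 789610 |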


