An open API service indexing awesome lists of open source software.

https://github.com/milleniumbug/cudagameoflife


https://github.com/milleniumbug/cudagameoflife

Last synced: 27 days ago
JSON representation

Awesome Lists containing this project

README

          

cudaGameOfLife
==============

Exactly what it says on the tin.

Current inner workings
----------------------

My board is split up in square "blocks", and each one of them manages memory allocated by CUDA - "next" and "current" block. Currently, each cell is a `bool` - bit operations would be way better, I think. Each block has a size of 64x64=4096 cells. I provide the following to the kernel:
- pointer to contiguous "flattened" block to "next"
- a pointer to array of 9 "current" blocks, which are essentially neighbours
- an "out" bool array of 9, where the information about whether there are alive cells on borders (in other words, information whether I need to materialize new blocks)

The kernel is launched with something like this:

```
const int blockDimension = 64;
const int threadsPerDimension = 16;
const dim3 threadsPerBlock(threadsPerDimension, threadsPerDimension);
const dim3 dimensions(blockDimension / threadsPerBlock.x, blockDimension / threadsPerBlock.y);
nextGenerationKernel <<< dimensions, threadsPerBlock >>> (next.getDevice(), cudaSurrounding.getDevice(), borderCheck.getDevice());
auto result = bordersToHost();
```

`bordersToHost()` runs a `cudaMemcpy` in order to get the border information back to host (AFAIK this is horrible because it's synchronous)

Performance improvements
------------------------

2000 generations on 20x20 block board with blocks = 64x64:

d35e8018e87a11d07e7ac159ba7e998439d5c2ff: 275 seconds

2000 generations
Without streams: Executed in: 139218618 microseconds (139 seconds)
With streams : Executed in: 92492721 microseconds (92492 milliseconds)