https://github.com/milleniumbug/cudagameoflife

Host: GitHub
URL: https://github.com/milleniumbug/cudagameoflife
Owner: milleniumbug
License: mit
Created: 2017-05-24T23:47:36.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2017-06-08T02:07:47.000Z (about 9 years ago)
Last Synced: 2025-01-10T20:42:11.141Z (over 1 year ago)
Language: C++
Size: 66.4 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

cudaGameOfLife
==============

Exactly what it says on the tin.

Current inner workings
----------------------

The board is split into square "blocks". Each block manages two buffers allocated by CUDA: a "current" one and a "next" one. Currently, each cell is a `bool`; packing cells into bits would probably be much better. Each block is 64x64 = 4096 cells. I pass the following to the kernel:

 - a pointer to the contiguous, "flattened" "next" block
 - a pointer to an array of 9 "current" blocks, which are essentially the block's neighbourhood
 - an "out" array of 9 `bool`s that reports whether there are alive cells on the borders (in other words, whether I need to materialize new blocks)
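
To make the data flow above concrete, here is a minimal CPU reference sketch in plain C++ of what one kernel call might compute for a single block. It is not the actual kernel. The names `Block`, `cellAt` and `nextGenerationReference`, the row-major 3x3 ordering of the 9 blocks (centre at index 4), treating a missing neighbour as all-dead, and the exact meaning of each border flag are all assumptions made for illustration.

```cpp
#include <array>

constexpr int kBlockDim = 64;                           // block edge length, in cells
using Block = std::array<bool, kBlockDim * kBlockDim>;  // flattened, row-major

// Reads the cell at (x, y) relative to the centre block. Coordinates of -1 or
// 64 spill into a neighbouring block. ASSUMPTION: the 9 blocks are stored
// row-major as a 3x3 grid (centre = index 4) and a null block is all dead.
inline bool cellAt(const std::array<const Block*, 9>& surrounding, int x, int y) {
    const int bx = (x < 0) ? 0 : (x >= kBlockDim ? 2 : 1);
    const int by = (y < 0) ? 0 : (y >= kBlockDim ? 2 : 1);
    const Block* block = surrounding[by * 3 + bx];
    if (block == nullptr) return false;
    const int lx = (x + kBlockDim) % kBlockDim;  // wrap into the neighbour's local coordinates
    const int ly = (y + kBlockDim) % kBlockDim;
    return (*block)[ly * kBlockDim + lx];
}

// Computes the next generation of the centre block and fills the 9 border flags.
// ASSUMPTION: borderCheck[i] means "a live cell of the new centre block touches
// the edge or corner shared with block i"; index 4 ends up meaning "the centre
// block has at least one live cell".
inline void nextGenerationReference(Block& next,
                                    const std::array<const Block*, 9>& surrounding,
                                    std::array<bool, 9>& borderCheck) {
    borderCheck.fill(false);
    for (int y = 0; y < kBlockDim; ++y) {
        for (int x = 0; x < kBlockDim; ++x) {
            int alive = 0;  // live cells among the 8 neighbours
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if ((dx != 0 || dy != 0) && cellAt(surrounding, x + dx, y + dy))
                        ++alive;
            const bool self = cellAt(surrounding, x, y);
            // Standard Game of Life rule: birth on 3, survival on 2 or 3.
            const bool live = (alive == 3) || (self && alive == 2);
            next[y * kBlockDim + x] = live;
            if (!live) continue;
            const int ex = (x == 0) ? 0 : (x == kBlockDim - 1 ? 2 : 1);
            const int ey = (y == 0) ? 0 : (y == kBlockDim - 1 ? 2 : 1);
            borderCheck[ey * 3 + ex] = true;  // corner neighbour (or same as below)
            borderCheck[1 * 3 + ex] = true;   // left/right neighbour, if on that edge
            borderCheck[ey * 3 + 1] = true;   // top/bottom neighbour, if on that edge
        }
    }
}
```

A reference like this is mostly useful for checking the GPU output on small boards: run both on the same input and compare the "next" block and the 9 flags.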

The kernel is launched with something like this:

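```
	const int blockDimension = 64;      // block edge length, in cells
	const int threadsPerDimension = 16; // CUDA thread block edge length, in threads
	// 16x16 = 256 threads per CUDA thread block
	const dim3 threadsPerBlock(threadsPerDimension, threadsPerDimension);
	// 4x4 grid of thread blocks: 64x64 = 4096 threads in total
	const dim3 dimensions(blockDimension / threadsPerBlock.x, blockDimension / threadsPerBlock.y);
	nextGenerationKernel <<< dimensions, threadsPerBlock >>> (next.getDevice(), cudaSurrounding.getDevice(), borderCheck.getDevice());
	// copy the border flags back to the host
	auto result = bordersToHost();
```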
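
The launch geometry above works out to a 4x4 grid of 16x16 thread blocks, i.e. 4096 threads for the 4096 cells of one board block. The following plain C++ sketch checks that arithmetic on the host. It assumes the usual `blockIdx * blockDim + threadIdx` mapping from threads to cells, which the README does not confirm; `Dim2`, `cellIndex` and `coverage` are hypothetical names used only here.

```cpp
#include <array>

struct Dim2 { unsigned x; unsigned y; };  // hypothetical stand-in for CUDA's dim3

constexpr unsigned kBlockDimension = 64;       // board block edge length, in cells
constexpr unsigned kThreadsPerDimension = 16;  // CUDA thread block edge length
constexpr unsigned kGridX = kBlockDimension / kThreadsPerDimension;  // 4
constexpr unsigned kGridY = kBlockDimension / kThreadsPerDimension;  // 4

// Flattened cell index handled by one thread.
// ASSUMPTION: global coordinate = blockIdx * blockDim + threadIdx, row-major cells.
inline unsigned cellIndex(Dim2 blockIdx, Dim2 threadIdx) {
    const unsigned x = blockIdx.x * kThreadsPerDimension + threadIdx.x;
    const unsigned y = blockIdx.y * kThreadsPerDimension + threadIdx.y;
    return y * kBlockDimension + x;
}

// Counts how many threads of the whole launch land on each cell.
inline std::array<unsigned, kBlockDimension * kBlockDimension> coverage() {
    std::array<unsigned, kBlockDimension * kBlockDimension> hits{};
    for (unsigned by = 0; by < kGridY; ++by)
        for (unsigned bx = 0; bx < kGridX; ++bx)
            for (unsigned ty = 0; ty < kThreadsPerDimension; ++ty)
                for (unsigned tx = 0; tx < kThreadsPerDimension; ++tx)
                    ++hits[cellIndex(Dim2{bx, by}, Dim2{tx, ty})];
    return hits;
}
```

Under that mapping every cell is hit by exactly one thread, so the kernel needs no bounds check as long as `blockDimension` stays a multiple of `threadsPerDimension`.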

`bordersToHost()` runs a `cudaMemcpy` to copy the border information back to the host. AFAIK this is horrible for performance, because `cudaMemcpy` is synchronous: the host blocks until the copy has finished.

Performance improvements
------------------------

2000 generations on a 20x20-block board, with blocks of 64x64 cells:

d35e8018e87a11d07e7ac159ba7e998439d5c2ff: 275 seconds

2000 generations:

Without streams: Executed in: 139218618 microseconds (139 seconds)

With streams   : Executed in: 92492721 microseconds (92 seconds)

In this measurement, streams cut the run time by roughly a third.
