{"id":18399809,"url":"https://github.com/mrc-ide/dust-random-bench","last_synced_at":"2025-04-12T16:25:17.072Z","repository":{"id":45959018,"uuid":"420997466","full_name":"mrc-ide/dust-random-bench","owner":"mrc-ide","description":null,"archived":false,"fork":false,"pushed_at":"2021-11-24T12:53:34.000Z","size":369,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-16T03:16:37.713Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mrc-ide.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-25T11:35:25.000Z","updated_at":"2021-11-24T12:53:35.000Z","dependencies_parsed_at":"2022-09-02T18:20:25.094Z","dependency_job_id":null,"html_url":"https://github.com/mrc-ide/dust-random-bench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrc-ide%2Fdust-random-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrc-ide%2Fdust-random-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrc-ide%2Fdust-random-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mrc-ide%2Fdust-random-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mrc-ide","download_url":"https://codeload.github.com/mrc-ide/dust-random-bench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":2485
95162,"owners_count":21130471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T02:28:35.134Z","updated_at":"2025-04-12T16:25:17.039Z","avatar_url":"https://github.com/mrc-ide.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Benchmarking dust's RNG on a GPU\n\nThis repository explores benchmarking dust's random number generator on GPU vs \"curand\" (the built-in random number library that we decided not to use).  Our random number generators are described [in the dust docs](https://mrc-ide.github.io/dust/articles/rng.html) and our reasons for the decisions we made are [in the last section of that document](https://mrc-ide.github.io/dust/articles/rng.html#other-packages-with-similar-functionality-1). In particular, we want random numbers that can be created identically on the CPU and GPU, do not depend on the number of threads, and which support per-draw parameter changes.\n\n## Compilation and use\n\nCompilation requires knowing the compute capability of your device; this will likely be\n\n* 75 (Turing), e.g. GeForce RTX 2080 Ti\n* 86 (Ampere), e.g. GeForce RTX 3090, A5000, A100\n\nRunning `./configure` with no arguments will attempt to detect this by compiling a very small program with `nvcc`. 
You can also force a version by running `./configure 86`.\n\nAfter configuration, run `make`, which will download the latest dust-random release and build two binaries (`curand` and `dustrand`).\n\nThese binaries take positional arguments `\u003cdistribution\u003e \u003cn_threads\u003e \u003cn_draws\u003e`, for example\n\n```\n./curand uniform 16384 1000000\n```\n\nwill draw 1 million uniformly distributed random numbers on _each of_ 2^14 different threads, in parallel (so a total of ~16 billion numbers).  This should take considerably less than one second.  Output looks like:\n\n```\nengine: curand, distribution: uniform, n_threads: 16384, n_draws: 1000000, t_setup: 0.00242017, t_sample: 0.0305579\n```\n\nwhich contains\n\n* `engine`: either `dust` or `curand`\n* `distribution`, `n_threads` and `n_draws`: as given in inputs\n* `t_setup`: Wall time (in seconds) for random number initialisation, including allocations on the GPU\n* `t_sample`: Wall time (in seconds) for carrying out the samples\n\nSupported distributions are:\n\n* both `curand` and `dustrand`: `uniform`, `normal_box_muller`, `poisson`\n* `dustrand` only: `normal_polar`, `normal_ziggurat`, `exponential`, `binomial`\n\nThe script `bench.py` will run the benchmark programs with varying `n_threads` and `n_draws` to create the file `data/uniform.csv` (this script requires python3 but only standard modules).  This should require a minute or so to run.\n\nThe script `plot.R` will make some plots with this output.\n\n## Results\n\nSetup cost is linear in the number of threads for dust, nonlinear for curand. In both cases, the time taken is nontrivial when the number of samples drawn is low.\n\n![Plot of setup cost vs number of threads for the two engines](figs/setup.png)\n\nFor both engines, as the number of draws increases we approach a stable time per sample. The 'y' here is the walltime divided by the number of draws taken, which is flat when the per-draw time is constant. 
The dust graph shows an overhead with small numbers of threads and small numbers of draws per thread.\n\n![Plot of per-draw timing vs number of draws for the two engines](figs/sample.png)\n\nPerformance for the two generators is very similar, with the ratio of performance approaching 1 over a range of numbers of threads. curand does better for small numbers of draws per thread; dust generally seems to do slightly better for very large numbers of threads or large numbers of draws per thread.\n\n![Plot of relative performance vs numbers of draws for the two engines](figs/sample-rel.png)\n\n\n## Profiling\n\nYou can also profile these kernels with `ncu`.  First, make sure to compile with profiling enabled:\n\n```\n./configure --enable-profiler\nmake clean all\n```\n\nThen run like\n\n```\nncu -o profile-uniform-dust --set full ./dustrand uniform 131072 1000000\nncu -o profile-uniform-curand --set full ./curand uniform 131072 1000000\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrc-ide%2Fdust-random-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmrc-ide%2Fdust-random-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmrc-ide%2Fdust-random-bench/lists"}