An open API service indexing awesome lists of open source software.

https://github.com/cafxx/fastrand

Fast, concurrent PRNGs
https://github.com/cafxx/fastrand

atomic concurrency golang lock-free pcg pcg-random performance prng random random-number-generators splitmix xoshiro256

Last synced: 5 months ago
JSON representation

Fast, concurrent PRNGs

Awesome Lists containing this project

README

          

# fastrand

[![Build status](https://github.com/CAFxX/fastrand/workflows/Build/badge.svg)](https://github.com/CAFxX/fastrand/actions)
[![codecov](https://codecov.io/gh/CAFxX/fastrand/branch/main/graph/badge.svg)](https://codecov.io/gh/CAFxX/fastrand)
[![Go Report](https://goreportcard.com/badge/github.com/CAFxX/fastrand)](https://goreportcard.com/report/github.com/CAFxX/fastrand)
[![Go Reference](https://pkg.go.dev/badge/github.com/CAFxX/fastrand.svg)](https://pkg.go.dev/github.com/CAFxX/fastrand) :warning: API is not stable yet.

Some fast, non-cryptographic PRNG sources, in three variants:

- **Plain** - the basic implementation. Fastest, but can not be used concurrently without external synchronization.
- **Atomic** - implementation using atomic operations. Non-locking, can be used concurrently, but a bit slower (especially at high concurrency).
- **Sharded** - implementation using per-thread (P) sharding. Non-locking, can be used concurrently, fast (even at high concurrency), but does not support explicit seeding.

PRNGs currently implemented:

| Name | State (bits) | Output (bits) | Period | Variants |
| ------------------------------------------------------------ | ------------ | ------------- | ----------------- | ---------------------- |
| [SplitMix64](https://dl.acm.org/doi/10.1145/2714064.2660195) | 64 | 64 | 264 | Plain, Atomic, Sharded |
| [PCG-XSH-RR](https://www.pcg-random.org/) | 64 | 32 | 264 | Plain, Atomic, Sharded |
| [Xoshiro256**](http://prng.di.unimi.it/) | 256 | 64 | 2256-1 | Plain, Sharded |

Planned additions include:

| Name | State (bits) | Output (bits) | Period | Variants |
| ----------------------------------------- | ------------ | ------------- | ----------------- | ---------------------------------- |
| [PCG-XSL-RR](https://www.pcg-random.org/) | 128 | 64 | 2128 | Plain, Atomic3, Sharded |
| [xorshift128+](http://prng.di.unimi.it/) | 128 | 64 | 2128-1 | Plain, Atomic3, Sharded |

## Performance

Tests run on a `Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz` with Turbo Boost disabled. Lower is better.

### `GOMAXPROCS=1`

| PRNG | Plain | Atomic | Sharded |
| ------------ | -----: | ------------------: | ------: |
| SplitMix64 | 2.02ns | 8.72ns | 7.33ns |
| PCG-XSH-RR | 3.17ns | 11.90ns | 7.33ns |
| Xoshiro256** | 4.57ns | -1 | 12.40ns |
| math/rand | 7.06ns | 24.20ns2 | - |

### `GOMAXPROCS=8`

| PRNG | Plain | Atomic | Sharded |
| ------------ | -----: | ------------------: | ------: |
| SplitMix64 | 0.29ns | 26.20ns | 1.33ns |
| PCG-XSH-RR | 0.41ns | 13.20ns | 1.34ns |
| Xoshiro256** | 0.81ns | -1 | 2.12ns |
| math/rand | 1.19ns | 72.40ns2 | - |

## Usage notes

### Atomic variants

The atomic variant currently relies on `unsafe` to improve the performance of its CAS loops. It does so by calling the unexported `procyield` function in package `runtime`. This dependency will be removed in a future release. Usage of `unsafe` can be disabled by setting the `fastrand_nounsafe` build tag, at the cost of lower performance.

The state of the atomic variants is not padded/aligned to fill the cacheline: if needed users should pad the structure to avoid false sharing of the cacheline.

### Sharded variants

Sharded variants rely on `unsafe` to implement sharding. They do so by calling the unexported `procPin` and `procUnpin` functions in package `runtime`. These functions are used by other packages (e.g. `sync`) for the same purpose, so they are unlikely to disappear/change. Usage of `unsafe` can be disabled by setting the `fastrand_nounsafe` build tag, at the cost of lower performance.

Sharded variants detect the value of `GOMAXPROCS` when they are instantiated (with `NewShardedXxx`). If `GOMAXPROCS` is increased after a sharded PRNG is instantiated it will yield suboptimal performance, as it may dynamically fallback to the corresponding atomic variant.

Sharded variants use more memory for the state than the other variants: in general they use at least `GOMAXPROCS * 64` bytes. This is done to avoid false sharing of cachelines between shards.

Sharded variants do not allow explicit seeding since there is no easy way for a user to obtain a deterministic sequence from these variants (because, in general, goroutines can migrate between threads at any time).

## License

[MIT](LICENSE)

---

1 there is no atomic variant for Xoshiro256** because its large state is not amenable to a performant atomic implementation.

2 the `math/rand` atomic variant is not a pure non-locking implementation, since it is implemented by guarding a `rand.Rand` using a `sync.Mutex`.

3 only for platforms where 128 bit CAS primitives are supported.