https://github.com/jberryman/chan-benchmarks

Criterion benchmarks for the different Haskell concurrent channel implementations in base and stm

Host: GitHub
URL: https://github.com/jberryman/chan-benchmarks
Owner: jberryman
License: bsd-3-clause
Created: 2013-12-30T01:21:37.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2014-04-21T23:27:52.000Z (over 10 years ago)
Last Synced: 2023-03-11T18:03:21.687Z (almost 2 years ago)
Language: Haskell
Size: 4.55 MB
Stars: 17
Watchers: 4
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

Criterion benchmarks for the different Haskell concurrent channel
implementations in `base`, `stm`, and elsewhere, as well as simple read/write
benchmarks for `MVar`, `IORef`, `TVar`, and other mutable variables.

These benchmarks were originally taken from bench/chanbench.hs in the `stm`
package, ported to criterion with some additions. To run them on your machine:

    cabal sandbox init
    cabal install
    # For HTML reports:
    ./.cabal-sandbox/bin/chan-benchmarks -g -o Benchmarks.html +RTS -N

Feel free to send pull requests with new or improved benchmarks.

# Sample Results

Nice HTML output for a sample run performed on:

    $ lscpu
    Architecture:          i686
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    2
    Core(s) per socket:    2
    Socket(s):             1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 58
    Stepping:              9
    CPU MHz:               1400.000
    BogoMIPS:              4988.38
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              4096K

...are at `Benchmarks.chans_sample.html` and `Benchmarks.vars_sample.html`.

## Some analysis of primitive operations

    forkIO                  309ns
    context switch          2975ns

    getNumCapabilities      4.1ns
    myThreadId              4.7ns

    newIORef                7.19ns
    readIORef               3.74ns
    writeIORef              7.02ns
    modifyIORef'            7.02ns   -- even though this is implemented as read+write??
    atomicModifyIORef'      22.43ns
    atomicModifyIORef       53.67ns  -- variable; showing cost of lazy creation of (const 'x') thunks?

    newEmptyMVar            7.32ns
    takeMVar                16.21ns
    putMVar                 9.02ns
    modifyMVarMasked_       35.09ns  -- handler overhead ~ 10ns

    newTVarIO               12.96ns
    atomically writeTVar    53.35ns
    atomically readTVar     54.29ns
    readTVarIO              4.13ns
    atomically modifyTVar'  63.76ns

    -- counter from atomic-primops v0.5
    -- NOTE: ACTUALLY
    newCounter              11.99ns
    incrCounter             9.28ns
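
To make the meaning of these per-operation figures concrete, here is a minimal sketch of a hand-rolled timing loop using only `base`. It is not how the numbers above were produced (those come from criterion, which handles warm-up, clock overhead, and outliers far more carefully). The helper `meanNs` is a hypothetical name introduced for illustration, and `GHC.Clock` assumes a reasonably recent `base` (4.11 or later).

```haskell
import Control.Concurrent.MVar (newMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef, writeIORef)
import GHC.Clock (getMonotonicTimeNSec)

-- | Run an action @n@ times and return the mean wall-clock cost per call in
-- nanoseconds. Hypothetical helper: no warm-up, no statistics, and the loop
-- overhead itself is included in the result.
meanNs :: Int -> IO a -> IO Double
meanNs n act = do
  t0 <- getMonotonicTimeNSec
  replicateM_ n act
  t1 <- getMonotonicTimeNSec
  pure (fromIntegral (t1 - t0) / fromIntegral n)

main :: IO ()
main = do
  let n = 1000000
  ref <- newIORef (0 :: Int)
  mv  <- newMVar (0 :: Int)
  rd  <- meanNs n (readIORef ref)
  wr  <- meanNs n (writeIORef ref 1)
  am  <- meanNs n (atomicModifyIORef' ref (\x -> (x + 1, ())))
  tp  <- meanNs n (takeMVar mv >>= putMVar mv)
  putStrLn ("readIORef           ~" ++ show rd ++ " ns")
  putStrLn ("writeIORef          ~" ++ show wr ++ " ns")
  putStrLn ("atomicModifyIORef'  ~" ++ show am ++ " ns")
  putStrLn ("takeMVar + putMVar  ~" ++ show tp ++ " ns")
```

Numbers from a loop like this are only a rough guide: the optimizer may simplify an action whose result is discarded, and a single run says nothing about variance.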

Throughput of incrementing an atomic counter with...

    modifyMVar_         331.29ms / 100000 = 3312.9ns per increment
    modifyMVarMasked_   323.41ms / 100000 = 3234.1ns  -- NOTE: small variance
    atomicModifyIORef'   87.66ms / 100000 =  876.6ns  -- NOTE: sort of folded normal distribution beginning at 9.76ms, where the max we saw was 256.12ms
    modifyTVar'          19.10ms / 100000 =  191.0ns  -- NOTE: much lower variance, with samples 17 - 21 ms

    incrCounter           0.97ms / 100000             -- NOTE: GARBAGE; COUNTER NOT ATOMIC
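
For readers who want to reproduce the shape of this measurement, the following is a rough sketch of a contended-counter test using only `base`. It is an assumption about the setup, not the benchmark code from this repository: the thread count (one worker per capability), the helper names `hammer`, `viaMVar`, `viaIORef`, and `timed`, and the use of `GHC.Clock` instead of criterion are all introduced here for illustration.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar
import Control.Monad (replicateM, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import GHC.Clock (getMonotonicTimeNSec)

-- | Fork @threads@ workers that each run @incr@ @perThread@ times, then block
-- until every worker has signalled completion.
hammer :: Int -> Int -> IO () -> IO ()
hammer threads perThread incr = do
  dones <- replicateM threads $ do
    done <- newEmptyMVar
    _ <- forkIO (replicateM_ perThread incr >> putMVar done ())
    pure done
  mapM_ takeMVar dones

-- | Shared counter in an MVar, incremented with modifyMVar_.
viaMVar :: Int -> Int -> IO Int
viaMVar threads perThread = do
  mv <- newMVar (0 :: Int)
  hammer threads perThread (modifyMVar_ mv (\x -> pure $! x + 1))
  readMVar mv

-- | Shared counter in an IORef, incremented with atomicModifyIORef'.
viaIORef :: Int -> Int -> IO Int
viaIORef threads perThread = do
  ref <- newIORef (0 :: Int)
  hammer threads perThread (atomicModifyIORef' ref (\x -> (x + 1, ())))
  readIORef ref

-- | Time one run and report the mean cost per increment (total time / count).
timed :: String -> Int -> IO Int -> IO ()
timed label expected run = do
  t0  <- getMonotonicTimeNSec
  got <- run
  t1  <- getMonotonicTimeNSec
  let perIncr = fromIntegral (t1 - t0) / fromIntegral expected :: Double
  putStrLn (label ++ ": count " ++ show got ++ " of " ++ show expected
            ++ ", ~" ++ show perIncr ++ " ns per increment")

main :: IO ()
main = do
  caps <- getNumCapabilities
  let perThread = 100000 `div` caps
      total     = caps * perThread
  timed "modifyMVar_" total (viaMVar caps perThread)
  timed "atomicModifyIORef'" total (viaIORef caps perThread)
```

Comparing the final count with the expected total is the check that would expose a non-atomic counter like the `incrCounter` result above. To see real contention, build with `-threaded` and run with `+RTS -N`; otherwise the workers share a single capability.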

## Random resources

Some discussion of nitty-gritty of `atomicModifyIORef`:

http://stackoverflow.com/questions/10102881/haskell-how-does-atomicmodifyioref-work

## Random Analysis

A back-of-envelope look at how the cost of primitive var reads and writes
relates to the cost of channel reads and writes.

As of the current test run, looking at the mean times for the fastest three
contenders on the easiest test (write some, read some), we get the following
mean timings for *one read and write* (although the individual costs of a read
and of a write might differ widely):

    Chan             135 ns
    TQueue           175 ns
    chan-split-fast   88 ns

Measured timings for an atomic `modify` (or take/put; again not ideal) divided
by 2 (i.e. very approx timing for a take/read or put/write):

    MVar  15 ns
    TVar  36 ns

and var creation:

    MVar  24 ns
    TVar  21 ns

Counting var operations, with the count for the slowest path in parentheses
(`/` means none):

    chan-split-fast
    ---------------
                puts   takes  creates  TOTAL
    readChan    1(2)   1(3)   (1)      30-99 ns
    writeChan   1(2)   1      /        30-45 ns
    TOTAL for read and write: 60-144 ns
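
The totals in this table can be rechecked with a few lines of arithmetic. The sketch below assumes the counted operations are `MVar` operations (as the "puts" and "takes" columns suggest), so it uses the approximate figures above of 15 ns per take or put and 24 ns per creation. The `OpCount` type and all names are hypothetical, introduced only for this calculation.

```haskell
-- Rough per-operation costs in nanoseconds, copied from the MVar figures above.
mvarOpNs, mvarCreateNs :: Int
mvarOpNs     = 15  -- one take or one put
mvarCreateNs = 24  -- one MVar creation

-- | Number of puts, takes, and creations performed on one code path.
data OpCount = OpCount { puts :: Int, takes :: Int, creates :: Int }

-- | Estimated cost of a path: every put and take costs 'mvarOpNs',
-- every creation costs 'mvarCreateNs'.
estimateNs :: OpCount -> Int
estimateNs (OpCount p t c) = (p + t) * mvarOpNs + c * mvarCreateNs

-- Fast-path and slowest-path counts as read off the table.
readFast, readSlow, writeFast, writeSlow :: OpCount
readFast  = OpCount 1 1 0
readSlow  = OpCount 2 3 1
writeFast = OpCount 1 1 0
writeSlow = OpCount 2 1 0

-- | Render a "low-high ns" range for a labelled row.
row :: String -> Int -> Int -> String
row label lo hi = label ++ show lo ++ "-" ++ show hi ++ " ns"

main :: IO ()
main = do
  putStrLn (row "readChan   " (estimateNs readFast) (estimateNs readSlow))
  putStrLn (row "writeChan  " (estimateNs writeFast) (estimateNs writeSlow))
  putStrLn (row "TOTAL      " (estimateNs readFast + estimateNs writeFast)
                              (estimateNs readSlow + estimateNs writeSlow))
  -- prints:
  -- readChan   30-99 ns
  -- writeChan  30-45 ns
  -- TOTAL      60-144 ns
```

The measured 88 ns for one chan-split-fast read and write falls inside the estimated 60-144 ns range and sits closer to the fast-path end.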

So if we didn't screw that up:
- yes, read/write timing dominates
- but there might still be room to shave time elsewhere
- in "write some / read some" we don't take the slow reader-blocked path much, as expected