https://github.com/jberryman/chan-benchmarks
Criterion benchmarks for the different haskell concurrent channel implementations in base and stm
- Host: GitHub
- URL: https://github.com/jberryman/chan-benchmarks
- Owner: jberryman
- License: bsd-3-clause
- Created: 2013-12-30T01:21:37.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-04-21T23:27:52.000Z (over 10 years ago)
- Last Synced: 2023-03-11T18:03:21.687Z (almost 2 years ago)
- Language: Haskell
- Size: 4.55 MB
- Stars: 17
- Watchers: 4
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Criterion benchmarks for the different Haskell concurrent channel
implementations in `base`, `stm`, and elsewhere, as well as simple var
read/write benchmarks for `MVar`, `IORef`, `TVar`, and others.

These benchmarks were originally taken from bench/chanbench.hs in the `stm`
package and ported to criterion, with some additions. To run them on your machine:

    cabal sandbox init
    cabal install
    # For HTML reports:
    ./.cabal-sandbox/bin/chan-benchmarks -g -o Benchmarks.html +RTS -N

Feel free to send pull requests with new or improved benchmarks.
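For readers who have not used criterion before, a var read/write benchmark of the kind described above might look roughly like the following. This is a minimal sketch and not code from this repository: the group and benchmark names and the `takePut` helper are made up for illustration, and it assumes a criterion release that exports `whnfIO` (1.0 or later) plus the `stm` package.

```haskell
-- Hypothetical sketch; NOT taken from this repository.
-- Assumes the criterion (>= 1.0) and stm packages are installed.
module Main (main) where

import Control.Concurrent.MVar
import Control.Concurrent.STM (atomically, newTVarIO, readTVar, writeTVar)
import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
import Data.IORef (newIORef, readIORef, writeIORef)

-- | Take the value and put it straight back, so the MVar is full again
-- for the next iteration. (Illustrative helper, not from the repository.)
takePut :: MVar Int -> IO ()
takePut m = takeMVar m >>= putMVar m

main :: IO ()
main = do
  -- Create the vars once, outside the measured actions.
  ioref <- newIORef (0 :: Int)
  mvar  <- newMVar (0 :: Int)
  tvar  <- newTVarIO (0 :: Int)
  defaultMain
    [ bgroup "IORef"
        [ bench "readIORef"  $ whnfIO (readIORef ioref)
        , bench "writeIORef" $ whnfIO (writeIORef ioref 1)
        ]
    , bgroup "MVar"
        [ bench "takeMVar + putMVar" $ whnfIO (takePut mvar)
        ]
    , bgroup "TVar"
        [ bench "atomically readTVar"  $ whnfIO (atomically (readTVar tvar))
        , bench "atomically writeTVar" $ whnfIO (atomically (writeTVar tvar 1))
        ]
    ]
```

Timing a take together with a put keeps the `MVar` in a consistent state between iterations, at the price of measuring two operations at once.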
# Sample Results
Nice HTML output for a sample run performed on:

    $ lscpu
    Architecture:          i686
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    2
    Core(s) per socket:    2
    Socket(s):             1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 58
    Stepping:              9
    CPU MHz:               1400.000
    BogoMIPS:              4988.38
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              4096K

...are at `Benchmarks.chans_sample.html` and `Benchmarks.vars_sample.html`.
## Some analysis of primitive operations
    forkIO                   309.ns
    context switch           2975.ns

    getNumCapabilities       4.1ns
    myThreadId               4.7ns

    newIORef                 7.19ns
    readIORef                3.74ns
    writeIORef               7.02ns
    modifyIORef'             7.02ns   -- even though this is implemented as read+write??
    atomicModifyIORef'       22.43ns
    atomicModifyIORef        53.67ns  -- variable; showing cost of lazy creation of (const 'x') thunks?

    newEmptyMVar             7.32ns
    takeMVar                 16.21ns
    putMVar                  9.02ns
    modifyMVarMasked_        35.09ns  -- handler overhead ~ 10ns

    newTVarIO                12.96ns
    atomically writeTVar     53.35ns
    atomically readTVar      54.29ns
    readTVarIO               4.13ns
    atomically modifyTVar'   63.76ns

    --counter from atomic-primops v0.5
    -- NOTE: ACTUALLY
    newCounter               11.99ns
    incrCounter              9.28ns

Throughput of incrementing atomic counter with...

    modifyMVar_          331.29ms / 100000 = 3312.9ns per increment
    modifyMVarMasked_    323.41ms / 100000 = 3234.1ns  -- NOTE: small variance
    atomicModifyIORef'    87.66ms / 100000 =  876.6ns  -- NOTE: sort of folded normal distribution beginning at 9.76ms, where the max we saw was 256.12ms
    modifyTVar'           19.10ms / 100000 =  191.0ns  -- NOTE: much lower variance, with samples 17 - 21 ms
    incrCounter            0.97ms / 100000             -- NOTE: GARBAGE; COUNTER NOT ATOMIC
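To make the throughput rows above concrete, here is a rough sketch of a contended counter increment and of the "total time / number of increments" arithmetic. It is not the benchmark code from this repository: `hammer`, `nsPerIncrement`, and the `countWith*` functions are hypothetical names, and the split into 4 threads of 25000 increments each is an assumption (the text only gives the total of 100000). It needs only `base` and `stm`.

```haskell
-- Hypothetical sketch; NOT the benchmark code from this repository.
-- Thread count and per-thread increment count are assumptions.
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM (atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad (replicateM, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- | Fork `threads` workers that each run `incr` `perThread` times,
-- then block until every worker has finished.
hammer :: Int -> Int -> IO () -> IO ()
hammer threads perThread incr = do
  dones <- replicateM threads $ do
    done <- newEmptyMVar
    _ <- forkIO (replicateM_ perThread incr >> putMVar done ())
    pure done
  mapM_ takeMVar dones

-- | Mean cost of one increment in ns, given the total run time in ms.
nsPerIncrement :: Double -> Int -> Double
nsPerIncrement totalMs n = totalMs * 1e6 / fromIntegral n

-- Each function returns the final counter value.
countWithMVar, countWithIORef, countWithTVar :: Int -> Int -> IO Int
countWithMVar threads perThread = do
  v <- newMVar (0 :: Int)
  hammer threads perThread (modifyMVar_ v (\n -> pure $! n + 1))
  readMVar v
countWithIORef threads perThread = do
  r <- newIORef (0 :: Int)
  hammer threads perThread (atomicModifyIORef' r (\n -> (n + 1, ())))
  readIORef r
countWithTVar threads perThread = do
  t <- newTVarIO (0 :: Int)
  hammer threads perThread (atomically (modifyTVar' t (+ 1)))
  readTVarIO t

main :: IO ()
main = do
  -- 4 x 25000 = 100000 increments in total.
  results <- mapM (\f -> f 4 25000) [countWithMVar, countWithIORef, countWithTVar]
  print results  -- [100000,100000,100000]
  -- The modifyMVar_ row: 331.29 ms over 100000 increments, about 3312.9 ns each.
  print (nsPerIncrement 331.29 100000)
```

Compile with `-threaded` and run with `+RTS -N` to get real contention between cores. A final count equal to the total number of increments is also a quick check that a counter really is atomic.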
## Random resources
Some discussion of nitty-gritty of `atomicModifyIORef`:
http://stackoverflow.com/questions/10102881/haskell-how-does-atomicmodifyioref-work
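As a small illustration of the laziness difference between `atomicModifyIORef` and `atomicModifyIORef'`, the sketch below writes a value that throws when forced and checks whether the write itself forces it. The `leavesThunk` helper is a hypothetical name introduced here. The sketch shows only the semantic difference and makes no claim about where the measured time goes.

```haskell
-- Illustrative sketch; `leavesThunk` is a made-up helper.
module Main (main) where

import Control.Exception (SomeException, try)
import Data.IORef

-- | True if the given modify function stores the new value without
-- evaluating it, False if evaluating it raised an exception.
leavesThunk :: (IORef Int -> (Int -> (Int, ())) -> IO ()) -> IO Bool
leavesThunk modify = do
  ref <- newIORef 0
  -- The new value is an error call: forcing it throws.
  r <- try (modify ref (\_ -> (error "new value forced", ()))) :: IO (Either SomeException ())
  pure (either (const False) (const True) r)

main :: IO ()
main = do
  leavesThunk atomicModifyIORef  >>= print  -- True: the thunk is stored as-is
  leavesThunk atomicModifyIORef' >>= print  -- False: the new value is forced
```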
## Random Analysis
Back-of-envelope look at how primitive var read write cost relates to chan RW
cost.

As of the current test run, looking at the mean times for the fastest three
contenders on the easiest test (write some, read some), we get the following
mean timings for *one read and write* (although reads and writes might vary
widely)

    Chan              135 ns
    TQueue            175 ns
    chan-split-fast    88 ns

Measured timings for an atomic `modify` (or take/put; again not ideal) divided
by 2 (i.e. very approx timing for a take/read or put/write):

    MVar   15 ns
    TVar   36 ns

and var creation:

    MVar   24 ns
    TVar   21 ns

Counting var operations, with (count) around slowest path

    chan-split-fast
    ---------------
                 puts    takes   creates   TOTAL
    readChan     1(2)    1(3)    (1)       30-99 ns
    writeChan    1(2)    1       /         30-45 ns
    TOTAL for read and write: 60-144 ns

So if we didn't screw that up:
- yes, read/write timing dominates
- but there might still be room to shave time elsewhere
- in "write some / read some" we don't take the slow reader-blocked path much, as expected
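The arithmetic behind the ranges above can be written out as a small cost model. This is a sketch under one assumption: that the vars counted for chan-split-fast are `MVar`s, which is the reading that reproduces the quoted 30-99 ns and 30-45 ns figures from the approximate 15 ns take/put and 24 ns creation costs. All type and function names are made up for illustration.

```haskell
-- Back-of-envelope cost model; names are illustrative only.
-- Assumes the vars counted are MVars (15 ns per take/put, 24 ns per create).
module Main (main) where

-- | Approximate per-operation costs in ns.
data Costs = Costs { putNs, takeNs, createNs :: Int }

mvarCosts :: Costs
mvarCosts = Costs { putNs = 15, takeNs = 15, createNs = 24 }

-- | Number of puts, takes, and creations on one code path.
data Ops = Ops { puts, takes, creates :: Int }

-- | Estimated cost of a code path: a weighted sum of its operations.
pathNs :: Costs -> Ops -> Int
pathNs c o = puts o * putNs c + takes o * takeNs c + creates o * createNs c

-- Fast paths and slowest (parenthesised) paths from the table above.
readFast, readSlow, writeFast, writeSlow :: Ops
readFast  = Ops 1 1 0
readSlow  = Ops 2 3 1
writeFast = Ops 1 1 0
writeSlow = Ops 2 1 0

main :: IO ()
main = do
  let cost = pathNs mvarCosts
  print (cost readFast, cost readSlow)    -- (30,99)
  print (cost writeFast, cost writeSlow)  -- (30,45)
  print (cost readFast + cost writeFast, cost readSlow + cost writeSlow)  -- (60,144)
```

The measured 88 ns for chan-split-fast sits inside the modelled 60-144 ns range and nearer the fast-path end, consistent with the observation that the reader-blocked path is rarely taken in this test.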