{"id":20762000,"url":"https://github.com/composewell/haskell-perf","last_synced_at":"2026-04-22T06:48:50.790Z","repository":{"id":173779432,"uuid":"651280721","full_name":"composewell/haskell-perf","owner":"composewell","description":null,"archived":false,"fork":false,"pushed_at":"2024-02-05T21:06:09.000Z","size":124,"stargazers_count":2,"open_issues_count":8,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-11T17:25:25.050Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Haskell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/composewell.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-06-08T22:55:59.000Z","updated_at":"2024-02-02T15:14:02.000Z","dependencies_parsed_at":"2023-10-21T06:28:20.255Z","dependency_job_id":"9f907eb9-d605-40bb-ae9b-53cc88af33b1","html_url":"https://github.com/composewell/haskell-perf","commit_stats":null,"previous_names":["composewell/haskell-perf"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/composewell/haskell-perf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/composewell%2Fhaskell-perf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/composewell%2Fhaskell-perf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/composewell%2Fhaskell-perf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/composewell%2Fhaskell-perf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/compos
ewell","download_url":"https://codeload.github.com/composewell/haskell-perf/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/composewell%2Fhaskell-perf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28021576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T10:28:37.378Z","updated_at":"2025-12-25T06:06:11.329Z","avatar_url":"https://github.com/composewell.png","language":"Haskell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# haskell-perf\n\nGHC Patch: https://github.com/composewell/ghc/tree/ghc-8.10.7-eventlog-enhancements\n\n## Enable Linux perf counters\n\nEnable unrestricted use of perf counters:\n\n```\n# echo -1 \u003e /proc/sys/kernel/perf_event_paranoid\n```\n\n## Disable CPU scaling\n\nSet the scaling governor of all your CPUs to `performance`:\n\n```\necho performance \u003e /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor\necho performance \u003e /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor\n...\n...\necho performance \u003e /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor\n```\n\n## Generating the eventlog\n\nTo generate the event log, we need to compile the program with the eventlog enabled\nand run 
the program setting the `-l` RTS option.\n\nThere are multiple ways of doing this.\n\n__Using plain GHC__:\n\n```\nghc Main.hs -rtsopts -eventlog\n./Main +RTS -l -RTS\n```\n\n__Using Cabal__:\n\nThe `.cabal` file should contain the following GHC options:\n```\nghc-options: -eventlog \"-with-rtsopts=-l\"\n```\n\nIf the `-threaded` option is used while compiling, you may want to use the `-N1`\nRTS option.\n\n## Creating windows\n\nHelper function to create windows:\n\n```\n{-# LANGUAGE BangPatterns #-}\n\nimport Control.Monad.IO.Class (MonadIO(..))\nimport Debug.Trace (traceEventIO)\n\n{-# INLINE withTracingFlow #-}\nwithTracingFlow :: MonadIO m =\u003e String -\u003e m a -\u003e m a\nwithTracingFlow tag action = do\n    liftIO $ traceEventIO (\"START:\" ++ tag)\n    !res \u003c- action\n    liftIO $ traceEventIO (\"END:\" ++ tag)\n    pure res\n```\n\nWe can wrap parts of the flow we want to analyze with `withTracingFlow` using a\ntag to help us identify it.\n\n## End of Window\n\nYou can put the END of the window in different paths but ensure that all paths\nare covered:\n\n```\n  r \u003c- f x\n  case r of\n    Just val -\u003e do\n      -- _ \u003c- L.runIO $ traceEventIO $ \"END:\" ++ \"window\"\n      -- Some processing\n    Nothing -\u003e do\n      -- _ \u003c- L.runIO $ traceEventIO $ \"END:\" ++ \"window\"\n      -- Some processing\n```\n\n## Measurement Overhead\n\nEven when you are measuring an empty block of code there will be some minimum\ntiming and allocations reported because of the measurement overhead.\n\n```\n    _ \u003c- traceEventIO $ \"START:emptyWindow\"\n    _ \u003c- traceEventIO $ \"END:emptyWindow\"\n```\n\nThe timing is due to the time measurement system call itself. The allocations\nare due to the execution of the `traceEventIO` Haskell code. 
TODO: fix the allocations.\n\n## Measurement with Lazy Evaluation\n\nIf we want to measure the cost of the lookup in the code below we need\nto evaluate it right there:\n\n```\n    m \u003c- readIORef _configCache\n    return . snd $ SimpleLRU.lookup k m\n```\n\nFor correct measurement use the following code:\n\n```\n    m \u003c- readIORef _configCache\n    _ \u003c- traceEventIO $ \"START:\" ++ \"mapLookup\"\n    let !v = HM.lookup k m\n    _ \u003c- traceEventIO $ \"END:\" ++ \"mapLookup\"\n    return v\n```\n\n## Labelling Threads\n\nWe should label our threads to identify the thread to scrutinize while reading\nthe stats.\n\nFor example,\n\nTo scrutinize the main thread:\n\n```\nimport GHC.Conc (myThreadId, labelThread)\n\nmain :: IO ()\nmain = do\n    tid \u003c- myThreadId\n    labelThread tid \"main-thread\"\n    withTracingFlow \"main\" $ do\n       ...\n```\n\nTo scrutinize the server thread in warp we can use the following middleware:\n\n```\neventlogMiddleware :: Application -\u003e Application\neventlogMiddleware app request respond = do\n    tid \u003c- myThreadId\n    labelThread tid \"server\"\n    traceEventIO (\"START:server\")\n    app request respond1\n\n    where\n\n    respond1 r = do\n        res \u003c- respond r\n        traceEventIO (\"END:server\")\n        return res\n\n```\n\nWe can use `eventlogMiddleware` as the outermost layer.\n\n## Reading the results\n\nWe get a lot of output currently. We are in the process of simplifying the\nstatistics and making the details controllable via options.\n\nCurrently, the program prints a lot of information. 
It's essential to understand\nwhat to ignore given the use case.\n\nThe use-case we assume is: __Understand the window CPU time and Thread allocated__.\n\nConsider the following program:\n\n```\n{-# LANGUAGE BangPatterns #-}\n\nimport Control.Monad (unless)\nimport Control.Monad.IO.Class (MonadIO(..))\nimport Debug.Trace (traceEventIO)\nimport GHC.Conc (myThreadId, labelThread)\n\n{-# INLINE withTracingFlow #-}\nwithTracingFlow :: MonadIO m =\u003e String -\u003e m a -\u003e m a\nwithTracingFlow tag action = do\n    liftIO $ traceEventIO (\"START:\" ++ tag)\n    !res \u003c- action\n    liftIO $ traceEventIO (\"END:\" ++ tag)\n    pure res\n\n{-# INLINE printSumLoop #-}\nprintSumLoop :: Int -\u003e Int -\u003e Int -\u003e IO ()\nprintSumLoop _ _ 0 = print \"All Done!\"\nprintSumLoop chunksOf from times = do\n    withTracingFlow \"sum\" $ print $ sum [from..(from + chunksOf)]\n    printSumLoop chunksOf (from + chunksOf) (times - 1)\n\nmain :: IO ()\nmain = do\n    tid \u003c- myThreadId\n    labelThread tid \"main-thread\"\n    withTracingFlow \"main\" $ do\n         printSumLoop 10000 1 100\n```\n\nThe statistics gleaned from the eventlog of the above program will look like the\nfollowing:\n\n```\n--------------------------------------------------\nSummary Stats\n--------------------------------------------------\n\nGlobal thread wise stat summary\ntid       label samples ThreadCPUTime ThreadAllocated\n--- ----------- ------- ------------- ---------------\n  1 main-thread       2       967,479         434,384\n  2           -       1         5,854          17,664\n\n  -           -       3       973,333         452,048\n\n\nWindow [1:main] thread wise stat summary\nProcessCPUTime: 1,174,455\nProcessUserCPUTime: 0\nProcessSystemCPUTime: 1,175,000\n\nThreadCPUTime:934,898\nGcCPUTime:0\nRtsCPUTime:239,557\ntid       label samples ThreadCPUTime ThreadAllocated\n--- ----------- ------- ------------- ---------------\n  1 main-thread       1       934,898         429,952\n\n 
 -           -       1       934,898         429,952\n\n\nWindow [1:sum] thread wise stat summary\nProcessCPUTime: 953,862\nProcessUserCPUTime: 0\nProcessSystemCPUTime: 949,000\n\nThreadCPUTime:833,991\nGcCPUTime:0\nRtsCPUTime:119,871\ntid       label samples ThreadCPUTime ThreadAllocated\n--- ----------- ------- ------------- ---------------\n  1 main-thread     100       833,991         328,224\n\n  -           -     100       833,991         328,224\n\n\n--------------------------------------------------\nDetailed Stats\n--------------------------------------------------\n\nWindow [1:main] thread wise stats for [ThreadCPUTime]\ntid       label   total count     avg minimum maximum stddev\n--- ----------- ------- ----- ------- ------- ------- ------\n  1 main-thread 934,898     1 934,898 934,898 934,898      0\n\n\nGrand total: 934,898\n\nWindow [1:main] thread wise stats for [ThreadAllocated]\ntid       label   total count     avg minimum maximum stddev\n--- ----------- ------- ----- ------- ------- ------- ------\n  1 main-thread 429,952     1 429,952 429,952 429,952      0\n\n\nGrand total: 429,952\n\nWindow [1:sum] thread wise stats for [ThreadCPUTime]\ntid       label   total count   avg minimum maximum stddev\n--- ----------- ------- ----- ----- ------- ------- ------\n  1 main-thread 833,991   100 8,340   5,533  63,493  5,714\n\n\nGrand total: 833,991\n\nWindow [1:sum] thread wise stats for [ThreadAllocated]\ntid       label   total count   avg minimum maximum stddev\n--- ----------- ------- ----- ----- ------- ------- ------\n  1 main-thread 328,224   100 3,282   2,960  31,584  2,844\n\n\nGrand total: 328,224\n\nGlobal thread wise stats for [ThreadCPUTime]\ntid       label   total count     avg minimum maximum  stddev\n--- ----------- ------- ----- ------- ------- ------- -------\n  1 main-thread 967,479     2 483,740  33,519 933,960 450,220\n  2           -   5,854     1   5,854   5,854   5,854       0\n\n\nGrand total: 973,333\n\nGlobal thread wise 
stats for [ThreadAllocated]\ntid       label   total count     avg minimum maximum  stddev\n--- ----------- ------- ----- ------- ------- ------- -------\n  1 main-thread 434,384     2 217,192   4,920 429,464 212,272\n  2           -  17,664     1  17,664  17,664  17,664       0\n\n\nGrand total: 452,048\n```\n\nFrom the __Global thread wise stat summary__ under __Summary Stats__ figure out\nthe thread id we want to scrutinize. In this case, we care about the\n`main-thread`. The thread id is `1`.\n\nWe can skip to the __Detailed Stats__ section.\n\nWe want to look at all the windows we want to scrutinize that run in the\n`main-thread`. The windows in the above program are `main` and `sum`.  The\nthread id is prepended to the windows. So we want to look at sections\ncorresponding to `[1:main]` and `[1:sum]`.\n\nThat is,\n```\nWindow [1:main] thread wise stats for [ThreadCPUTime]\ntid       label   total count     avg minimum maximum stddev\n--- ----------- ------- ----- ------- ------- ------- ------\n  1 main-thread 934,898     1 934,898 934,898 934,898      0\n\n\nGrand total: 934,898\n\nWindow [1:main] thread wise stats for [ThreadAllocated]\ntid       label   total count     avg minimum maximum stddev\n--- ----------- ------- ----- ------- ------- ------- ------\n  1 main-thread 429,952     1 429,952 429,952 429,952      0\n\n\nGrand total: 429,952\n\nWindow [1:sum] thread wise stats for [ThreadCPUTime]\ntid       label   total count   avg minimum maximum stddev\n--- ----------- ------- ----- ----- ------- ------- ------\n  1 main-thread 833,991   100 8,340   5,533  63,493  5,714\n\n\nGrand total: 833,991\n\nWindow [1:sum] thread wise stats for [ThreadAllocated]\ntid       label   total count   avg minimum maximum stddev\n--- ----------- ------- ----- ----- ------- ------- ------\n  1 main-thread 328,224   100 3,282   2,960  31,584  2,844\n```\n\nConsider one specific section,\n\n```\nWindow [1:sum] thread wise stats for [ThreadCPUTime]\ntid       label   total 
count   avg minimum maximum stddev\n--- ----------- ------- ----- ----- ------- ------- ------\n  1 main-thread 833,991   100 8,340   5,533  63,493  5,714\n```\n\nThis section is a table. It has 8 columns. It can have multiple rows.  We should\nonly scrutinize the row where the `tid` matches `main-thread`, i.e. `tid == 1`.\n\n`ThreadCPUTime` is reported in nanoseconds and `ThreadAllocated` is reported\nin bytes.\n\nColumns:\n\n- `tid`: The thread id\n- `label`: The thread label\n- `total`: The total accumulated sum of all the samples\n- `count`: Number of samples, i.e. the number of times this window is seen\n- `avg`: The average of the samples\n- `minimum`: The minimum of all the samples\n- `maximum`: The maximum of all the samples\n- `stddev`: The standard deviation of the samples\n\n__NOTE__: It is important to look at `stddev`. If `stddev` is more than 30% of\nthe average and the difference between the `minimum` and `maximum` is large,\nthe `avg` is likely skewed by outliers. In the future we would like\nto remove outliers automatically.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomposewell%2Fhaskell-perf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcomposewell%2Fhaskell-perf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcomposewell%2Fhaskell-perf/lists"}