https://github.com/composewell/haskell-perf

Last synced: 2 months ago
JSON representation
Host: GitHub
URL: https://github.com/composewell/haskell-perf
Owner: composewell
License: apache-2.0
Created: 2023-06-08T22:55:59.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2024-02-05T21:06:09.000Z (over 2 years ago)
Last Synced: 2025-03-11T17:25:25.050Z (over 1 year ago)
Language: Haskell
Size: 121 KB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 8
Metadata Files:
- Readme: README.md
- Changelog: Changelog.md
- License: LICENSE
Awesome Lists containing this project

README

          # haskell-perf

GHC Patch: https://github.com/composewell/ghc/tree/ghc-8.10.7-eventlog-enhancements

## Enable Linux perf counters

Enable unrestricted use of perf counters:

```

# echo -1 > /proc/sys/kernel/perf_event_paranoid

```

## Disable CPU scaling

Set the scaling governer of all your cpus to `performance`:

```

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

...

...

echo performance > /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor

```

## Generating the eventlog

To generate the event log, we need to compile the program with the eventlog enabled

and run the program setting the `-l` rts option.

There are multiple ways of doing this.

__Using plain GHC__:

```

ghc Main.hs -rtsopts -eventlog

./Main +RTS -l -RTS

```

__Using Cabal__:

The `.cabal` file should contain the following ghc options

```

ghc-options: -eventlog "-with-rtsopts=-l"

```

If the `-threaded` option is used while compiling. You may want to use the `-N1`

rts option.

## Creating windows

Helper function to create windows:

```

{-# LANGUAGE BangPatterns #-}

import Control.Monad.IO.Class (MonadIO(..))

import Debug.Trace (traceEventIO)

{-# INLINE withTracingFlow #-}

withTracingFlow :: MonadIO m => String -> m a -> m a

withTracingFlow tag action = do

    liftIO $ traceEventIO ("START:" ++ tag)

    !res <- action

    liftIO $ traceEventIO ("END:" ++ tag)

    pure res

```

We can wrap parts of the flow we want to analyze with `withTracingFlow` using a

tag to help us identify it.

## End of Window

You can put the END of the window in different paths but ensure that all paths

are covered:

```

  r <- f x

  case r of

    Just val -> do

      -- _ <- L.runIO $ traceEventIO $ "END:" ++ "window"

      -- Some processing

    Nothing -> do

      -- _ <- L.runIO $ traceEventIO $ "END:" ++ "window"

      -- Some processing

```

## Measurement Overhead

Even when you are measuring an empty block of code there will be some minimum

timing and allocations reported because of the measurement overhead.

```

    _ <- traceEventIO $ "START:emptyWindow"

    _ <- traceEventIO $ "END:emptyWindow"

```

The timing is due to the time measurement system call itself. The allocations

are due to the traceEventIO haskell code execution. TODO: fix the allocations.

## Measurement with Lazy Evaluation

If we want to measure the cost of the lookup in the code below we need

to evaluate it right there:

```

    m <- readIORef _configCache

    return . snd $ SimpleLRU.lookup k m

```

For correct measurement use the following code:

```

    m <- readIORef _configCache

    _ <- traceEventIO $ "START:" ++ "mapLookup"

    let !v = HM.lookup k m

    _ <- traceEventIO $ "END:" ++ "mapLookup"

    return v

```

## Labelling Threads

We should label our threads to identify the thread to scrutinize while reading

the stats.

For example,

To scrutinize the main thread:

```

import GHC.Conc (myThreadId, labelThread)

main :: IO ()

main = do

    tid <- myThreadId

    labelThread tid "main-thread"

    withTracingFlow "main" $ do

       ...

```

To scrutinize the server thread in warp we can use the following middleware:

```

eventlogMiddleware :: Application -> Application

eventlogMiddleware app request respond = do

    tid <- myThreadId

    labelThread tid "server"

    traceEventIO ("START:server")

    app request respond1

    where

    respond1 r = do

        res <- respond r

        traceEventIO ("END:server")

        return res

```

We can use `eventlogMiddleware` as the outermost layer.

## Reading the results

We get a lot of output currently. We are in the process of simplifying the

statistics and making the details controllable via options.

Currently, the program prints a lot of information. It's essential to understand

what to ignore given the use case.

The use-case we assume is: __Understand the window CPU time and Thread allocated__.

Consider the following program:

```

{-# LANGUAGE BangPatterns #-}

import Control.Monad (unless)

import Control.Monad.IO.Class (MonadIO(..))

import Debug.Trace (traceEventIO)

import GHC.Conc (myThreadId, labelThread)

{-# INLINE withTracingFlow #-}

withTracingFlow :: MonadIO m => String -> m a -> m a

withTracingFlow tag action = do

    liftIO $ traceEventIO ("START:" ++ tag)

    !res <- action

    liftIO $ traceEventIO ("END:" ++ tag)

    pure res

{-# INLINE printSumLoop #-}

printSumLoop :: Int -> Int -> Int -> IO ()

printSumLoop _ _ 0 = print "All Done!"

printSumLoop chunksOf from times = do

    withTracingFlow "sum" $ print $ sum [from..(from + chunksOf)]

    printSumLoop chunksOf (from + chunksOf) (times - 1)

main :: IO ()

main = do

    tid <- myThreadId

    labelThread tid "main-thread"

    withTracingFlow "main" $ do

         printSumLoop 10000 1 100

```

The statics gleaned from the eventlog of the above program will look like the

following:

```

--------------------------------------------------

Summary Stats

--------------------------------------------------

Global thread wise stat summary

tid       label samples ThreadCPUTime ThreadAllocated

--- ----------- ------- ------------- ---------------

  1 main-thread       2       967,479         434,384

  2           -       1         5,854          17,664

  -           -       3       973,333         452,048

Window [1:main] thread wise stat summary

ProcessCPUTime: 1,174,455

ProcessUserCPUTime: 0

ProcessSystemCPUTime: 1,175,000

ThreadCPUTime:934,898

GcCPUTime:0

RtsCPUTime:239,557

tid       label samples ThreadCPUTime ThreadAllocated

--- ----------- ------- ------------- ---------------

  1 main-thread       1       934,898         429,952

  -           -       1       934,898         429,952

Window [1:sum] thread wise stat summary

ProcessCPUTime: 953,862

ProcessUserCPUTime: 0

ProcessSystemCPUTime: 949,000

ThreadCPUTime:833,991

GcCPUTime:0

RtsCPUTime:119,871

tid       label samples ThreadCPUTime ThreadAllocated

--- ----------- ------- ------------- ---------------

  1 main-thread     100       833,991         328,224

  -           -     100       833,991         328,224

--------------------------------------------------

Detailed Stats

--------------------------------------------------

Window [1:main] thread wise stats for [ThreadCPUTime]

tid       label   total count     avg minimum maximum stddev

--- ----------- ------- ----- ------- ------- ------- ------

  1 main-thread 934,898     1 934,898 934,898 934,898      0

Grand total: 934,898

Window [1:main] thread wise stats for [ThreadAllocated]

tid       label   total count     avg minimum maximum stddev

--- ----------- ------- ----- ------- ------- ------- ------

  1 main-thread 429,952     1 429,952 429,952 429,952      0

Grand total: 429,952

Window [1:sum] thread wise stats for [ThreadCPUTime]

tid       label   total count   avg minimum maximum stddev

--- ----------- ------- ----- ----- ------- ------- ------

  1 main-thread 833,991   100 8,340   5,533  63,493  5,714

Grand total: 833,991

Window [1:sum] thread wise stats for [ThreadAllocated]

tid       label   total count   avg minimum maximum stddev

--- ----------- ------- ----- ----- ------- ------- ------

  1 main-thread 328,224   100 3,282   2,960  31,584  2,844

Grand total: 328,224

Global thread wise stats for [ThreadCPUTime]

tid       label   total count     avg minimum maximum  stddev

--- ----------- ------- ----- ------- ------- ------- -------

  1 main-thread 967,479     2 483,740  33,519 933,960 450,220

  2           -   5,854     1   5,854   5,854   5,854       0

Grand total: 973,333

Global thread wise stats for [ThreadAllocated]

tid       label   total count     avg minimum maximum  stddev

--- ----------- ------- ----- ------- ------- ------- -------

  1 main-thread 434,384     2 217,192   4,920 429,464 212,272

  2           -  17,664     1  17,664  17,664  17,664       0

Grand total: 452,048

```

From the __Global thread wise stat summary__ under __Summary Stats__ figure out

the thread id we want to scrutinize. In this case, we care about the

`main-thread`. The thread id is `1`.

We can skip to the __Detailed Stats__ section.

We want to look at all the windows we want to scrutinize that run in the

`main-thread`. The windows in the above program are `main` and `sum`.  The

thread id is prepended to the windows. So we want to look at sections

corresponding to `[1:main]` and `[1:sum]`.

That is,

```

Window [1:main] thread wise stats for [ThreadCPUTime]

tid       label   total count     avg minimum maximum stddev

--- ----------- ------- ----- ------- ------- ------- ------

  1 main-thread 934,898     1 934,898 934,898 934,898      0

Grand total: 934,898

Window [1:main] thread wise stats for [ThreadAllocated]

tid       label   total count     avg minimum maximum stddev

--- ----------- ------- ----- ------- ------- ------- ------

  1 main-thread 429,952     1 429,952 429,952 429,952      0

Grand total: 429,952

Window [1:sum] thread wise stats for [ThreadCPUTime]

tid       label   total count   avg minimum maximum stddev

--- ----------- ------- ----- ----- ------- ------- ------

  1 main-thread 833,991   100 8,340   5,533  63,493  5,714

Grand total: 833,991

Window [1:sum] thread wise stats for [ThreadAllocated]

tid       label   total count   avg minimum maximum stddev

--- ----------- ------- ----- ----- ------- ------- ------

  1 main-thread 328,224   100 3,282   2,960  31,584  2,844

```

Consider one specific section,

```

Window [1:sum] thread wise stats for [ThreadCPUTime]

tid       label   total count   avg minimum maximum stddev

--- ----------- ------- ----- ----- ------- ------- ------

  1 main-thread 833,991   100 8,340   5,533  63,493  5,714

```

This section is a table. It has 8 columns. It can have multiple rows.  We should

only scrutinize the row where the `tid` matches `main-thread`. ie. `tid == 1`.

The granularity of `ThreadCPUTime` is in nanoseconds and `ThreadAllocated` is

in bytes.

Columns:

- `tid`: The thread id

- `label`: The thread label

- `total`: The total accumulated sum of all the samples

- `count`: Number of samples or the times this window is seen

- `avg`: The average size of the samples

- `minimum`: The minimum of all the samples

- `maximum`: The maximum of all the samples

- `stddev`: The standard deviation of the samples

__NOTE__: It is important to look at `stddev`. If `stddev` is more than 30% of

the average and if the difference between the `minimum` and `maximum` is too

much, the `average` might have unecessary outliers. In the future we would like

to remove outliers automatically.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/composewell/haskell-perf

Awesome Lists containing this project

README