Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/rouming/ccont

Tool burns CPUs on different NUMA nodes and measures execution time
https://github.com/rouming/ccont

Last synced: 5 days ago
JSON representation

Tool burns CPUs on different NUMA nodes and measures execution time

Host: GitHub
URL: https://github.com/rouming/ccont
Owner: rouming
Created: 2016-03-21T12:09:03.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2024-06-19T11:13:11.000Z (5 months ago)
Last Synced: 2024-06-19T21:20:29.022Z (5 months ago)
Language: C
Size: 2.93 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README

Awesome Lists containing this project

README

        ccont: Tool burns CPUs on different NUMA nodes and measures execution time.

Description:

    The goal is to measure cache contention on different NUMA nodes,

    burn different CPUs, execute different instructions with different

    load patterns, e.g. the following is the list of three load patterns

    which were executed on machine with 2 NUMA nodes and 8 CPUs:

    o cpu-increase - on each iteration number of CPU is increased:

      # ./ccont --load cpu-increase --op cmpxchg

      Nodes  N0   N1  CPUs    operation       min       max       avg     stdev

       CPUs *--- ----    1      cmpxchg     8.938     8.938     8.938     0.000

       CPUs **-- ----    2      cmpxchg    36.114    36.119    36.117     0.004

       CPUs ***- ----    3      cmpxchg    54.270    54.272    54.271     0.001

       CPUs **** ----    4      cmpxchg    72.292    72.321    72.313     0.013

       CPUs **** *---    5      cmpxchg    61.691   108.060    98.782    20.735

       CPUs **** **--    6      cmpxchg   101.316   136.923   125.059    18.369

       CPUs **** ***-    7      cmpxchg   151.639   169.218   161.702     9.358

       CPUs **** ****    8      cmpxchg   192.281   196.250   194.281     2.098

    o node-cascade - on each iteration CPUs from each node are burned:

      # ./ccont --load node-cascade --op cmpxchg

      Nodes  N0   N1  CPUs    operation       min       max       avg     stdev

       CPUs **** ----    4      cmpxchg    72.287    72.322    72.310     0.016

       CPUs ---- ****    4      cmpxchg    72.327    72.333    72.330     0.003

    o cpu-rollover - on each iteration executor thread rolls to another CPU on

    the next node, keeping the same amount of CPUs burning:

      # ./ccont --load cpu-rollover --op cmpxcgh

      Nodes  N0   N1  CPUs    operation       min       max       avg     stdev

       CPUs **** ----    4      cmpxchg    48.769    48.774    48.772     0.002

       CPUs ***- *---    4      cmpxchg    85.506    97.754    94.683     6.118

       CPUs **-- **--    4      cmpxchg   116.803   121.450   119.108     2.658

       CPUs *--- ***-    4      cmpxchg    91.312   103.877   100.721     6.273

       CPUs ---- ****    4      cmpxchg    48.288    48.368    48.323     0.038

    Memory chunk for each load is always allocated on the node#0.

    Results show, that scattered tasks over NUMA nodes show bad performance for

    cmpxchg instruction (cpu-rollover pattern), but execution on remote node

    is not so bad, because of the L3 cache (node-cascade pattern).  Increase of

    the CPUs number can degrade performance by factor of 24 because of the cache

    line contention (cpu-increase pattern).

    The following burning operations are supported:

    o "idle" - idle loop:

          used just for calibrating.

              while (spins--)

                   ;

    o "memset64" - memset glibc call:

	      memsets 64 bytes (usual cache line size).

    o "memset128" - memset glibc call:

	      memsets 128 bytes.

    o "memset256" - memset glibc call:

	      memsets 256 bytes.

    o "test_bit" - btl:

          testing a bit, used for test_bit() in Linux kernel.

              var | (1 << bit)

    o "set_bit" - bts:

          test and set bit, used for test_and_set_bit() in Linux kernel.

          "test_bit" - name in test results.

              res = var | (1 << bit)

              var |= (1 << bit)

    o "inc" - lock inc:

          increment, used for atomic_inc() in Linux kernel.

              var += 1

    o "xadd" - lock xadd:

          exchanges operands, used for __sync_fetch_and_add()

          and similar gcc atomic builtins.

              tmp = src + dst;

              src = dst;

              dst = tmp;

    o "cmpxchg" - lock cmpxchg:

          exchanges operangs, used for cmpxchg() for all sorts of atomic

          exchanges in Linux kernel.

              res = var

              if (res == old)

                  var = new

    o "mfence" - mfence:

          memory barrier for load and store, used for smp_mb() in Linux kernel.

    o "sfence" - sfence:

          memory barrier for store, used for smp_wmb() in Linux kernel.

    o "lfence" - lfence:

          memory barrier for load, used for smp_rmb() in Linux kernel.