{"id":13730209,"url":"https://github.com/daanx/mimalloc-bench","last_synced_at":"2025-05-16T05:05:54.754Z","repository":{"id":37388227,"uuid":"192573707","full_name":"daanx/mimalloc-bench","owner":"daanx","description":"Suite for benchmarking malloc implementations.","archived":false,"fork":false,"pushed_at":"2025-05-08T09:57:37.000Z","size":787,"stargazers_count":419,"open_issues_count":33,"forks_count":59,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-13T14:09:31.189Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daanx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-06-18T16:09:59.000Z","updated_at":"2025-05-12T03:27:00.000Z","dependencies_parsed_at":"2024-02-11T00:28:46.852Z","dependency_job_id":"c451a843-dd84-4e36-935a-081bd5406b2e","html_url":"https://github.com/daanx/mimalloc-bench","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanx%2Fmimalloc-bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanx%2Fmimalloc-bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanx%2Fmimalloc-bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daanx%2Fmimalloc-bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daanx","download_url":"https://codeload.gith
ub.com/daanx/mimalloc-bench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471061,"owners_count":22076585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T02:01:11.546Z","updated_at":"2025-05-16T05:05:49.743Z","avatar_url":"https://github.com/daanx.png","language":"C","readme":"\u003cimg align=\"left\" width=\"100\" height=\"100\" src=\"doc/mimalloc-logo.png\"/\u003e\n\n# Mimalloc-bench\n\n\u0026nbsp;\n\nSuite for benchmarking malloc implementations, originally\ndeveloped for benchmarking [`mimalloc`](https://github.com/microsoft/mimalloc).\nCollection of various benchmarks from the academic literature, together with\nautomated scripts to pull specific versions of benchmark programs and\nallocators from Github and build them.\n\nDue to the large variance in programs and allocators, the suite is currently\nonly developed for Unix-like systems, and specifically Ubuntu with `apt-get`, Fedora with `dnf`,\nand macOS (for a limited set of allocators and benchmarks).\nThe only system-installed allocator used is glibc's implementation that ships as part of Linux's libc.\nAll other allocators are downloaded and built as part of `build-bench-env.sh` --\nif you are looking to run these benchmarks on a different Linux distribution look at\nthe `setup_packages` function to see the packages required to build the full set of\nallocators.\n\n\nIt is quite easy to add new benchmarks and allocator implementations --\nplease do so!.\n\nEnjoy,\n  Daan\n\n\n\nNote that all the code in the `bench` 
directory is not part of\n_mimalloc-bench_ as such, and all programs in the `bench` directory are\ngoverned under their own specific licenses and copyrights as detailed in\ntheir `README.md` (or `license.txt`) files. They are just included here for convenience.\n\n\n# Benchmarking\n\nThe `build-bench-env.sh` script with the `all` argument will automatically pull\nall needed benchmarks and allocators and build them in the `extern` directory:\n```\n~/dev/mimalloc-bench\u003e ./build-bench-env.sh all\n```\nIt starts by installing packages, and you will need to enter the sudo password.\nAll other programs are built in the `mimalloc-bench/extern` directory.\nUse `./build-bench-env.sh -h` to see all options.\n\nIf everything succeeded, you can run the full benchmark suite (from `out/bench`) as:\n\n- `~/dev/mimalloc-bench\u003e cd out/bench`\n- `~/dev/mimalloc-bench/out/bench\u003e../../bench.sh alla allt`\n\nOr just test _mimalloc_ and _tcmalloc_ on _cfrac_ and _larson_ with 16 threads:\n\n- `~/dev/mimalloc-bench/out/bench\u003e../../bench.sh --procs=16 mi tc cfrac larson`\n\nGenerally, you can specify the allocators (`mi`, `je`,\n`tc`, `hd`, `sys` (system allocator), etc.) and the benchmarks\n(`cfrac`, `espresso`, `barnes`, `lean`, `larson`, `alloc-test`, `cscratch`, etc.).\nOr all allocators (`alla`) and tests (`allt`).\nUse `--procs=\u003cn\u003e` to set the concurrency, and use `--help` to see all supported\nallocators and benchmarks.\n\n\n## Current Allocators\n\nSupported allocators are as follows; see\n[build-bench-env.sh](https://github.com/daanx/mimalloc-bench/blob/master/build-bench-env.sh)\nfor the versions:\n\n- **dieharder**: The [_DieHarder_](https://github.com/emeryberger/DieHard)\n  allocator is an error-resistant memory allocator for Windows, Linux, and Mac\n  OS X.\n- **ff**: [ffmalloc](https://github.com/bwickman97/ffmalloc), from the Usenix\n  Security 21 [paper](https://www.usenix.org/conference/usenixsecurity21/presentation/wickman)\n- **fg**: The 
[_FreeGuard_](https://github.com/UTSASRG/FreeGuard) allocator, from\n  the CCS 17 [paper](https://dl.acm.org/doi/10.1145/3133956.3133957)\n- **gd**: The [_Guarder_](https://github.com/UTSASRG/Guarder) allocator\n  is a tunable secure allocator by UTSA.\n- **hd**: The [_Hoard_](https://github.com/emeryberger/Hoard) allocator by\n  Emery Berger \\[1]. This is one of the first multi-thread scalable allocators.\n- **hm**: The [_Hardened\n  Malloc_](https://github.com/GrapheneOS/hardened_malloc) from GrapheneOS,\n  security-focused.\n- **iso**: The [_Isoalloc_](https://github.com/struct/isoalloc/) allocator,\n  isolation-based, aiming to provide a reasonable level of security without\n  sacrificing too much performance.\n- **je**: The [_jemalloc_](https://github.com/jemalloc/jemalloc)\n  allocator by [Jason Evans](https://github.com/jasone),\n  now developed at Facebook\n  and widely used in practice, for example in FreeBSD and Firefox.\n- **lf**: The [_lockfree-malloc_](https://github.com/Begun/lockfree-malloc) allocator,\n  multi-thread scalability-focused.\n- **lp**: The [_libpas_](https://github.com/WebKit/WebKit/tree/main/Source/bmalloc/libpas)\n  allocator, used by [WebKit](https://webkit.org).\n- **lt**: The [_ltalloc_](https://github.com/r-lyeh-archived/ltalloc) allocator,\n  a multi-threaded memory allocator based on free lists best suited for many small allocations.\n- **mesh**: The [_mesh_](https://github.com/plasma-umass/mesh) allocator, a\n  memory allocator that automatically reduces the memory footprint of C/C++\n  applications. 
Also tested as **nomesh** with the meshing feature disabled.\n- **mi**: The [_mimalloc_](https://github.com/microsoft/mimalloc) allocator.\n  We can also test the debug version as **dmi** (this can be used to check for\n  any bugs in the benchmarks), and the secure version as **smi**.\n- **mng**: [musl](https://musl.libc.org)'s memory allocator.\n- **pa**: The [_PartitionAlloc_](https://chromium.googlesource.com/chromium/src/base/allocator/partition_allocator.git/+/refs/heads/main/PartitionAlloc.md) allocator used in Chromium.\n- **rp**: The [_rpmalloc_](https://github.com/mjansson/rpmalloc) allocator uses\n  16-byte aligned allocations and is developed by [Mattias\n  Jansson](https://twitter.com/maniccoder) at Epic Games, used for example\n  in [Haiku](https://git.haiku-os.org/haiku/commit/?id=7132b79eafd69cced14f028f227936b9eca4de48).\n- **sc**: The [_scalloc_](https://github.com/cksystemsgroup/scalloc) allocator,\n  a fast, multicore-scalable, low-fragmentation memory allocator.\n- **scudo**: The\n  [_scudo_](https://www.llvm.org/docs/ScudoHardenedAllocator.html) allocator\n  used by Fuchsia and Android.\n- **sg**: The [slimguard](https://github.com/ssrg-vt/SlimGuard) allocator,\n  designed to be secure and memory-efficient.\n- **sm**: The [_Supermalloc_](https://github.com/kuszmaul/SuperMalloc)\n  allocator by Bradley Kuszmaul uses hardware transactional memory to speed up\n  parallel operations.\n- **sn**: The [_snmalloc_](https://github.com/microsoft/snmalloc) allocator\n  is a recent concurrent message passing\n  allocator by Liétar et al. 
\\[8].\n- **tbb**: The Intel [TBB](https://github.com/intel/tbb) allocator that comes\n  with the Threading Building Blocks (TBB) library \\[7].\n- **tc**: The [_tcmalloc_](https://github.com/gperftools/gperftools)\n  allocator which comes as part of the Google performance tools,\n  now maintained by the community.\n- **tcg**: The [_tcmalloc_](https://github.com/google/tcmalloc)\n  allocator, maintained and [used](https://cloud.google.com/blog/topics/systems/trading-off-malloc-costs-and-fleet-efficiency)\n  by Google.\n- **yal**: The [_yalloc_](https://github.com/jorisgeer/yalloc) allocator, yet another allocator that aims at balancing safety and compactness.\n- **sys**: The system allocator. Here we usually use the _glibc_ allocator\n  (which is originally based on _Ptmalloc2_).\n\n\n## Current Benchmarks\n\nThe first set of benchmarks consists of real-world programs, or programs that try to mimic\nthem:\n\n- __barnes__: a hierarchical n-body particle solver \\[4], simulating the\n  gravitational forces between 163840 particles. It uses relatively few\n  allocations compared to `cfrac` and `espresso` but is multithreaded.\n- __cfrac__: by Dave Barrett, implementation of continued fraction\n  factorization, using many small short-lived allocations.\n- __espresso__: a programmable logic array analyzer, described by\n  Grunwald, Zorn, and Henderson \\[3] in the context of cache aware memory allocation.\n- __gs__: have [ghostscript](https://www.ghostscript.com) process the entire\n  Intel Software Developer’s Manual PDF, which is around 5000 pages.\n- __leanN__:  The [Lean](https://github.com/leanprover/lean) compiler by\n  de Moura _et al_, version 3.4.1,\n  compiling its own standard library concurrently using N threads\n  (`./lean --make -j N`). 
Big real-world workload with intensive\n  allocations.\n- __redis__: running [redis-benchmark](https://redis.io/topics/benchmarks),\n  with 1 million requests pushing 10 new list elements and then requesting the\n  head 10 elements, and measures the requests handled per second. Simulates a\n  real-world workload.\n- __larsonN__: by Larson and Krishnan \\[2]. Simulates a server workload using 100 separate\n   threads which each allocate and free many objects but leave some\n   objects to be freed by other threads. Larson and Krishnan observe this\n   behavior (which they call _bleeding_) in actual server applications,\n   and the benchmark simulates this.\n- __larsonN-sized__: same as the __larsonN__ except it uses sized deallocation calls which\n   have a fast path in some allocators. \n- __lua__: compiling the [lua interpreter](https://github.com/lua/lua).\n- __z3__: perform some computations in [z3](https://github.com/Z3Prover/z3).\n\nThe second set of benchmarks are stress tests and consist of:\n\n- __alloc-test__: a modern allocator test developed by\n  OLogN Technologies AG ([ITHare.com](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/))\n  Simulates intensive allocation workloads with a Pareto size\n  distribution. The _alloc-testN_ benchmark runs on N cores doing\n  100·10⁶ allocations per thread with objects up to 1KiB\n  in size. Using commit `94f6cb`\n  ([master](https://github.com/node-dot-cpp/alloc-test), 2018-07-04)\n- __cache-scratch__: by Emery Berger \\[1]. Introduced with the\n  [Hoard](https://github.com/emeryberger/Hoard) allocator to test for\n  _passive-false_ sharing of cache lines: first some small objects are\n  allocated and given to each thread; the threads free that object and allocate\n  immediately another one, and access that repeatedly. 
If an allocator\n  allocates objects from different threads close to each other this will lead\n  to cache-line contention.\n- __cache_trash__: part of the [Hoard](https://github.com/emeryberger/Hoard)\n  benchmarking suite, designed to exercise heap cache locality.\n- __glibc-simple__ and __glibc-thread__: benchmarks for [glibc](https://github.com/bminor/glibc/tree/master/benchtests).\n- __malloc-large__: part of the mimalloc benchmarking suite, designed\n  to exercise large (several MiB) allocations.\n- __mleak__: checks that terminated threads don't \"leak\" memory.\n- __rptest__: modified version of the [rpmalloc-benchmark](https://github.com/mjansson/rpmalloc-benchmark) suite.\n- __mstress__: simulates real-world server-like allocation patterns, using N threads with allocations in powers of 2  \n  where objects can migrate between threads and some have long life times. Not all threads have equal workloads and \n  after each phase all threads are destroyed and new threads created where some objects survive between phases.\n- __rbstress__: modified version of [allocator_bench](https://github.com/SamSaffron/allocator_bench),\n  allocates chunks in memory via ruby shenanigans.\n- __sh6bench__: by [MicroQuill](http://www.microquill.com) as part of\n  [SmartHeap](http://www.microquill.com/smartheap/sh_tspec.htm). Stress test\n  where some of the objects are freed in a usual last-allocated, first-freed\n  (LIFO) order, but others are freed in reverse order. Using the public\n  [source](http://www.microquill.com/smartheap/shbench/bench.zip) (retrieved\n  2019-01-02)\n- __sh8benchN__: by [MicroQuill](http://www.microquill.com) as part of\n  [SmartHeap](http://www.microquill.com/smartheap/sh_tspec.htm). Stress test\n  for multi-threaded allocation (with N threads) where, just as in _larson_,\n  some objects are freed by other threads, and some objects freed in reverse\n  (as in _sh6bench_). 
Using the public\n  [source](http://www.microquill.com/smartheap/SH8BENCH.zip) (retrieved\n  2019-01-02)\n- __xmalloc-testN__: by Lever and Boreham \\[5] and Christian Eder. We use the\n  updated version from the\n  [SuperMalloc](https://github.com/kuszmaul/SuperMalloc) repository. This is a\n  more extreme version of the _larson_ benchmark with 100 purely allocating\n  threads, and 100 purely deallocating threads with objects of various sizes\n  migrating between them. This asymmetric producer/consumer pattern is usually\n  difficult to handle by allocators with thread-local caches.\n\nFinally, there is a\n[security benchmark](https://github.com/daanx/mimalloc-bench/tree/master/bench/security)\naiming at checking basic security properties of allocators.\n\n## Example\n\nBelow is an example (Apr 2019) of the benchmark results on an HP\nZ4-G4 workstation with a 4-core Intel® Xeon® W2123 at 3.6 GHz with 16GB\nECC memory, running Ubuntu 18.04.1 with LibC 2.27 and GCC 7.3.0.\n\n![bench-z4-1](doc/bench-z4-1.svg)\n![bench-z4-2](doc/bench-z4-2.svg)\n\nMemory usage:\n\n![bench-z4-rss-1](doc/bench-z4-rss-1.svg)\n![bench-z4-rss-2](doc/bench-z4-rss-2.svg)\n\n(note: the _xmalloc-testN_ memory usage should be disregarded as it\nallocates more the faster the program runs. 
Unfortunately,\nthere are no entries for _SuperMalloc_ in the _leanN_ and _xmalloc-testN_\nbenchmarks as it faulted on those)\n\n# Results and notable usages\n\n## Improvements\n- [Minor performance improvement](https://github.com/struct/isoalloc/commit/049c12e4c2ad5c21a768f7f3873d84bf1106646a) in isoalloc\n- [Parallel compilation](https://github.com/emeryberger/DieHard/issues/15) support in DieHarder\n- [Portability improvement](https://github.com/oneapi-src/oneTBB/pull/764) in Intel TBB malloc\n- [Various](https://github.com/google/tcmalloc/issues/155) [portability](https://github.com/google/tcmalloc/issues/128)\n  [improvements]( https://github.com/google/tcmalloc/issues/125 ) [in](https://github.com/google/tcmalloc/issues/179) Google's tcmalloc\n- [Improved double-free detection]( https://github.com/microsoft/snmalloc/pull/550 ) in snmalloc\n- [Fixed compilation on modern glibc]( https://github.com/ssrg-vt/SlimGuard/pull/13 ) in SlimGuard\n- A [crash]( https://github.com/struct/isoalloc/issues/56 ) in isoalloc\n- Caught a [compilation issue](https://github.com/mjansson/rpmalloc/issues/263) in rpmalloc\n- [Portability issues](https://github.com/mjansson/rpmalloc/issues/293) in rpmalloc\n\n## Notable usages\n- Provided [data]( https://gitlab.gnome.org/GNOME/glib/-/issues/1079#note_1627978 ) for the glib allocator.\n- Provided [data]( https://github.com/microsoft/snmalloc/pull/587#issuecomment-1442077886 ) for snmalloc hardening.\n- Used as main benchmark suite by [S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More](https://arxiv.org/abs/2402.01894) by Ruizhe Wang, Meng Xu and N. Asokan\n\n# References\n\n- \\[1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson.\n   _Hoard: A Scalable Memory Allocator for Multithreaded Applications_\n   the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX). 
Cambridge, MA, November 2000.\n   [pdf](http://www.cs.utexas.edu/users/mckinley/papers/asplos-2000.pdf)\n\n\n- \\[2] P. Larson and M. Krishnan. _Memory allocation for long-running server applications_. In ISMM, Vancouver, B.C., Canada, 1998.\n      [pdf](http://citeseemi.ist.psu.edu/viewdoc/download;jsessionid=5F0BFB4F57832AEB6C11BF8257271088?doi=10.1.1.45.1947\u0026rep=rep1\u0026type=pdf)\n\n- \\[3] D. Grunwald, B. Zorn, and R. Henderson.\n  _Improving the cache locality of memory allocation_. In R. Cartwright, editor,\n  Proceedings of the Conference on Programming Language Design and Implementation, pages 177–186, New York, NY, USA, June 1993.\n  [pdf](http://citeseemi.ist.psu.edu/viewdoc/download?doi=10.1.1.43.6621\u0026rep=rep1\u0026type=pdf)\n\n- \\[4] J. Barnes and P. Hut. _A hierarchical O(n*log(n)) force-calculation algorithm_. Nature, 324:446-449, 1986.\n\n- \\[5] C. Lever, and D. Boreham. _Malloc() Performance in a Multithreaded Linux Environment._\n  In USENIX Annual Technical Conference, Freenix Session. San Diego, CA. Jun. 2000.\n  Available at \u003chttps://github.com/kuszmaul/SuperMalloc/tree/master/tests\u003e\n\n- \\[6] Timothy Crundal. _Reducing Active-False Sharing in TCMalloc._\n   2016. \u003chttp://courses.cecs.anu.edu.au/courses/CSPROJECTS/16S1/Reports/Timothy*Crundal*Report.pdf\u003e. CS16S1 project at the Australian National University.\n\n- \\[7] Alexey Kukanov, and Michael J Voss.\n   _The Foundations for Scalable Multi-Core Software in Intel Threading Building Blocks._\n   Intel Technology Journal 11 (4). 2007\n\n- \\[8] Paul Liétar, Theodore Butler, Sylvan Clebsch, Sophia Drossopoulou, Juliana Franco, Matthew J Parkinson,\n  Alex Shamis, Christoph M Wintersteiger, and David Chisnall.\n  _Snmalloc: A Message Passing Allocator._\n  In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, 122–135. ACM. 
2019.\n","funding_links":[],"categories":["C","C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanx%2Fmimalloc-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaanx%2Fmimalloc-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaanx%2Fmimalloc-bench/lists"}