# Hardened malloc

* [Introduction](#introduction)
* [Dependencies](#dependencies)
* [Testing](#testing)
    * [Individual Applications](#individual-applications)
    * [Automated Test Framework](#automated-test-framework)
* [Compatibility](#compatibility)
* [OS integration](#os-integration)
    * [Android-based operating systems](#android-based-operating-systems)
    * [Traditional Linux-based operating systems](#traditional-linux-based-operating-systems)
* [Configuration](#configuration)
* [Core design](#core-design)
* [Security properties](#security-properties)
* [Randomness](#randomness)
* [Size classes](#size-classes)
* [Scalability](#scalability)
    * [Small (slab) allocations](#small-slab-allocations)
        * [Thread caching (or lack thereof)](#thread-caching-or-lack-thereof)
    * [Large allocations](#large-allocations)
* [Memory tagging](#memory-tagging)
* [API extensions](#api-extensions)
* [Stats](#stats)
* [System calls](#system-calls)

## Introduction

This is a security-focused general purpose memory allocator providing the
malloc API along with various extensions. It provides substantial hardening
against heap corruption vulnerabilities. The security-focused design also leads
to much less metadata overhead and memory waste from fragmentation than a more
traditional allocator design. It aims to provide decent overall performance
with a focus on long-term performance and memory usage rather than allocator
micro-benchmarks. It offers scalability via a configurable number of entirely
independent arenas, with the internal locking within arenas further divided
up per size class.

This project currently supports Bionic (Android), musl and glibc. It may
support other non-Linux operating systems in the future. For Android, there's
custom integration and other hardening features, which are also planned for
musl in the future. The glibc support will be limited to replacing the malloc
implementation because musl is a much more robust and cleaner base to build on
and can cover the same use cases.

This allocator is intended as a successor to a previous implementation based on
extending OpenBSD malloc with various additional security features. It's still
heavily based on the OpenBSD malloc design, albeit not on the existing code
other than reusing the hash table implementation. The main differences in the
design are that it's solely focused on hardening rather than finding bugs, uses
finer-grained size classes along with slab sizes going beyond 4k to reduce
internal fragmentation, doesn't rely on the kernel having fine-grained mmap
randomization and only targets 64-bit to make aggressive use of the large
address space. There are lots of smaller differences in the implementation
approach. It incorporates the previous extensions made to OpenBSD malloc,
including adding padding to allocations for canaries (distinct from the current
OpenBSD malloc canaries), write-after-free detection tied to the existing
clearing on free, queues alongside the existing randomized arrays for
quarantining allocations and proper double-free detection for quarantined
allocations. The per-size-class memory regions with their own random bases were
loosely inspired by the size and type-based partitioning in PartitionAlloc. The
planned changes to OpenBSD malloc ended up being too extensive and invasive, so
this project was started as a fresh implementation better able to accomplish
the goals. For 32-bit, a port of OpenBSD malloc with small extensions can be
used instead, as this allocator fundamentally doesn't support that environment.

## Dependencies

Debian stable (currently Debian 12) determines the most ancient set of
supported dependencies:

* glibc 2.36
* Linux 6.1
* Clang 14.0.6 or GCC 12.2.0

For Android, the Linux GKI 5.10, 5.15 and 6.1 branches are supported.

However, using more recent releases is highly recommended. Older versions of
the dependencies may be compatible at the moment but are not tested and will
explicitly not be supported.

For external malloc replacement with musl, musl 1.1.20 is required.
However, there will be custom integration offering better performance in the
future, along with other hardening for the C standard library implementation.

For Android, only the current generation, actively developed maintenance
branch of the Android Open Source Project will be supported, which currently
means `android15-release`.

## Testing

### Individual Applications

The `preload.sh` script can be used for testing with dynamically linked
executables using glibc or musl:

    ./preload.sh krita --new-image RGBA,U8,500,500

It can be necessary to substantially increase the `vm.max_map_count` sysctl to
accommodate the large number of mappings caused by guard slabs and large
allocation guard regions. The number of mappings can also be drastically
reduced via a significant increase to `CONFIG_GUARD_SLABS_INTERVAL`, but the
feature has a low performance and memory usage cost, so that isn't recommended.

The allocator can offer slightly better performance when integrated into the C
standard library, and there are other opportunities for similar hardening
within C standard library and dynamic linker implementations. For example, a
library region can be implemented to offer similar isolation for dynamic
libraries as this allocator offers across different size classes. The
intention is that this will be offered as part of hardened variants of the
Bionic and musl C standard libraries.

### Automated Test Framework

A collection of simple, automated tests is provided and can be run with the
make command as follows:

    make test

## Compatibility

OpenSSH 8.1 or higher is required to allow the mprotect `PROT_READ|PROT_WRITE`
system calls in the seccomp-bpf filter rather than killing the process.

## OS integration

### Android-based operating systems

On GrapheneOS, hardened\_malloc is integrated into the standard C library as
the standard malloc implementation. Other Android-based operating systems can
reuse [the integration
code](https://github.com/GrapheneOS/platform_bionic/commit/20160b81611d6f2acd9ab59241bebeac7cf1d71c)
to provide it. If desired, jemalloc can be left as a runtime configuration
option by only conditionally using hardened\_malloc to give users the choice
between performance and security. However, this reduces security for threat
models where persistent state is untrusted, i.e. verified boot and attestation
(see the [attestation sister project](https://attestation.app/about)).

Make sure to raise `vm.max_map_count` substantially too to accommodate the very
large number of guard pages created by hardened\_malloc. This can be done in
`init.rc` (`system/core/rootdir/init.rc`) near the other virtual memory
configuration:

    write /proc/sys/vm/max_map_count 1048576

This is unnecessary if you set `CONFIG_GUARD_SLABS_INTERVAL` to a very large
value in the build configuration.

### Traditional Linux-based operating systems

On traditional Linux-based operating systems, hardened\_malloc can either be
integrated into the libc implementation as a replacement for the standard
malloc implementation or loaded as a dynamic library. Rather than rebuilding
each executable to be linked against it, it can be added as a preloaded
library to `/etc/ld.so.preload`. For example, with `libhardened_malloc.so`
installed to `/usr/local/lib/libhardened_malloc.so`, add that full path as a
line to the `/etc/ld.so.preload` configuration file:

    /usr/local/lib/libhardened_malloc.so

The format of this configuration file is a whitespace-separated list, so it's
good practice to put each library on a separate line.

On Debian systems, `libhardened_malloc.so` should be installed into `/usr/lib/`
to avoid preload failures caused by AppArmor profile restrictions.

Using the `LD_PRELOAD` environment variable to load it on a case-by-case basis
will not work when `AT_SECURE` is set, such as with setuid binaries. It's also
generally not a recommended approach for production usage. The recommendation
is to enable it globally and make exceptions for performance-critical cases by
running the application in a container / namespace without it enabled.

Make sure to raise `vm.max_map_count` substantially too to accommodate the very
large number of guard pages created by hardened\_malloc. As an example, in
`/etc/sysctl.d/hardened_malloc.conf`:

    vm.max_map_count = 1048576

This is unnecessary if you set `CONFIG_GUARD_SLABS_INTERVAL` to a very large
value in the build configuration.

On arm64, make sure your kernel is configured to use 4k pages, since we haven't
yet added support for 16k and 64k pages. The kernel also has to be configured
to use 4-level page tables for the full 48-bit address space instead of only
having a 39-bit address space for the default hardened\_malloc configuration.
It's possible to reduce the class region size substantially to make a 39-bit
address space workable, but the defaults won't work.

## Configuration

You can set some configuration options at compile-time via arguments to the
make command as follows:

    make CONFIG_EXAMPLE=false

Configuration options are provided when there are significant compromises
between portability, performance, memory usage or security. The core design
choices are not configurable and the allocator remains very security-focused
even with all the optional features disabled.

The configuration system supports templates with two standard presets: the
default configuration (`config/default.mk`) and a light configuration
(`config/light.mk`). Packagers are strongly encouraged to ship both the
standard `default` and `light` configurations. You can choose the
configuration to build using `make VARIANT=light`, where `make VARIANT=default`
is the same as `make`. Non-default configuration templates will build a
library with the suffix `-variant`, such as `libhardened_malloc-light.so`, and
will use an `out-variant` directory instead of `out` for the build.

The `default` configuration template has all normal optional security features
enabled (just not the niche `CONFIG_SEAL_METADATA`) and is quite aggressive in
terms of sacrificing performance and memory usage for security. The `light`
configuration template disables the slab quarantines, write-after-free check
and slot randomization, and raises the guard slab interval from 1 to 8, but
leaves zero-on-free and slab canaries enabled. The `light` configuration has
solid performance and memory usage while still being far more secure than
mainstream allocators. Disabling zero-on-free would gain more performance but
doesn't make much difference for small allocations without also disabling slab
canaries. Slab canaries slightly raise memory use and slightly slow down
performance but are quite important to mitigate small overflows and C string
overflows.
Disabling slab canaries is not recommended in most cases, since it would no
longer be a strict upgrade over traditional allocators with headers on
allocations and basic consistency checks for them.

For reduced memory usage at the expense of performance (this will also reduce
the size of the empty slab caches and quarantines, saving a lot of memory,
since those are currently based on the size of the largest size class):

    make \
    CONFIG_N_ARENA=1 \
    CONFIG_EXTENDED_SIZE_CLASSES=false

The following boolean configuration options are available:

* `CONFIG_WERROR`: `true` (default) or `false` to control whether compiler
  warnings are treated as errors. This is highly recommended, but it can be
  disabled to avoid patching the Makefile if a compiler version not tested by
  the project is being used and has warnings. Investigating these warnings is
  still recommended and the intention is to always be free of any warnings.
* `CONFIG_NATIVE`: `true` (default) or `false` to control whether the code is
  optimized for the detected CPU on the host. If this is disabled, setting up a
  custom `-march` higher than the baseline architecture is highly recommended
  due to substantial performance benefits for this code.
* `CONFIG_CXX_ALLOCATOR`: `true` (default) or `false` to control whether the
  C++ allocator is replaced for slightly improved performance and detection of
  mismatched sizes for sized deallocation (often type confusion bugs). This
  will result in linking against the C++ standard library.
* `CONFIG_ZERO_ON_FREE`: `true` (default) or `false` to control whether small
  allocations are zeroed on free, to mitigate use-after-free and uninitialized
  use vulnerabilities along with purging lots of potentially sensitive data
  from the process as soon as possible. This has a performance cost scaling to
  the size of the allocation, which is usually acceptable. This is not relevant
  to large allocations because the pages are given back to the kernel.
* `CONFIG_WRITE_AFTER_FREE_CHECK`: `true` (default) or `false` to control
  sanity checking that new small allocations contain zeroed memory. This can
  detect writes caused by a write-after-free vulnerability and mixes well with
  the features for making memory reuse randomized / delayed. This has a
  performance cost scaling to the size of the allocation, which is usually
  acceptable. This is not relevant to large allocations because they're always
  a fresh memory mapping from the kernel.
* `CONFIG_SLOT_RANDOMIZE`: `true` (default) or `false` to randomize selection
  of free slots within slabs. This has a measurable performance cost and isn't
  one of the important security features, but the cost has been deemed more
  than acceptable to be enabled by default.
* `CONFIG_SLAB_CANARY`: `true` (default) or `false` to enable support for
  adding 8 byte canaries to the end of memory allocations. The primary purpose
  of the canaries is to render small fixed size buffer overflows harmless by
  absorbing them. The first byte of the canary is always zero, containing
  overflows caused by a missing C string NUL terminator. The other 7 bytes are
  a per-slab random value. On free, integrity of the canary is checked to
  detect attacks like linear overflows or other forms of heap corruption caused
  by imprecise exploit primitives. However, checking on free will often be too
  late to prevent exploitation, so it's not the main purpose of the canaries.
* `CONFIG_SEAL_METADATA`: `true` or `false` (default) to control whether Memory
  Protection Keys are used to disable access to all writable allocator state
  outside of the memory allocator code. It's currently disabled by default due
  to a significant performance cost for this use case on current generation
  hardware, which may become drastically lower in the future. Whether or not
  this feature is enabled, the metadata is all contained within an isolated
  memory region with high entropy random guard regions around it.

The following integer configuration options are available:

* `CONFIG_SLAB_QUARANTINE_RANDOM_LENGTH`: `1` (default) to control the number
  of slots in the random array used to randomize reuse for small memory
  allocations. This sets the length for the largest size class (either 16kiB
  or 128kiB based on `CONFIG_EXTENDED_SIZE_CLASSES`) and the quarantine length
  for smaller size classes is scaled to match the total memory of the
  quarantined allocations (1 becomes 1024 for 16 byte allocations with 16kiB
  as the largest size class, or 8192 with 128kiB as the largest).
* `CONFIG_SLAB_QUARANTINE_QUEUE_LENGTH`: `1` (default) to control the number of
  slots in the queue used to delay reuse for small memory allocations. This
  sets the length for the largest size class (either 16kiB or 128kiB based on
  `CONFIG_EXTENDED_SIZE_CLASSES`) and the quarantine length for smaller size
  classes is scaled to match the total memory of the quarantined allocations (1
  becomes 1024 for 16 byte allocations with 16kiB as the largest size class, or
  8192 with 128kiB as the largest).
* `CONFIG_GUARD_SLABS_INTERVAL`: `1` (default) to control the number of slabs
  before a slab is skipped and left as an unused memory protected guard slab.
  The default of `1` leaves a guard slab between every slab. This feature does
  not have a *direct* performance cost, but it makes the address space usage
  sparser, which can indirectly hurt performance. The kernel also needs to
  track a lot more memory mappings, which uses a bit of extra memory and slows
  down memory mapping and memory protection changes in the process. The kernel
  uses O(log n) algorithms for this and system calls are already fairly slow
  anyway, so having many extra mappings doesn't usually add up to a
  significant cost.
* `CONFIG_GUARD_SIZE_DIVISOR`: `2` (default) to control the maximum size of the
  guard regions placed on both sides of large memory allocations, relative to
  the usable size of the memory allocation.
* `CONFIG_REGION_QUARANTINE_RANDOM_LENGTH`: `256` (default) to control the
  number of slots in the random array used to randomize region reuse for large
  memory allocations.
* `CONFIG_REGION_QUARANTINE_QUEUE_LENGTH`: `1024` (default) to control the
  number of slots in the queue used to delay region reuse for large memory
  allocations.
* `CONFIG_REGION_QUARANTINE_SKIP_THRESHOLD`: `33554432` (default) to control
  the size threshold where large allocations will not be quarantined.
* `CONFIG_FREE_SLABS_QUARANTINE_RANDOM_LENGTH`: `32` (default) to control the
  number of slots in the random array used to randomize free slab reuse.
* `CONFIG_CLASS_REGION_SIZE`: `34359738368` (default) to control the size of
  the size class regions.
* `CONFIG_N_ARENA`: `4` (default) to control the number of arenas.
* `CONFIG_STATS`: `false` (default) to control whether stats on allocation /
  deallocation count and active allocations are tracked. See the [section on
  stats](#stats) for more details.
* `CONFIG_EXTENDED_SIZE_CLASSES`: `true` (default) to control whether small
  size classes go up to 128kiB instead of the 16kiB minimum required to avoid
  memory waste. The option to extend it even further will be offered in the
  future when better support for larger slab allocations is added. See the
  [section on size classes](#size-classes) below for details.
* `CONFIG_LARGE_SIZE_CLASSES`: `true` (default) to control whether large
  allocations use the slab allocation size class scheme instead of page size
  granularity. See the [section on size classes](#size-classes) below for
  details.

There will be more control over enabled features in the future, along with
control over fairly arbitrarily chosen values like the size of empty slab
caches (making them smaller improves security and reduces memory usage, while
larger caches can substantially improve performance).

## Core design

The core design of the allocator is very simple / minimalist. The allocator is
exclusive to 64-bit platforms in order to take full advantage of the abundant
address space without being constrained by needing to keep the design
compatible with 32-bit.

The mutable allocator state is entirely located within a dedicated metadata
region, and the allocator is designed around this approach for both small
(slab) allocations and large allocations. This provides reliable, deterministic
protections against invalid frees, including double frees, and protects
metadata from attackers. Traditional allocator exploitation techniques do not
work with the hardened\_malloc implementation.

Small allocations are always located in a large memory region reserved for slab
allocations. On free, it can be determined that an allocation is one of the
small size classes from the address range. If arenas are enabled, the arena is
also determined from the address range, as each arena has a dedicated
sub-region in the slab allocation region. Arenas provide totally independent
slab allocators with their own allocator state and no coordination between
them. Once the base region is determined (simply the slab allocation region as
a whole without any arenas enabled), the size class is determined from the
address range too, since it's divided up into a sub-region for each size class.
There's a top-level slab allocation region, divided up into arenas, with each
of those divided up into size class regions. The size class regions each have a
random base within a large guard region.
Once the size class is determined, the
slab size is known, and the index of the slab is calculated and used to obtain
the slab metadata for the slab from the slab metadata array. Finally, the index
of the slot within the slab provides the index of the bit tracking the slot in
the bitmap. Every slab allocation slot has a dedicated bit in a bitmap tracking
whether it's free, along with a separate bitmap for tracking allocations in the
quarantine. The slab metadata entries in the array have intrusive lists
threaded through them to track partial slabs (partially filled, and these are
the first choice for allocation), empty slabs (limited amount of cached free
memory) and free slabs (purged / memory protected).

Large allocations are tracked via a global hash table mapping their address to
their size and random guard size. They're simply memory mappings and get mapped
on allocation and then unmapped on free. Large allocations are the only dynamic
memory mappings made by the allocator, since the address space for allocator
state (including both small / large allocation metadata) and slab allocations
is statically reserved.

This allocator is aimed at production usage, not aiding with finding and fixing
memory corruption bugs for software development. It does find many latent bugs
but won't include features like the option of generating and storing stack
traces for each allocation to include the allocation site in related error
messages. The design choices are based around minimizing overhead and
maximizing security, which often leads to different decisions than a tool
attempting to find bugs. For example, it uses zero-based sanitization on free
and doesn't minimize slack space from size class rounding between the end of an
allocation and the canary / guard region. Zero-based filling has the least
chance of uncovering latent bugs, but also the best chance of mitigating
vulnerabilities. The canary feature is primarily meant to act as padding
absorbing small overflows to render them harmless, so slack space is helpful
rather than harmful despite not detecting the corruption on free. The canary
needs detection on free in order to have any hope of stopping other kinds of
issues like a sequential overflow, which is why it's included. It's assumed
that an attacker can figure out that the allocator is in use, so the focus is
explicitly not on detecting bugs that are impossible to exploit with it in use,
like an 8 byte overflow. The design choices would be different if performance
were a bit less important and if a core goal were finding latent bugs.

## Security properties

* Fully out-of-line metadata/state with protection from corruption
    * Address space for allocator state is entirely reserved during
      initialization and never reused for allocations or anything else
    * State within global variables is entirely read-only after initialization
      with pointers to the isolated allocator state so leaking the address of
      the library doesn't leak the address of writable state
    * Allocator state is located within a dedicated region with high entropy
      randomly sized guard regions around it
    * Protection via Memory Protection Keys (MPK) on x86\_64 (disabled by
      default due to low benefit-cost ratio on top of baseline protections)
    * [future] Protection via MTE on ARMv8.5+
* Deterministic detection of any invalid free (unallocated, unaligned, etc.)
    * Validation of the size passed for C++14 sized deallocation by `delete`
      even for code compiled with earlier standards (detects type confusion if
      the size is different) and by various containers using the allocator API
      directly
* Isolated memory region for slab allocations
    * Top-level isolated regions for each arena
    * Divided up into isolated inner regions for each size class
        * High entropy random base for each size class region
        * No deterministic / low entropy offsets between allocations with
          different size classes
    * Metadata is completely outside the slab allocation region
        * No references to metadata within the slab allocation region
        * No deterministic / low entropy offsets to metadata
    * Entire slab region starts out non-readable and non-writable
    * Slabs beyond the cache limit are purged and become non-readable and
      non-writable memory again
        * Placed into a queue for reuse in FIFO order to maximize the time
          spent memory protected
        * Randomized array is used to add a random delay for reuse
* Fine-grained randomization within memory regions
    * Randomly sized guard regions for large allocations
    * Random slot selection within slabs
    * Randomized delayed free for small and large allocations along with slabs
      themselves
    * [in-progress] Randomized choice of slabs
    * [in-progress] Randomized allocation of slabs
* Slab allocations are zeroed on free
* Detection of write-after-free for slab allocations by verifying zero filling
  is intact at allocation time
* Delayed free via a combination of FIFO and randomization for slab allocations
* Large allocations are purged and memory protected on free with the memory
  mapping kept reserved in a quarantine to detect use-after-free
    * The quarantine is primarily based on a FIFO ring buffer, with the oldest
      mapping in the quarantine being unmapped to make room for the most
      recently freed mapping
    * Another layer of the quarantine swaps with a random slot in an array to
      randomize the number of large deallocations required to push mappings out
      of the quarantine
* Memory in fresh allocations is consistently zeroed due to it either being
  fresh pages or zeroed on free after previous usage
* Random canaries placed after each slab allocation to *absorb*
  and then later detect overflows/underflows
    * High entropy per-slab random values
    * Leading byte is zeroed to contain C string overflows
* Possible slab locations are skipped and remain memory protected, leaving slab
  size class regions interspersed with guard pages
* Zero size allocations are a dedicated size class with the entire region
  remaining non-readable and non-writable
* Extension for retrieving the size of allocations with fallback to a sentinel
  for pointers not managed by the allocator [in-progress, full implementation
  needs to be ported from the previous OpenBSD malloc-based allocator]
    * Can also return accurate values for pointers *within* small allocations
    * The same applies to pointers within the first page of large allocations,
      otherwise it currently has to return a sentinel
* No alignment tricks interfering with ASLR like jemalloc, PartitionAlloc, etc.
* No usage of the legacy brk heap
* Aggressive sanity checks
    * Errors other than ENOMEM from mmap, munmap, mprotect and mremap treated
      as fatal, which can help to detect memory management gone wrong elsewhere
      in the process
* Memory tagging for slab allocations via MTE on ARMv8.5+
    * random memory tags as the baseline, providing probabilistic protection
      against various forms of memory corruption
    * dedicated tag for free slots, set on free, for deterministic protection
      against accessing freed memory
    * guarantee distinct tags for adjacent memory allocations by incrementing
      past matching values for deterministic detection of linear overflows
    * [future] store previous random tag and increment it to get the next tag
      for that slot to provide deterministic use-after-free detection through
      multiple cycles of memory reuse

## Randomness

The current implementation of random number generation for randomization-based
mitigations is based on generating a keystream from a stream cipher (ChaCha8)
in small chunks.
Separate CSPRNGs are used for each small size class in each arena, for large
allocations and for initialization, in order to fit into the fine-grained
locking model without needing to waste memory per thread by having the CSPRNG
state in Thread Local Storage. Similarly, the CSPRNG state is protected via
the same approach taken for the rest of the metadata. The stream cipher is
regularly reseeded from the OS to provide backtracking and prediction
resistance with a negligible cost. The reseed interval simply needs to be
adjusted to the point that it stops registering as having any significant
performance impact. The performance impact on recent Linux kernels is
primarily from the high cost of system calls and locking, since the kernel's
CSPRNG implementation (ChaCha20) is quite efficient, especially when it's only
generating the key and nonce for another stream cipher (ChaCha8).

ChaCha8 is a great fit because it's extremely fast across platforms without
relying on hardware support or complex platform-specific code. The security
margins of ChaCha20 would be completely overkill for the use case. Using
ChaCha8 avoids needing to resort to a non-cryptographically-secure PRNG or
something without a lot of scrutiny. The current implementation is simply the
reference implementation of ChaCha8 converted into a pure keystream by
removing the XOR of the message into the keystream.

The random range generation functions are a highly optimized implementation
too.
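For illustration, one widely used low-overhead technique for reducing a full-width random word to a bounded value is a Lemire-style multiply-shift reduction. This is a hypothetical sketch of the general technique, not this project's exact code:

```c
#include <stdint.h>

/* Hypothetical sketch: map a uniform 32-bit keystream word into [0, bound)
 * with a single widening multiply, avoiding the division/modulo of the
 * traditional approach. The fast path shown here has a tiny bias that a
 * rejection step can remove when needed. */
static uint32_t bounded_random(uint32_t keystream_word, uint32_t bound) {
    /* The high 32 bits of the 64-bit product fall in [0, bound). */
    return (uint32_t)(((uint64_t)keystream_word * (uint64_t)bound) >> 32);
}
```

The cost per draw is one multiplication rather than a division, which matters when the CSPRNG itself is as cheap as ChaCha8 keystream output.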
Traditional uniform random number generation within a range is very high
overhead and can easily dwarf the cost of an efficient CSPRNG.

## Size classes

The zero byte size class is a special case of the smallest regular size class.
It's allocated in a dedicated region like other size classes, but with the
slabs never being made readable or writable, so the only memory usage is for
the slab metadata.

The choice of size classes for slab allocation is the same as jemalloc's,
which is a careful balance between minimizing internal and external
fragmentation. If there are more size classes, more memory is wasted on free
slots available only to allocation requests of those sizes (external
fragmentation). If there are fewer size classes, the spacing between them is
larger and more memory is wasted due to rounding up to the size classes
(internal fragmentation). There are 4 special size classes for the smallest
sizes (16, 32, 48, 64) that are simply spaced out by the minimum spacing (16).
Afterwards, there are 4 size classes for every power-of-two spacing, which
bounds the internal fragmentation below 20% for each size class. This also
means there are 4 size classes for each doubling in size.

The slot counts tied to the size classes are specific to this allocator rather
than being taken from jemalloc. Slabs are always a span of pages, so the slot
count needs to be tuned to minimize waste due to rounding to the page size.
For now, this allocator is set up only for 4096 byte pages, as a small page
size is desirable for finer-grained memory protection and randomization. It
could be ported to larger page sizes in the future.
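The worst-case figures in the table below follow from a simple calculation: a request one byte larger than the previous size class is rounded all the way up to the class size, wasting `size - prev - 1` bytes. A small sketch (the class pairs are taken from the documented class list):

```c
/* Worst-case internal fragmentation for a size class: a request of
 * prev + 1 bytes is rounded up to `size`, wasting size - prev - 1 bytes.
 * With 4 classes per power-of-two doubling, the spacing stays between
 * size/5 and size/4, keeping this below 20% past the smallest classes. */
static double worst_case_fragmentation(unsigned size, unsigned prev) {
    return 100.0 * (double)(size - prev - 1) / (double)size;
}
```

For example, `worst_case_fragmentation(320, 256)` gives the 19.69% shown for the 320 byte class, and `worst_case_fragmentation(16, 0)` gives the 93.75% outlier for the smallest class.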
The current slot counts are only a\npreliminary set of values.\n\n| size class | worst case internal fragmentation | slab slots | slab size | internal fragmentation for slabs |\n| - | - | - | - | - |\n| 16 | 93.75% | 256 | 4096 | 0.0% |\n| 32 | 46.88% | 128 | 4096 | 0.0% |\n| 48 | 31.25% | 85 | 4096 | 0.390625% |\n| 64 | 23.44% | 64 | 4096 | 0.0% |\n| 80 | 18.75% | 51 | 4096 | 0.390625% |\n| 96 | 15.62% | 42 | 4096 | 1.5625% |\n| 112 | 13.39% | 36 | 4096 | 1.5625% |\n| 128 | 11.72% | 64 | 8192 | 0.0% |\n| 160 | 19.38% | 51 | 8192 | 0.390625% |\n| 192 | 16.15% | 64 | 12288 | 0.0% |\n| 224 | 13.84% | 54 | 12288 | 1.5625% |\n| 256 | 12.11% | 64 | 16384 | 0.0% |\n| 320 | 19.69% | 64 | 20480 | 0.0% |\n| 384 | 16.41% | 64 | 24576 | 0.0% |\n| 448 | 14.06% | 64 | 28672 | 0.0% |\n| 512 | 12.3% | 64 | 32768 | 0.0% |\n| 640 | 19.84% | 64 | 40960 | 0.0% |\n| 768 | 16.54% | 64 | 49152 | 0.0% |\n| 896 | 14.17% | 64 | 57344 | 0.0% |\n| 1024 | 12.4% | 64 | 65536 | 0.0% |\n| 1280 | 19.92% | 16 | 20480 | 0.0% |\n| 1536 | 16.6% | 16 | 24576 | 0.0% |\n| 1792 | 14.23% | 16 | 28672 | 0.0% |\n| 2048 | 12.45% | 16 | 32768 | 0.0% |\n| 2560 | 19.96% | 8 | 20480 | 0.0% |\n| 3072 | 16.63% | 8 | 24576 | 0.0% |\n| 3584 | 14.26% | 8 | 28672 | 0.0% |\n| 4096 | 12.48% | 8 | 32768 | 0.0% |\n| 5120 | 19.98% | 8 | 40960 | 0.0% |\n| 6144 | 16.65% | 8 | 49152 | 0.0% |\n| 7168 | 14.27% | 8 | 57344 | 0.0% |\n| 8192 | 12.49% | 8 | 65536 | 0.0% |\n| 10240 | 19.99% | 6 | 61440 | 0.0% |\n| 12288 | 16.66% | 5 | 61440 | 0.0% |\n| 14336 | 14.28% | 4 | 57344 | 0.0% |\n| 16384 | 12.49% | 4 | 65536 | 0.0% |\n\nThe slab allocation size classes end at 16384 since that's the final size for\n2048 byte spacing and the next spacing class matches the page size of 4096\nbytes on the target platforms. This is the minimum set of small size classes\nrequired to avoid substantial waste from rounding.\n\nThe `CONFIG_EXTENDED_SIZE_CLASSES` option extends the size classes up to\n131072, with a final spacing class of 16384. 
This offers improved performance
compared to the minimum set of size classes. The security story is more
complicated: slab allocation has advantages such as size class isolation,
which completely avoids reuse of any of the address space for other size
classes or other data, but it also has disadvantages such as caching a small
number of empty slabs and deterministic guard sizes. The cache will be
configurable in the future, making it possible to disable slab caching for the
largest slab allocation sizes in order to force unmapping them immediately and
putting them in the slab quarantine. This eliminates most of the security
disadvantage at the expense of giving up most of the performance advantage,
while retaining the isolation.

| size class | worst case internal fragmentation | slab slots | slab size | internal fragmentation for slabs |
| - | - | - | - | - |
| 20480 | 20.0% | 1 | 20480 | 0.0% |
| 24576 | 16.66% | 1 | 24576 | 0.0% |
| 28672 | 14.28% | 1 | 28672 | 0.0% |
| 32768 | 12.5% | 1 | 32768 | 0.0% |
| 40960 | 20.0% | 1 | 40960 | 0.0% |
| 49152 | 16.66% | 1 | 49152 | 0.0% |
| 57344 | 14.28% | 1 | 57344 | 0.0% |
| 65536 | 12.5% | 1 | 65536 | 0.0% |
| 81920 | 20.0% | 1 | 81920 | 0.0% |
| 98304 | 16.67% | 1 | 98304 | 0.0% |
| 114688 | 14.28% | 1 | 114688 | 0.0% |
| 131072 | 12.5% | 1 | 131072 | 0.0% |

The `CONFIG_LARGE_SIZE_CLASSES` option controls whether large allocations use
the same size class scheme, providing 4 size classes for every doubling of
size. It increases virtual memory consumption but drastically improves
performance where realloc is used without proper growth factors, which is
fairly common and destroys performance in some commonly used programs.
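The effect of the 4-classes-per-doubling grid on large allocations can be sketched with a small rounding helper. This is an assumed illustration mirroring the documented scheme, not the allocator's code:

```c
#include <stdint.h>

/* Sketch (assumed scheme): round `size` up to a grid with 4 size classes
 * per power-of-two doubling, i.e. a spacing of one quarter of the previous
 * power of two. A buffer repeatedly grown by small increments only needs a
 * new backing size when it crosses one of 4 boundaries per doubling,
 * instead of on every page-granularity growth. */
static uint64_t round_to_large_class(uint64_t size) {
    if (size <= 4) {
        return size; /* degenerate tiny sizes; not the interesting path */
    }
    /* spacing = 2^(floor(log2(size - 1)) - 2) */
    uint64_t spacing = (uint64_t)1 << (63 - __builtin_clzll(size - 1) - 2);
    return (size + spacing - 1) & ~(spacing - 1);
}
```

Under this scheme a buffer grown 4 KiB at a time from 1 MiB to 2 MiB passes through only 4 size classes (1.25 MiB, 1.5 MiB, 1.75 MiB, 2 MiB) rather than moving on every growth step.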
If large size classes are
disabled, the granularity is instead the page size, which is currently always
4096 bytes on supported platforms.

## Scalability

### Small (slab) allocations

As a baseline form of fine-grained locking, the slab allocator has entirely
separate allocators for each size class. Each size class has a dedicated lock,
CSPRNG and other state.

The slab allocator's scalability primarily comes from dividing up the slab
allocation region into independent arenas assigned to threads. The arenas are
just entirely separate slab allocators with their own sub-regions for each
size class. Using 4 arenas reserves a region 4 times as large, and the
relevant slab allocator metadata is determined based on address, as part of
the same approach used to find the per-size-class metadata. The part that's
still open to different design choices is how arenas are assigned to threads.
One approach is static assignment: either round-robin, like the standard
jemalloc implementation, or assignment to a random arena, which is essentially
the current implementation. Another option is dynamic load balancing via a
heuristic like `sched_getcpu` for per-CPU arenas, which would offer better
performance than randomly choosing an arena each time while being more
predictable for an attacker. There are actually some security benefits from
this assignment being completely static, since it isolates threads from each
other. Static assignment can also reduce memory usage, since threads may have
varying usage of size classes.

When there's substantial allocation or deallocation pressure, the allocator
does end up calling into the kernel to purge / protect unused slabs by
replacing them with fresh `PROT_NONE` regions, along with unprotecting slabs
when partially filled and cached empty slabs are depleted. There will be
configuration over the number of cached empty slabs, but it's not entirely a
performance vs.
memory trade-off, since memory protecting unused slabs is a nice
opportunistic boost to security. However, it's not really part of the core
security model or features, so it's quite reasonable to use much larger empty
slab caches when the memory usage is acceptable. It would also be reasonable
to attempt to use heuristics for dynamically tuning the size, but there's no
great one-size-fits-all approach, so it isn't currently part of this allocator
implementation.

#### Thread caching (or lack thereof)

Thread caches are a commonly implemented optimization in modern allocators,
but they aren't very suitable for a hardened allocator even when implemented
via arrays, as in jemalloc, rather than free lists. They would prevent the
allocator from having perfect knowledge about which memory is free in a way
that's both race-free and works with fully out-of-line metadata. They would
also interfere with the quality of fine-grained randomization, even with
randomization support in the thread caches. The caches would also end up with
much weaker protection than the dedicated metadata region. Potentially worst
of all, they're inherently incompatible with the important quarantine feature.

The primary benefit of a thread cache is performing batches of allocations
and batches of deallocations to amortize the cost of the synchronization used
by locking. The issue is not contention but rather the cost of the
synchronization itself. Performing operations in large batches isn't
necessarily a good thing in terms of reducing contention to improve
scalability. Large thread caches like TCMalloc's are a legacy design choice
and aren't a good approach for a modern allocator.
In jemalloc, thread caches are fairly small and have a form\nof garbage collection to clear them out when they aren't being heavily used.\nSince this is a hardened allocator with a bunch of small costs for the security\nfeatures, the synchronization is already a smaller percentage of the overall\ntime compared to a much leaner performance-oriented allocator. These benefits\ncould be obtained via allocation queues and deallocation queues which would\navoid bypassing the quarantine and wouldn't have as much of an impact on\nrandomization. However, deallocation queues would also interfere with having\nglobal knowledge about what is free. An allocation queue alone wouldn't have\nmany drawbacks, but it isn't currently planned even as an optional feature\nsince it probably wouldn't be enabled by default and isn't worth the added\ncomplexity.\n\nThe secondary benefit of thread caches is being able to avoid the underlying\nallocator implementation entirely for some allocations and deallocations when\nthey're mixed together rather than many allocations being done together or many\nfrees being done together. The value of this depends a lot on the application\nand it's entirely unsuitable / incompatible with a hardened allocator since it\nbypasses all of the underlying security and would destroy much of the security\nvalue.\n\n### Large allocations\n\nThe expectation is that the allocator does not need to perform well for large\nallocations, especially in terms of scalability. When the performance for large\nallocations isn't good enough, the approach will be to enable more slab\nallocation size classes. Doubling the maximum size of slab allocations only\nrequires adding 4 size classes while keeping internal waste bounded below 20%.\n\nLarge allocations are implemented as a wrapper on top of the kernel memory\nmapping API. The addresses and sizes are tracked in a global data structure\nwith a global lock. 
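As a minimal sketch of such a structure, here is a fixed-capacity open-addressing table keyed by address under a single global lock. This is an illustration of the general shape only; the real table grows dynamically and differs in detail:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative address -> size table for large allocations. Assumes the
 * table never fills (the real implementation resizes). */
#define TABLE_SIZE 1024

struct region {
    void *ptr;
    size_t size;
};

static struct region table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static size_t slot_for(void *ptr) {
    /* multiplicative hash of the address; the constant is arbitrary */
    return (size_t)(((uintptr_t)ptr * UINT64_C(0x9e3779b97f4a7c15) >> 32)
                    % TABLE_SIZE);
}

static void region_insert(void *ptr, size_t size) {
    pthread_mutex_lock(&table_lock);
    size_t i = slot_for(ptr);
    while (table[i].ptr != NULL) {
        i = (i + 1) % TABLE_SIZE; /* linear probing on collision */
    }
    table[i].ptr = ptr;
    table[i].size = size;
    pthread_mutex_unlock(&table_lock);
}

static size_t region_size(void *ptr) {
    pthread_mutex_lock(&table_lock);
    size_t i = slot_for(ptr);
    size_t size = 0; /* 0 means "not a tracked large allocation" */
    while (table[i].ptr != NULL) {
        if (table[i].ptr == ptr) {
            size = table[i].size;
            break;
        }
        i = (i + 1) % TABLE_SIZE;
    }
    pthread_mutex_unlock(&table_lock);
    return size;
}
```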
The current implementation is a hash table and could easily\nuse fine-grained locking, but it would have little benefit since most of the\nlocking is in the kernel. Most of the contention will be on the `mmap_sem` lock\nfor the process in the kernel. Ideally, it could simply map memory when\nallocating and unmap memory when freeing. However, this is a hardened allocator\nand the security features require extra system calls due to lack of direct\nsupport for this kind of hardening in the kernel. Randomly sized guard regions\nare placed around each allocation which requires mapping a `PROT_NONE` region\nincluding the guard regions and then unprotecting the usable area between them.\nThe quarantine implementation requires clobbering the mapping with a fresh\n`PROT_NONE` mapping using `MAP_FIXED` on free to hold onto the region while\nit's in the quarantine, until it's eventually unmapped when it's pushed out of\nthe quarantine. This means there are 2x as many system calls for allocating and\nfreeing as there would be if the kernel supported these features directly.\n\n## Memory tagging\n\nRandom tags are set for all slab allocations when allocated, with 4 excluded values:\n\n1. the reserved `0` tag\n2. the previous tag used for the slot\n3. the current (or previous) tag used for the slot to the left\n4. 
the current (or previous) tag used for the slot to the right

When a slab allocation is freed, the reserved `0` tag is set for the slot.
Slab allocation slots are cleared before reuse when memory tagging is enabled.

This ensures the following properties:

- Linear overflows are deterministically detected.
- Use-after-free is deterministically detected until the freed slot goes
  through both the random and FIFO quarantines, gets allocated again, goes
  through both quarantines again and then finally gets allocated again for a
  2nd time.
- Since the default `0` tag is reserved, untagged pointers can't access slab
  allocations and vice versa.

Slab allocations are done in a statically reserved region for each size class
and all metadata is in a statically reserved region, so interactions between
different uses of the same address space are not applicable.

Large allocations beyond the largest slab allocation size class (128k by
default) are guaranteed to have randomly sized guard regions to the left and
right. Random and FIFO address space quarantines provide use-after-free
detection. We need to test whether the cost of random tags is acceptable to
enable them by default, since they would be useful for:

- probabilistic detection of overflows
- probabilistic detection of use-after-free once the address space is
  out of the quarantine and reused for another allocation
- deterministic detection of use-after-free for reuse by another allocator

When memory tagging is enabled, checking for write-after-free at allocation
time and checking canaries are both disabled.
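The tag selection rules at the start of this section can be sketched as follows. This is an illustrative model, not the allocator's code: pick a random 4-bit tag and increment past the excluded values.

```c
#include <stdint.h>

/* Illustrative model of the tag selection rules described above: start from
 * a random 4-bit MTE tag and increment past the excluded values, which are
 * the reserved 0 tag (free slots / untagged pointers), the slot's previous
 * tag, and the current tags of the two adjacent slots. This guarantees
 * distinct tags for adjacent allocations and detection of immediate reuse.
 * The loop always terminates: at most 4 of the 16 tag values are excluded. */
static uint8_t choose_tag(uint8_t random_tag, uint8_t prev,
                          uint8_t left, uint8_t right) {
    uint8_t tag = random_tag & 0xf;
    while (tag == 0 || tag == prev || tag == left || tag == right) {
        tag = (uint8_t)((tag + 1) & 0xf); /* wrap within the 16 tag values */
    }
    return tag;
}
```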
Canaries will be more thoroughly
disabled when using memory tagging in the future, but Android currently has
[very dynamic memory tagging support](https://source.android.com/docs/security/test/memory-safety/arm-mte)
where it can be disabled at any time, which creates a barrier to optimizing
by disabling redundant features.

## API extensions

The `void free_sized(void *ptr, size_t expected_size)` function exposes the
sized deallocation sanity checks for C. A performance-oriented allocator could
use the same API as an optimization to avoid a potential cache miss from
reading the size from metadata.

The `size_t malloc_object_size(void *ptr)` function returns an *upper bound*
on the accessible size of the relevant object (if any) by querying the malloc
implementation. It's similar to the `__builtin_object_size` intrinsic used by
`_FORTIFY_SOURCE`, but it dynamically queries the malloc implementation rather
than determining constant sizes at compile time. The current implementation is
just a naive placeholder returning much looser upper bounds than the intended
implementation. It's already a valid implementation of the API, but it will
become fully accurate once it's finished. This function is **not** currently
safe to call from signal handlers, but another API will be provided to make
that possible, with a compile-time configuration option to avoid the necessary
overhead if the functionality isn't being used (in a way that doesn't break
API compatibility based on the configuration).

The `size_t malloc_object_size_fast(void *ptr)` function is comparable, but it
avoids expensive operations like locking or even atomics. It provides
significantly less useful results, falling back to higher upper bounds, but is
very fast. In this implementation, it retrieves an upper bound on the size for
small memory allocations based on calculating the size class region.
This function is safe
to use from signal handlers already.

## Stats

If stats are enabled, hardened\_malloc keeps track of allocator statistics in
order to provide implementations of `mallinfo` and `malloc_info`.

On Android, `mallinfo` is used for [mallinfo-based garbage collection
triggering](https://developer.android.com/preview/features#mallinfo), so
hardened\_malloc enables `CONFIG_STATS` by default. The `malloc_info`
implementation on Android is the standard one in Bionic, with the information
provided to Bionic via Android's internal extended `mallinfo` API with support
for arenas and size class bins. This means the `malloc_info` output is fully
compatible, including still having `jemalloc-1` as the version of the data
format to retain compatibility with existing tooling.

On non-Android Linux, `mallinfo` has zeroed fields even with `CONFIG_STATS`
enabled because glibc `mallinfo` is inherently broken. It defines the fields
as `int` instead of `size_t`, resulting in undefined signed overflows. It also
misuses the fields and provides a strange, idiosyncratic set of values rather
than following the SVID/XPG `mallinfo` definition. The `malloc_info` function
is still provided, with a format similar to what Android uses, with tweaks for
hardened\_malloc and the version set to `hardened_malloc-1`.
The data format
may be changed in the future.

As an example, consider the following program from the hardened\_malloc tests:

```c
#include <pthread.h>

#include <malloc.h>

__attribute__((optimize(0)))
void leak_memory(void) {
    (void)malloc(1024 * 1024 * 1024);
    (void)malloc(16);
    (void)malloc(32);
    (void)malloc(4096);
}

void *do_work(void *p) {
    leak_memory();
    return NULL;
}

int main(void) {
    pthread_t thread[4];
    for (int i = 0; i < 4; i++) {
        pthread_create(&thread[i], NULL, do_work, NULL);
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(thread[i], NULL);
    }

    malloc_info(0, stdout);
}
```

This produces the following output when piped through `xmllint --format -`:

```xml
<?xml version="1.0"?>
<malloc version="hardened_malloc-1">
  <heap nr="0">
    <bin nr="2" size="32">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>32</allocated>
    </bin>
    <bin nr="3" size="48">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>48</allocated>
    </bin>
    <bin nr="13" size="320">
      <nmalloc>4</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>20480</slab_allocated>
      <allocated>1280</allocated>
    </bin>
    <bin nr="29" size="5120">
      <nmalloc>2</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>40960</slab_allocated>
      <allocated>10240</allocated>
    </bin>
    <bin nr="45" size="81920">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>81920</slab_allocated>
      <allocated>81920</allocated>
    </bin>
  </heap>
  <heap nr="1">
    <bin nr="2" size="32">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>32</allocated>
    </bin>
    <bin nr="3" size="48">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>48</allocated>
    </bin>
    <bin nr="29" size="5120">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>40960</slab_allocated>
      <allocated>5120</allocated>
    </bin>
  </heap>
  <heap nr="2">
    <bin nr="2" size="32">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>32</allocated>
    </bin>
    <bin nr="3" size="48">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>48</allocated>
    </bin>
    <bin nr="29" size="5120">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>40960</slab_allocated>
      <allocated>5120</allocated>
    </bin>
  </heap>
  <heap nr="3">
    <bin nr="2" size="32">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>32</allocated>
    </bin>
    <bin nr="3" size="48">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>4096</slab_allocated>
      <allocated>48</allocated>
    </bin>
    <bin nr="29" size="5120">
      <nmalloc>1</nmalloc>
      <ndalloc>0</ndalloc>
      <slab_allocated>40960</slab_allocated>
      <allocated>5120</allocated>
    </bin>
  </heap>
  <heap nr="4">
    <allocated_large>4294967296</allocated_large>
  </heap>
</malloc>
```

The heap entries correspond to the arenas. Unlike jemalloc, hardened\_malloc
doesn't handle large allocations within the arenas, so it presents those in
the `malloc_info` statistics as a separate arena dedicated to large
allocations. For example, with 4 arenas enabled, there will be a 5th arena in
the statistics for the large allocations.

The `nmalloc` / `ndalloc` fields are 64-bit integers tracking allocation and
deallocation counts. These are defined as wrapping on overflow, per the
jemalloc implementation.

See the [section on size classes](#size-classes) to map a size class bin
number to the corresponding size class. The bin index begins at 0, mapping to
the 0 byte size class, followed by 1 for 16 bytes, 2 for 32 bytes, etc.
Large allocations are treated as a single group.

When stats aren't enabled, the `malloc_info` output will be an empty `malloc`
element.

## System calls

This is intended to aid with creating system call whitelists via seccomp-bpf
and will change over time.

System calls used by all build configurations:

* `futex(uaddr, FUTEX_WAIT_PRIVATE, val, NULL)` (via `pthread_mutex_lock`)
* `futex(uaddr, FUTEX_WAKE_PRIVATE, val)` (via `pthread_mutex_unlock`)
* `getrandom(buf, buflen, 0)` (to seed and regularly reseed the CSPRNG)
* `mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0)`
* `mmap(ptr, size, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0)`
* `mprotect(ptr, size, PROT_READ)`
* `mprotect(ptr, size, PROT_READ|PROT_WRITE)`
* `mremap(old, old_size, new_size, 0)`
* `mremap(old, old_size, new_size, MREMAP_MAYMOVE|MREMAP_FIXED, new)`
* `munmap`
* `write(STDERR_FILENO, buf, len)` (before aborting due to memory corruption)
* `madvise(ptr, size, MADV_DONTNEED)`

The main distinction from a typical malloc implementation is the use of
getrandom.
A common compatibility issue is that existing system call whitelists
often omit getrandom partly due to older code using the legacy `/dev/urandom`
interface along with the overall lack of security features in mainstream libc
implementations.

Additional system calls when `CONFIG_SEAL_METADATA=true` is set:

* `pkey_alloc`
* `pkey_mprotect` instead of `mprotect` with an additional `pkey` parameter,
  but otherwise the same (regular `mprotect` is never called)

Additional system calls for Android builds with `LABEL_MEMORY`:

* `prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr, size, name)`