{"id":13438023,"url":"https://github.com/mkirchner/gc","last_synced_at":"2025-04-12T15:36:12.636Z","repository":{"id":36639889,"uuid":"228448213","full_name":"mkirchner/gc","owner":"mkirchner","description":"Simple, zero-dependency garbage collection for C","archived":false,"fork":false,"pushed_at":"2021-12-15T00:29:13.000Z","size":136,"stargazers_count":1118,"open_issues_count":3,"forks_count":64,"subscribers_count":24,"default_branch":"master","last_synced_at":"2024-02-17T06:34:53.019Z","etag":null,"topics":["c","garbage-collection","memory-management","zero-dependency"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mkirchner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-16T18:19:15.000Z","updated_at":"2024-02-10T14:16:04.000Z","dependencies_parsed_at":"2022-07-08T07:33:28.630Z","dependency_job_id":null,"html_url":"https://github.com/mkirchner/gc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkirchner%2Fgc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkirchner%2Fgc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkirchner%2Fgc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mkirchner%2Fgc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mkirchner","download_url":"https://codeload.github.com/mkirchner/gc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositor
ies_count":248590291,"owners_count":21129778,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","garbage-collection","memory-management","zero-dependency"],"created_at":"2024-07-31T03:01:02.259Z","updated_at":"2025-04-12T15:36:12.609Z","avatar_url":"https://github.com/mkirchner.png","language":"C","readme":"![Build Status](https://github.com/mkirchner/gc/workflows/C/C++%20CI/badge.svg)\n[![Coverage Status](https://coveralls.io/repos/github/mkirchner/gc/badge.svg)](https://coveralls.io/github/mkirchner/gc)\n\n# gc: mark \u0026 sweep garbage collection for C\n\n`gc` is an implementation of a conservative, thread-local, mark-and-sweep\ngarbage collector. The implementation provides a fully functional replacement\nfor the standard POSIX `malloc()`, `calloc()`, `realloc()`, and `free()` calls.\n\nThe focus of `gc` is to provide a conceptually clean implementation of\na mark-and-sweep GC, without delving into the depths of architecture-specific\noptimization (see e.g. the [Boehm GC][boehm] for such an undertaking). 
It\nshould be particularly suitable for learning purposes and is open to all kinds\nof optimization (PRs welcome!).\n\nThe original motivation for `gc` is my desire to write [my own LISP][stutter]\nin C, entirely from scratch - and that required garbage collection.\n\n\n### Acknowledgements\n\nThis work would not have been possible without the ability to read the work of\nothers, most notably the [Boehm GC][boehm], orangeduck's [tgc][tgc] (which also\nfollows the ideals of being tiny and simple), and [The Garbage Collection\nHandbook][garbage_collection_handbook].\n\n\n## Table of contents\n\n* [Table of contents](#table-of-contents)\n* [Documentation Overview](#documentation-overview)\n* [Quickstart](#quickstart)\n  * [Download, compile and test](#download-compile-and-test)\n  * [Basic usage](#basic-usage)\n* [Core API](#core-api)\n  * [Starting, stopping, pausing, resuming and running GC](#starting-stopping-pausing-resuming-and-running-gc)\n  * [Memory allocation and deallocation](#memory-allocation-and-deallocation)\n  * [Helper functions](#helper-functions)\n* [Basic Concepts](#basic-concepts)\n  * [Data Structures](#data-structures)\n  * [Garbage collection](#garbage-collection)\n  * [Reachability](#reachability)\n  * [The Mark-and-Sweep Algorithm](#the-mark-and-sweep-algorithm)\n  * [Finding roots](#finding-roots)\n  * [Depth-first recursive marking](#depth-first-recursive-marking)\n  * [Dumping registers on the stack](#dumping-registers-on-the-stack)\n  * [Sweeping](#sweeping)\n\n## Documentation Overview\n\n* Read the [quickstart](#quickstart) below to see how to get started quickly.\n* The [basic concepts](#basic-concepts) section describes the basic concepts and design\n  decisions that went into the implementation of `gc`.\n* Interleaved with the concepts, there are implementation sections that detail\n  the implementation of the core components; see [hash map\n  implementation](#data-structures), [dumping registers on the\n  stack](#dumping-registers-on-the-stack), [finding 
roots](#finding-roots), and\n  [depth-first, recursive marking](#depth-first-recursive-marking).\n\n\n## Quickstart\n\n### Download, compile and test\n\n    $ git clone git@github.com:mkirchner/gc.git\n    $ cd gc\n    \nTo compile using the `clang` compiler:\n\n    $ make test\n    \nTo use the GNU Compiler Collection (GCC):\n\n    $ make test CC=gcc\n    \nThe tests should complete successfully. To create the current coverage report:\n\n    $ make coverage\n\n\n### Basic usage\n\n```c\n...\n#include \"gc.h\"\n...\n\n\nvoid some_fun() {\n    ...\n    int* my_array = gc_calloc(\u0026gc, 1024, sizeof(int));\n    for (size_t i=0; i\u003c1024; ++i) {\n        my_array[i] = 42;\n    }\n    ...\n    // look ma, no free!\n}\n\nint main(int argc, char* argv[]) {\n    gc_start(\u0026gc, \u0026argc);\n    ...\n    some_fun();\n    ...\n    gc_stop(\u0026gc);\n    return 0;\n}\n```\n\n## Core API\n\nThis section describes the core API; see `gc.h` for more details and the low-level API.\n\n### Starting, stopping, pausing, resuming and running GC\n\nIn order to initialize and start garbage collection, use the `gc_start()`\nfunction and pass a *bottom-of-stack* address:\n\n```c\nvoid gc_start(GarbageCollector* gc, void* bos);\n```\n\nThe bottom-of-stack parameter `bos` needs to point to a stack-allocated\nvariable and marks the low end of the stack from which [root\nfinding](#finding-roots) (scanning) starts. 
\n\nGarbage collection can be stopped, paused and resumed with\n\n```c\nvoid gc_stop(GarbageCollector* gc);\nvoid gc_pause(GarbageCollector* gc);\nvoid gc_resume(GarbageCollector* gc);\n```\n\nand manual garbage collection can be triggered with\n\n```c\nsize_t gc_run(GarbageCollector* gc);\n```\n\n### Memory allocation and deallocation\n\n`gc` supports `malloc()`, `calloc()` and `realloc()`-style memory allocation.\nThe respective function signatures mimic the POSIX functions (with the\nexception that we need to pass the garbage collector along as the first\nargument):\n\n```c\nvoid* gc_malloc(GarbageCollector* gc, size_t size);\nvoid* gc_calloc(GarbageCollector* gc, size_t count, size_t size);\nvoid* gc_realloc(GarbageCollector* gc, void* ptr, size_t size);\n```\n\nIt is possible to pass a pointer to a destructor function through the\nextended interface:\n\n```c\nvoid dtor(void* ptr) {\n   SomeObject* obj = ptr;\n   // do some cleanup work\n   obj-\u003eparent-\u003ederegister();\n   obj-\u003edb-\u003edisconnect();\n   ...\n   // no need to free obj\n}\n...\nSomeObject* obj = gc_malloc_ext(gc, sizeof(SomeObject), dtor);\n...\n```\n\n`gc` supports static allocations that are garbage collected only when the\nGC shuts down via `gc_stop()`. 
Just use the appropriate helper function:\n\n```c\nvoid* gc_malloc_static(GarbageCollector* gc, size_t size, void (*dtor)(void*));\n```\n\nStatic allocation expects a pointer to a finalization function; just set it to\n`NULL` if finalization is not required.\n\nNote that `gc` currently does not guarantee a specific ordering when it\ncollects static variables. If static vars need to be deallocated in a\nparticular order, the user should call `gc_free()` on them in the desired\nsequence prior to calling `gc_stop()`, see below.\n\nIt is also possible to trigger explicit memory deallocation using\n\n```c\nvoid gc_free(GarbageCollector* gc, void* ptr);\n```\n\nCalling `gc_free()` is guaranteed to (a) finalize/destruct the object\npointed to by `ptr`, if applicable, and (b) free the memory that `ptr` points to,\nirrespective of the current scheduling for garbage collection. It also\nworks if GC has been paused using `gc_pause()` above.\n\n\n### Helper functions\n\n`gc` also offers a `strdup()` implementation that returns a garbage-collected\ncopy:\n\n```c\nchar* gc_strdup (GarbageCollector* gc, const char* s);\n```\n\n\n## Basic Concepts\n\nThe fundamental idea behind garbage collection is to automate the memory\nallocation/deallocation cycle. This is accomplished by keeping track of all\nallocated memory and periodically triggering deallocation for memory that is\nstill allocated but [unreachable](#reachability).\n\nMany advanced garbage collectors also implement their own approach to memory\nallocation (i.e. replace `malloc()`). This often enables them to lay out memory\nin a more space-efficient manner or for faster access but comes at the price of\narchitecture-specific implementations and increased complexity. `gc` sidesteps\nthese issues by falling back on the POSIX `*alloc()` implementations and keeping\nmemory management and garbage collection metadata separate. 
This makes `gc`\nmuch simpler to understand but, of course, also less space- and time-efficient\nthan more optimized approaches.\n\n### Data Structures\n\nThe core data structure inside `gc` is a hash map that maps the address of\nallocated memory to the garbage collection metadata of that memory.\n\nThe items in the hash map are allocations, modeled with the `Allocation`\n`struct`:\n\n```c\ntypedef struct Allocation {\n    void* ptr;                // mem pointer\n    size_t size;              // allocated size in bytes\n    char tag;                 // the tag for mark-and-sweep\n    void (*dtor)(void*);      // destructor\n    struct Allocation* next;  // separate chaining\n} Allocation;\n```\n\nEach `Allocation` instance holds a pointer to the allocated memory, the size of\nthe allocated memory at that location, a tag for mark-and-sweep (see below), an\noptional pointer to the destructor function and a pointer to the next\n`Allocation` instance (for separate chaining, see below).\n\nThe allocations are collected in an `AllocationMap`\n\n```c\ntypedef struct AllocationMap {\n    size_t capacity;\n    size_t min_capacity;\n    double downsize_factor;\n    double upsize_factor;\n    double sweep_factor;\n    size_t sweep_limit;\n    size_t size;\n    Allocation** allocs;\n} AllocationMap;\n```\n\nthat, together with a set of `static` functions inside `gc.c`, provides hash\nmap semantics for the implementation of the public API.\n\nThe `AllocationMap` is the central data structure in the `GarbageCollector`\nstruct which is part of the public API:\n\n```c\ntypedef struct GarbageCollector {\n    struct AllocationMap* allocs;\n    bool paused;\n    void *bos;\n    size_t min_size;\n} GarbageCollector;\n```\n\nWith the basic data structures in place, any `gc_*alloc()` memory allocation\nrequest is a two-step procedure: first, allocate the memory through system (i.e.\nstandard `malloc()`) functionality and second, add or update the associated\nmetadata to the hash 
map.\n\nFor `gc_free()`, use the pointer to locate the metadata in the hash map,\ndetermine if the deallocation requires a destructor call, call it if required,\nfree the managed memory and delete the metadata entry from the hash map.\n\nThese data structures and the associated interfaces enable the\nmanagement of the metadata required to build a garbage collector.\n\n\n### Garbage collection\n\n`gc` triggers collection under two circumstances: (a) when any of the calls to\nthe system allocator fails (in the hope of deallocating sufficient memory to\nfulfill the current request); and (b) when the number of entries in the hash\nmap passes a dynamically adjusted high water mark.\n\nIf either of these cases occurs, `gc` stops the world and starts a\nmark-and-sweep garbage collection run over all current allocations. This\nfunctionality is implemented in the `gc_run()` function which is part of the\npublic API and delegates all work to the `gc_mark()` and `gc_sweep()` functions\nthat are part of the private API.\n\n`gc_mark()` has the task of [finding roots](#finding-roots) and tagging all\nknown allocations that are referenced from a root (or from an allocation that\nis referenced from a root, i.e. transitively) as \"used\". Once marking\nis completed, `gc_sweep()` iterates over all known allocations and\ndeallocates all unused (i.e. unmarked) allocations, returns to `gc_run()` and\nthe world continues to run.\n\n\n### Reachability\n\n`gc` will keep memory allocations that are *reachable* and collect everything\nelse. An allocation is considered reachable if any of the following is true:\n\n1. There is a pointer on the stack that points to the allocation content.\n   The pointer must reside in a stack frame that is at least as deep in the call\n   stack as the bottom-of-stack variable passed to `gc_start()` (i.e. `bos` is\n   the smallest stack address considered during the mark phase).\n2. 
There is a pointer inside `gc_*alloc()`-allocated content that points to the\n   allocation content.\n3. The allocation is tagged with `GC_TAG_ROOT`.\n\n\n### The Mark-and-Sweep Algorithm\n\nThe naïve mark-and-sweep algorithm runs in two stages. First, in a *mark*\nstage, the algorithm finds and marks all *root* allocations and all allocations\nthat are reachable from the roots.  Second, in the *sweep* stage, the algorithm\npasses over all known allocations, collecting all allocations that were not\nmarked and are therefore deemed unreachable.\n\n### Finding roots\n\nAt the beginning of the *mark* stage, we first pass over all known\nallocations and find explicit roots with the `GC_TAG_ROOT` tag set.\nEach of these roots is a starting point for [depth-first recursive\nmarking](#depth-first-recursive-marking).\n\n`gc` subsequently detects all roots in the stack (starting from the bottom-of-stack\npointer `bos` that is passed to `gc_start()`) and the registers (by [dumping them\non the stack](#dumping-registers-on-the-stack) prior to the mark phase) and\nuses these as starting points for marking as well.\n\n### Depth-first recursive marking\n\nGiven a root allocation, marking consists of (1) setting the `tag` field in an\n`Allocation` object to `GC_TAG_MARK` and (2) scanning the allocated memory for\npointers to known allocations, recursively repeating the process.\n\nThe underlying implementation is a simple, recursive depth-first search that\nscans over all memory content to find potential references:\n\n```c\nvoid gc_mark_alloc(GarbageCollector* gc, void* ptr)\n{\n    Allocation* alloc = gc_allocation_map_get(gc-\u003eallocs, ptr);\n    if (alloc \u0026\u0026 !(alloc-\u003etag \u0026 GC_TAG_MARK)) {\n        alloc-\u003etag |= GC_TAG_MARK;\n        /* scan every byte offset that can still hold a complete pointer */\n        for (size_t off = 0; off + sizeof(void*) \u003c= alloc-\u003esize; ++off) {\n            gc_mark_alloc(gc, *(void**)((char*) alloc-\u003eptr + off));\n        }\n    }\n}\n```\n\nIn `gc.c`, 
`gc_mark()` starts the marking process by marking the\nexplicitly tagged roots via a call to `gc_mark_roots()`. To mark these roots we\ndo one full pass through all known allocations. We then proceed to dump the\nregisters on the stack.\n\n\n### Dumping registers on the stack\n\nIn order to make the CPU register contents available for root finding, `gc`\ndumps them on the stack. This is implemented in a somewhat portable way using\n`setjmp()`, which stores them in a `jmp_buf` variable right before we mark the\nstack:\n\n```c\n...\n/* Dump registers onto stack and scan the stack */\nvoid (*volatile _mark_stack)(GarbageCollector*) = gc_mark_stack;\njmp_buf ctx;\nmemset(\u0026ctx, 0, sizeof(jmp_buf));\nsetjmp(ctx);\n_mark_stack(gc);\n...\n```\n\nThe detour using the `volatile` function pointer `_mark_stack` to the\n`gc_mark_stack()` function is necessary to prevent the compiler from inlining the call to\n`gc_mark_stack()`.\n\n\n### Sweeping\n\nAfter marking all memory that is reachable and therefore potentially still in\nuse, collecting the unreachable allocations is trivial. 
Here is the\nimplementation from `gc_sweep()`:\n\n```c\nsize_t gc_sweep(GarbageCollector* gc)\n{\n    size_t total = 0;\n    for (size_t i = 0; i \u003c gc-\u003eallocs-\u003ecapacity; ++i) {\n        Allocation* chunk = gc-\u003eallocs-\u003eallocs[i];\n        Allocation* next = NULL;\n        while (chunk) {\n            if (chunk-\u003etag \u0026 GC_TAG_MARK) {\n                /* unmark */\n                chunk-\u003etag \u0026= ~GC_TAG_MARK;\n                chunk = chunk-\u003enext;\n            } else {\n                total += chunk-\u003esize;\n                if (chunk-\u003edtor) {\n                    chunk-\u003edtor(chunk-\u003eptr);\n                }\n                free(chunk-\u003eptr);\n                next = chunk-\u003enext;\n                gc_allocation_map_remove(gc-\u003eallocs, chunk-\u003eptr, false);\n                chunk = next;\n            }\n        }\n    }\n    gc_allocation_map_resize_to_fit(gc-\u003eallocs);\n    return total;\n}\n```\n\nWe iterate over all allocations in the hash map (the `for` loop), following every\nchain (the `while` loop with the `chunk = chunk-\u003enext` update) and either (1)\nunmark the chunk if it was marked; or (2) call the destructor on the chunk and\nfree the memory if it was not marked, keeping a running total of the amount of\nmemory we free.\n\nThat concludes the mark \u0026 sweep run. 
The stopped world is resumed and we're\nready for the next run!\n\n\n\n[naive_mas]: https://en.wikipedia.org/wiki/Tracing_garbage_collection#Naïve_mark-and-sweep\n[boehm]: https://www.hboehm.info/gc/ \n[stutter]: https://github.com/mkirchner/stutter\n[tgc]: https://github.com/orangeduck/tgc\n[garbage_collection_handbook]: https://amzn.to/2VdEvjC\n","funding_links":[],"categories":["C","Members"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkirchner%2Fgc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmkirchner%2Fgc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmkirchner%2Fgc/lists"}