{"id":13730225,"url":"https://github.com/orangeduck/tgc","last_synced_at":"2025-04-12T19:44:26.715Z","repository":{"id":45224272,"uuid":"56610280","full_name":"orangeduck/tgc","owner":"orangeduck","description":"A Tiny Garbage Collector for C","archived":false,"fork":false,"pushed_at":"2023-06-26T14:13:14.000Z","size":36941,"stargazers_count":991,"open_issues_count":5,"forks_count":65,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-04-03T22:09:54.499Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/orangeduck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-04-19T15:45:45.000Z","updated_at":"2025-04-02T23:33:07.000Z","dependencies_parsed_at":"2024-01-03T01:29:57.147Z","dependency_job_id":null,"html_url":"https://github.com/orangeduck/tgc","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orangeduck%2Ftgc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orangeduck%2Ftgc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orangeduck%2Ftgc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/orangeduck%2Ftgc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/orangeduck","download_url":"https://codeload.github.com/orangeduck/tgc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_co
unt":248625479,"owners_count":21135512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T02:01:11.798Z","updated_at":"2025-04-12T19:44:26.688Z","avatar_url":"https://github.com/orangeduck.png","language":"C","readme":"Tiny Garbage Collector\n======================\n\nAbout\n-----\n\n`tgc` is a tiny garbage collector for C written in ~500 lines of code and based \non the [Cello Garbage Collector](http://libcello.org/learn/garbage-collection).\n\n```c\n#include \u003cstring.h\u003e\n#include \"tgc.h\"\n\nstatic tgc_t gc;\n\nstatic void example_function() {\n  char *message = tgc_alloc(\u0026gc, 64);\n  strcpy(message, \"No More Memory Leaks!\");\n}\n\nint main(int argc, char **argv) {\n  tgc_start(\u0026gc, \u0026argc);\n  \n  example_function();\n\n  tgc_stop(\u0026gc);\n}\n```\n\nUsage\n-----\n\n`tgc` is a conservative, thread-local, mark and sweep garbage collector,\nwhich supports destructors, and automatically frees memory allocated by \n`tgc_alloc` and friends after it becomes _unreachable_.\n\nA memory allocation is considered _reachable_ by `tgc` if...\n\n* a pointer points to it, located on the stack at least one function call \ndeeper than the call to `tgc_start`, or,\n* a pointer points to it, inside memory allocated by `tgc_alloc` \nand friends.\n\nOtherwise, a memory allocation is considered _unreachable_.\n\nTherefore, an allocation does _not_ qualify as _reachable_ merely because...\n\n* a pointer points to an address inside of it, but not at the start of it, or,\n* a pointer points to it from inside the `static` data segment, or, \n* a pointer points to it 
from memory allocated by `malloc`, \n`calloc`, `realloc` or any other non-`tgc` allocation method, or, \n* a pointer points to it from a different thread, or, \n* a pointer points to it from any other unreachable location.\n\nGiven these conditions, `tgc` will free memory allocations some time after \nthey become _unreachable_. To do this, it performs an iteration of _mark and \nsweep_ when `tgc_alloc` is called and the number of memory allocations exceeds \nsome threshold. It can also be run manually with `tgc_run`.\n\nMemory allocated by `tgc_alloc` can be manually freed with `tgc_free`, and \ndestructors (functions to be run just before memory is freed) can be \nregistered with `tgc_set_dtor`.\n\n\nReference\n---------\n\n```c\nvoid tgc_start(tgc_t *gc, void *stk);\n```\n\nStart the garbage collector on the current thread, beginning at the stack \nlocation given by the `stk` variable. Usually this is the address of a local \nvariable (such as `argc` in `main`), in which case the garbage collector will cover all \nmemory at least one function call deeper.\n\n* * *\n\n```c\nvoid tgc_stop(tgc_t *gc);\n```\n\nStop the garbage collector and free its internal memory.\n\n* * *\n\n```c\nvoid tgc_run(tgc_t *gc);\n```\n\nRun an iteration of the garbage collector, freeing any unreachable memory.\n\n* * *\n\n```c\nvoid tgc_pause(tgc_t *gc);\nvoid tgc_resume(tgc_t *gc);\n```\n\nPause or resume the garbage collector. 
While paused, the garbage collector will\nnot run during any allocations made.\n\n* * * \n\n```c\nvoid *tgc_alloc(tgc_t *gc, size_t size);\n```\n\nAllocate memory via the garbage collector to be automatically freed once it\nbecomes unreachable.\n\n* * *\n\n```c\nvoid *tgc_calloc(tgc_t *gc, size_t num, size_t size);\n```\n\nAllocate memory via the garbage collector and initialise it to zero.\n\n* * *\n\n```c\nvoid *tgc_realloc(tgc_t *gc, void *ptr, size_t size);\n```\n\nReallocate memory allocated by the garbage collector.\n\n* * *\n\n```c\nvoid tgc_free(tgc_t *gc, void *ptr);\n```\n\nManually free an allocation made by the garbage collector. Runs any destructor\nif registered.\n\n* * *\n\n```c\nvoid *tgc_alloc_opt(tgc_t *gc, size_t size, int flags, void(*dtor)(void*));\n```\n\nAllocate memory via the garbage collector with the given flags and destructor.\n\nFor the `flags` argument, the flag `TGC_ROOT` may be specified to indicate that \nthe allocation is a garbage collection _root_, and so will not be freed \nautomatically; instead it must be freed manually by the user with \n`tgc_free`. Because roots are not automatically freed, they can exist in \nnormally unreachable locations such as in the `static` data segment or in \nmemory allocated by `malloc`. \n\nThe flag `TGC_LEAF` may be specified to indicate that the allocation is a \ngarbage collection _leaf_ and so contains no pointers to other allocations\ninside. This can benefit performance in many cases. For example, when \nallocating a large string there is no point in the garbage collector scanning \nthis allocation: it can take a long time and the string doesn't contain any pointers.\n\nOtherwise, the `flags` argument can be set to zero.\n\nThe `dtor` argument lets the user specify a _destructor_ function to be run \njust before the memory is freed. Destructors have many uses; for example, they \nare often used to automatically release system resources (such as file handles)\nwhen a data structure is finished with them. 
If no destructor is needed, `NULL` \ncan be passed.\n\n* * *\n\n```c\nvoid *tgc_calloc_opt(tgc_t *gc, size_t num, size_t size, int flags, void(*dtor)(void*));\n```\n\nAllocate memory via the garbage collector with the given flags and destructor \nand initialise it to zero.\n\n* * *\n\n```c\nvoid tgc_set_dtor(tgc_t *gc, void *ptr, void(*dtor)(void*));\n```\n\nRegister a destructor function to be called after the memory allocation `ptr`\nbecomes unreachable, and just before it is freed by the garbage collector.\n\n* * *\n\n```c\nvoid tgc_set_flags(tgc_t *gc, void *ptr, int flags);\n```\n\nSet the flags associated with a memory allocation. For example, the value \n`TGC_ROOT` can be used to specify that an allocation is a garbage collection \nroot.\n\n* * *\n\n```c\nint tgc_get_flags(tgc_t *gc, void *ptr);\n```\n\nGet the flags associated with a memory allocation.\n\n* * *\n\n```c\nvoid(*tgc_get_dtor(tgc_t *gc, void *ptr))(void*);\n```\n\nGet the destructor associated with a memory allocation.\n\n* * *\n\n```c\nsize_t tgc_get_size(tgc_t *gc, void *ptr);\n```\n\nGet the size of a memory allocation.\n\nF.A.Q.\n------\n\n### Is this real/safe/portable?\n\nDefinitely! 
While there is no way to create a _completely_ safe/portable \ngarbage collector in C, this collector doesn't use any platform-specific tricks \nand only makes the most basic assumptions about the platform, such as that the \narchitecture uses a contiguous call stack to implement function frames.\n\nIt _should_ be safe to use on more or less all reasonable architectures found \nin the wild and has been tested on Linux, Windows, and OSX, where it was easily \nintegrated into several large real-world programs (see `examples`) such as \n`bzip2` and `oggenc` without issue.\n\nThat said, the normal warnings apply: this library relies on \n_undefined behaviour_ as specified by the C standard, so you use it at your \nown risk. There is no guarantee that something like a compiler or OS update \nwon't mysteriously break it.\n\n\n### What happens when some data just happens to look like a pointer?\n\nIn this unlikely case, `tgc` will treat the data as a pointer and assume that \nthe memory allocation it points to is still reachable. If this is causing your\napplication trouble by not allowing a large memory allocation to be freed, \nconsider freeing it manually with `tgc_free`.\n\n\n### `tgc` isn't working when I increment pointers!\n\nDue to the way `tgc` works, it always needs a pointer to the start of each \nmemory allocation to be reachable. This can break algorithms such as the \nfollowing, which work by incrementing a pointer.\n\n```c\nvoid bad_function(char *y) {\n  char *x = tgc_alloc(\u0026gc, strlen(y) + 1);\n  strcpy(x, y);\n  while (*x) {\n    do_some_processing(x);\n    x++;\n  }\n}\n```\n\nHere, when `x` is incremented, it no longer points to the start of the memory \nallocation made by `tgc_alloc`. 
Then, during `do_some_processing`, if a sweep \nis performed, `x` will be declared unreachable and the memory freed.\n\nIf the pointer `x` is also stored elsewhere, such as inside a heap structure, \nthere is no issue with incrementing a copy of it, so most of the time you \ndon't need to worry. Occasionally, however, you may need to adjust algorithms which\ndo significant pointer arithmetic. For example, in this case the pointer can be \nleft as-is and an integer used to index it instead:\n\n```c\nvoid good_function(char *y) {\n  size_t i, n;\n  char *x = tgc_alloc(\u0026gc, strlen(y) + 1);\n  strcpy(x, y);\n  n = strlen(x);\n  for (i = 0; i \u003c n; i++) {\n    do_some_processing(\u0026x[i]);\n  }\n}\n```\n\nFor now, this is the behaviour of `tgc` until I think of a way to \ndeal with offset pointers nicely.\n\n\n### `tgc` isn't working when optimisations are enabled!\n\nVariables are only considered reachable if they are at least one function call deeper \nthan the call to `tgc_start`. If optimisations are enabled, the compiler will \nsometimes inline functions, which removes this one level of indirection.\n\nThe most portable way to get compilers not to inline functions is to call them \nthrough `volatile` function pointers.\n\n```c\nstatic tgc_t gc;\n\nvoid please_dont_inline(void) {\n  /* ... code that allocates with tgc_alloc ... */\n}\n\nint main(int argc, char **argv) {\n  \n  tgc_start(\u0026gc, \u0026argc);\n\n  void (*volatile func)(void) = please_dont_inline;\n  func();\n  \n  tgc_stop(\u0026gc);\n\n  return 0;\n}\n```\n\n### `tgc` isn't working with `setjmp` and `longjmp`!\n\nUnfortunately, `tgc` doesn't work properly with `setjmp` and `longjmp` since \nthese functions can cause complex stack behaviour. One simple option is to \npause the garbage collector with `tgc_pause` while using these functions and to resume\nit afterwards with `tgc_resume`.\n\n### Why do I get _uninitialised values_ warnings with Valgrind?\n\nThe garbage collector scans the stack memory, which naturally contains \nuninitialised values. 
It scans memory safely, but if you are running through \nValgrind, these accesses will be reported as warnings/errors. Other than this, \n`tgc` shouldn't have any memory errors in Valgrind, so the easiest way to \nsilence these warnings while examining any real problems is to run Valgrind with the option \n`--undef-value-errors=no`.\n\n### Is `tgc` fast?\n\nAt the moment, `tgc` has decent performance - it is competitive with many \nexisting memory management systems - but it definitely can't claim to be the \nfastest garbage collector on the market. That said, there is a fair amount of \nlow-hanging fruit for anyone interested in optimising it, so there is room for it to\nbecome faster.\n\n\nHow it Works\n------------\n\nFor a basic _mark and sweep_ garbage collector, two things are required. The \nfirst is a list of all of the allocations made by the program. The second \nis a list of all the allocations _in use_ by the program at any given time. \nWith these two things, the algorithm is simple: compare the two lists and free \nany allocations which are in the first list but not in the second, which are exactly \nthose allocations that are no longer in use.\n\nGetting a list of all the allocations made by the program is relatively \nsimple. We make the programmer use a special function we've prepared (in this\ncase `tgc_alloc`) which allocates memory and then adds a pointer to that \nmemory to an internal list. If at any point this allocation is freed (such as \nby `tgc_free`), it is removed from the list.\n\nThe second list is the difficult one: the list of allocations _in use_ by the \nprogram. At first, with C's semantics, pointer arithmetic, and all the crazy \nflexibility that comes with it, it might seem like finding all the allocations \nin use by the program at any point in time is impossible, and to some extent \nyou'd be right. 
It can actually be shown that, in the most general case, this problem is \nundecidable (the halting problem reduces to it) - even for languages saner than C - \nbut by slightly adjusting our problem statement, and assuming we are only \ndealing with a set of _well-behaved_ C programs of some form, we can come up \nwith something that works.\n\nFirst, we have to relax our goal a little. Instead of trying to find all of \nthe memory allocations _in use_ by a program, we can try to find all \nthe _reachable_ memory allocations: those allocations which have a pointer \npointing to them somewhere in the program's memory. The distinction here is \nsubtle but important. For example, I _could_ write a C program which makes an \nallocation, encodes the returned pointer as a string, and performs `rot13` on \nthat string, later decoding the string, casting it back to a pointer, \nand using the memory as if nothing had happened. This is a perfectly valid C \nprogram, and the crazy memory allocation is _in use_ throughout. It is just \nthat during the pointer's `rot13` encoding there is no practical way to know \nthat this memory allocation is still going to be used later on.\n\nSo instead we want to make a list of all memory allocations which are pointed \nto by pointers in the program's memory. For most _well-behaved_ C programs, this\nis enough to tell if an allocation is in use.\n\nIn general, memory in C exists in three different segments: the stack,\nthe heap, and the data segment. This means that if a pointer to a certain \nallocation exists in the program's memory, it must be in one of these locations.\nNow the challenge is to find these locations and scan them for pointers.\n\nThe data segment is the most difficult: there is no portable way to get the \nbounds of this segment. 
But because the data segment is somewhat limited in use, \nwe can choose to ignore it: we tell users that allocations only pointed to \nfrom the data segment are not considered reachable.\n\nAs an aside, for programmers coming from other languages, this might seem like \na poor solution - to simply ask the programmer not to store pointers to \nallocations in this segment - and in many ways it is. It is never a good \ninterface to _request_ that the programmer do something in the documentation; \ninstead it is better to handle every edge case to make it impossible for them \nto create an error. But this is C, where programmers are constantly asked _not_\nto do things which are perfectly possible. In fact, one of the very things \nthis library is trying to deal with is the fact that programmers are only \n_asked_ to make sure they free dynamically allocated memory; there is no \nsystem in place to enforce this. So _for C_ this is a perfectly reasonable \ninterface. And there is an added advantage: it makes the implementation far \nsimpler and far more adaptable. In other words: [Worse Is Better](https://en.wikipedia.org/wiki/Worse_is_better).\n\nWith the data segment covered, we have the heap and the stack. If we consider\nonly the heap allocations which have been made via `tgc_alloc` and friends, then\nour job is again made easy: in our list of all allocations we also store the\nsize of each allocation, so whenever we need to scan one of the memory regions \nwe've allocated, we know exactly where it starts and ends.\n\nWith the heap and the data segment covered, this leaves us with the stack, \nthe trickiest segment. The stack is something we don't have any \ncontrol over, but we do know that for most reasonable implementations of C, the\nstack is a contiguous area of memory that is expanded downwards (or for some\nimplementations upwards, but it doesn't matter) for each function call. 
It \ncontains the most important memory with regard to reachability: all of the \nlocal variables used in functions.\n\nIf we can get the memory addresses of the top and the bottom of the stack, we \ncan scan the memory in between as if it were heap memory, and add to our list of \nreachable pointers all those found there.\n\nAssuming the stack grows from top to bottom, we can get a conservative \napproximation of the bottom of the stack by just taking the address of some \nlocal variable.\n\n```c\nvoid *stack_bottom(void) {\n  int x;\n  return \u0026x;\n}\n```\n\nThis address should cover the memory of all the local variables for whichever\nfunction calls it. For this reason, we need to ensure two things before we \nactually call it. First, we want to make sure we flush all of the values in \nthe registers onto the stack so that we don't miss a pointer hiding in a \nregister, and second, we want to make sure the call to `stack_bottom` isn't \ninlined by the compiler.\n\nWe can spill the registers into stack memory in a somewhat portable way with \n`setjmp`, which saves the registers into a `jmp_buf` variable. And we can \nensure that the function is not inlined by only calling it via a volatile \nfunction pointer. The `volatile` keyword forces the compiler to always \nread the pointer value from memory before calling the function, so it\ncannot inline the call.\n\n```c\n#include \u003csetjmp.h\u003e\n\nvoid *get_stack_bottom(void) {\n  jmp_buf env;\n  setjmp(env); /* spill registers onto the stack */\n  void *(*volatile f)(void) = stack_bottom;\n  return f();\n}\n```\n\nTo get the top of the stack, we can again take the address of a local variable.\nThis time it is easier if we simply ask the programmer to supply us with one.\nIf the programmer wishes for the garbage collector to scan the whole stack, they \ncan give the address of a local variable in `main`. This address should cover \nall function calls one deeper than `main`. 
We can store this in some global \n(or local) variable.\n\n```c\nstatic void *stack_top = NULL;\n\nint main(int argc, char **argv) {\n  stack_top = \u0026argc;\n  run_program(argc, argv);\n  return 0;\n}\n```\n\nNow, at any point, we can get a safe approximate upper and lower bound of the \nstack memory, allowing us to scan it for pointers. We interpret each bound as a\n`void **` (a pointer to an array of pointers) and iterate, interpreting the\nmemory in between as pointers.\n\n```c\nvoid mark(void) {\n  void **p;\n  void **t = stack_top;\n  void **b = get_stack_bottom();\n  \n  /* the stack grows downwards, so the bottom has the lower address */\n  for (p = b; p \u003c t; p++) {\n    scan(*p);\n  }\n}\n```\n\n\n\n\n","funding_links":[],"categories":["Memory Allocation","C","内存分配","C++"],"sub_categories":["数学"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forangeduck%2Ftgc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Forangeduck%2Ftgc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Forangeduck%2Ftgc/lists"}