An open API service indexing awesome lists of open source software.

https://github.com/no-defun-allowed/baker-gc

A loose interpretation of an incremental copying collector
https://github.com/no-defun-allowed/baker-gc

do-not-use garbage-collector gc henry-baker-more-like-henry-based

Last synced: 2 months ago
JSON representation

A loose interpretation of an incremental copying collector

Awesome Lists containing this project

README

        

#+BEGIN_QUOTE
2018-10-08 08:31 i'm going to create the world's first conservative copying collector
#+END_QUOTE

2018 =no-defun-allowed= did in fact not create a conservative copying collector,
nor would she have been /even close/ to being first to do it. But 2020
=no-defun-allowed= has a conservative copying collector at least.

+This is a rough implementation of
[[https://www.cs.purdue.edu/homes/hosking/690M/p280-baker.pdf][ /List Processing in Real Time on a Serial Computer/ ]], which can collect
allocated conses incrementally, and somewhat conservatively (on the stack and
registers). However, the client must use a read barrier (as implemented by the
=car= and =cdr= functions provided.)+

Actually, no, it's a replication copying collector. It copies everything
reachable from the roots to newspace, then updates all the references in
referenced conses. This only requires a write barrier to replicate modifications
if there is a newspace copy of a cons.

This utterly abuses =setjmp= and how stacks appear to work on my machine, so
by using this code, you agree to not sue me for limbs lost to it. One can trace
roots (no pun intended) to [[https://medium.com/@MartinCracauer/llvms-garbage-collection-facilities-and-sbcl-s-generational-gc-a13eedfb1b31][SBCL's semi-conservative copying collector]] and the
[[https://wiki.openjdk.java.net/display/shenandoah/Main][Shenandoah JVM garbage collector]], which are not likely to blow your limbs off.
The stack walking code was also gleaned off
[[http://clim.rocks/gilbert/mark-sweep.c][Gilbert Baumann's mark-sweep implementation]].

** How it works

#+CAPTION: Pages and some variables that point inside them
[[./images/pages.png]]

*** Allocation

Conses are organised into pages (=struct page= of =pages.h=), with each page
16KiB wide. Each cons is a triple . We allocate
from the start of a page to its end, as typical for compacting collectors.

*** Scanning

At the start of a collection cycle, the stack and registers are scanned for
roots.The registers are scanned by using =setjmp= to "spill" the registers into
the stack, and the stack can be carefully probed for values that look like
pointers into the heap.

Checking what looks like a pointer can be done better with some heuristics
instead of iterating over pages most of the time; provided pages are allocated
mostly contiguously, we can eliminate most pointers outside the heap by checking
for presence in an interval holding all pages. Pages are also cached, so that
most searching can be eliminated.

At the end of a collection cycle, the stack and registers are scanned for
conses that must be pinned (as we cannot update the conservative roots), and
have their slots fixed up, by copying them from the copy.

*** Barriers

Each object has a /forwarding pointer/, either to itself, or its copy in
newspace. With /replication copying/, this pointer is only used when writing, to
replicate modifications in oldspace to newspace. Checking pointer (physical)
equality also requires retrieving the forwarding pointer to get canonical
pointers to compare.

*** Scheduling

Garbage collections start after some number of pages, the /threshold/, have
been allocated after the last collection finished. In order of priority, the
threshold is chosen so that:

- it is never below 10 pages (about 80kB),
- stack scanning completes in less than 3 milliseconds (adjustable with
=PAUSE_TARGET=), and
- the heap remains between 0.5x and 1.3x its last size after a collection.