A small malloc implementation, for use in WebAssembly
https://github.com/wingo/walloc
Host: GitHub
URL: https://github.com/wingo/walloc
Owner: wingo
License: other
Created: 2020-09-04T08:47:38.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2023-12-19T18:55:46.000Z (11 months ago)
Last Synced: 2024-02-17T06:36:40.282Z (9 months ago)
Language: C
Size: 43 KB
Stars: 123
Watchers: 4
Forks: 9
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# walloc

walloc is a bare-bones implementation of `malloc` for use by C
programs when targeting WebAssembly.  It is a single-file
implementation with no dependencies: no stdlib, no JavaScript imports,
no emscripten.

Walloc was designed with the following priorities, in order:

 1. Standalone.  No stdlib needed; no emscripten.  Can be included in a
    project without pulling in anything else.

 2. Reasonable allocation speed and fragmentation/overhead.

 3. Small size, to minimize download time.

 4. Standard interface: a drop-in replacement for malloc.

 5. Single-threaded (currently, anyway).

Emscripten includes a couple of good malloc implementations
([dlmalloc](https://github.com/emscripten-core/emscripten/blob/master/system/lib/dlmalloc.c)
and
[emmalloc](https://github.com/emscripten-core/emscripten/blob/master/system/lib/emmalloc.cpp));
perhaps consider using one of those?  But if you are really looking for
a bare-bones malloc, walloc is fine.

## Test

```
$ make CC=$LLVM/clang LD=$LLVM/wasm-ld JS=node test
clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o test.o test.c
clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c
wasm-ld --no-entry --import-memory -o test.wasm test.o walloc.o
node test.js
Seeding RNG with [2959819678, 4094888344, 3121363251, 822200628].
Allocating 2 MB, iteration 0.
Freeing 2031 allocations.
Allocating 2 MB, iteration 1.
Freeing 1956 allocations.
Allocating 2 MB, iteration 2.
Freeing 2000 allocations.
Allocating 2 MB, iteration 3.
Freeing 2037 allocations.
...
Allocating 2 MB, iteration 38.
Freeing 2029 allocations.
Allocating 2 MB, iteration 39.
Freeing 2023 allocations.
Success.
```

You can link `walloc.c` into your program just by adding it to your link
line, as above.

## Size

The resulting wasm file is about 2 kB (uncompressed).

Walloc isn't the smallest allocator out there.  A simple bump-pointer
allocator that never frees is the smallest and fastest thing you can
have.  There is also an alternate allocator for Rust,
[wee_alloc](https://github.com/rustwasm/wee_alloc), which is said to be
smaller than walloc, though it is less space-efficient for small
objects.  But still, walloc is pretty small.

## Design

When a C program is compiled to WebAssembly, the resulting wasm module
(usually) has associated linear memory.  It can be linked in a way that
the memory is created by the module when it's instantiated, or such that
the module is given a memory by its host.  The above example passed
`--import-memory` to the linker, allowing the host to bound memory
usage for the module instance.

The linear memory has the usual data, stack, and heap segments.  The
data and stack are placed first.  The heap starts at the `&__heap_base`
symbol.  (This symbol is computed and defined by the linker.)  All bytes
above `&__heap_base` can be used by the wasm program as it likes.  So
`&__heap_base` is the lower bound of memory managed by walloc.

```
                                              memory growth ->
+----------------+-----------+-------------+-------------+----
| data and stack | alignment | walloc page | walloc page | ...
+----------------+-----------+-------------+-------------+----
^ 0              ^ &__heap_base            ^ 64 kB aligned
```

The upper bound of memory managed by walloc is the total size of the
memory, which is aligned on 64-kilobyte boundaries.  (WebAssembly
ensures this alignment.)  Walloc manages memory in 64-kB pages as well.
It starts with whatever memory is initially given to the module, and
will expand the memory if it runs out.  The host can specify a maximum
memory size, in pages; if no more pages are available, walloc's `malloc`
will simply return `NULL`; handling out-of-memory is up to the caller.
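
To make the layout above concrete, here is a minimal sketch of the page
arithmetic it implies.  The names `PAGE_SIZE`, `align_up`, and
`managed_pages` are hypothetical and are not taken from walloc itself;
addresses are passed as plain integers so that the arithmetic can be
tried outside of WebAssembly.

```c
#include <stdint.h>

#define PAGE_SIZE 65536u  /* WebAssembly page size: 64 kB */

/* Round addr up to the next multiple of align (a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align) {
  return (addr + align - 1) & ~(align - 1);
}

/* Number of whole pages between the first page-aligned address at or
   above heap_base and the end of memory.  memory_bytes is assumed to be
   a multiple of PAGE_SIZE. */
static uint32_t managed_pages(uint32_t heap_base, uint32_t memory_bytes) {
  uint32_t first_page = align_up(heap_base, PAGE_SIZE);
  if (first_page >= memory_bytes) return 0;
  return (memory_bytes - first_page) / PAGE_SIZE;
}
```

With a heap base of 66000 and a 4-page memory, the first managed page
starts at 131072, leaving 2 pages available before any growth is needed.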

Walloc has two allocation strategies: small and large objects.

### Large objects

A large object is an allocation of more than 256 bytes.

There is a global freelist of available large objects, each of which has
a header indicating its size.  When allocating, walloc does a best-fit
search through that list.

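```c
/* Header stored at the start of every large object. */
struct large_object {
  struct large_object *next;  /* next object on the freelist */
  size_t size;                /* size of this object */
  char payload[0];            /* the allocation itself starts here */
};
/* Global freelist of available large objects. */
struct large_object* large_object_free_list;
```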
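
The best-fit search described above could look roughly like the
following sketch.  It is not walloc's actual code: the helper names
`best_fit` and `unlink_object` are hypothetical, and the struct is
repeated without its payload member to keep the example self-contained.

```c
#include <stddef.h>

/* Same fields as the header above, minus the payload member. */
struct large_object {
  struct large_object *next;
  size_t size;
};

/* Return the address of the link pointing at the smallest object of at
   least `size` bytes, or NULL if nothing on the list is big enough. */
static struct large_object **best_fit(struct large_object **head, size_t size) {
  struct large_object **best = NULL;
  for (struct large_object **link = head; *link; link = &(*link)->next) {
    if ((*link)->size < size) continue;           /* too small */
    if (!best || (*link)->size < (*best)->size) best = link;
    if ((*best)->size == size) break;             /* exact fit */
  }
  return best;
}

/* Remove the object found by best_fit from the list and return it. */
static struct large_object *unlink_object(struct large_object **link) {
  struct large_object *obj = *link;
  *link = obj->next;
  obj->next = NULL;
  return obj;
}
```

Returning the link rather than the object lets the caller unlink the
winner without walking the list a second time.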

Large object allocations are rounded up to 256-byte boundaries,
including the header.
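
As a small worked example of that rounding, the sketch below computes
the total footprint of a large allocation.  The function name and the
8-byte header size are assumptions (two 4-byte fields on wasm32), not
values taken from walloc's source.

```c
#include <stddef.h>

#define CHUNK_SIZE 256u
/* Assumed header size: `next` and `size` at 4 bytes each on wasm32. */
#define LARGE_OBJECT_HEADER_SIZE 8u

/* Bytes occupied by a large allocation once the header is added and the
   total is rounded up to a whole number of 256-byte chunks. */
static size_t large_object_footprint(size_t payload_bytes) {
  size_t total = payload_bytes + LARGE_OBJECT_HEADER_SIZE;
  return (total + CHUNK_SIZE - 1) & ~(size_t)(CHUNK_SIZE - 1);
}
```

Under these assumptions a 504-byte request fits exactly in two chunks
(512 bytes), while a 505-byte request needs three (768 bytes).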

If there is no object on the freelist that can satisfy an allocation,
walloc will expand the heap by the size of the allocation, or by half of
the current walloc heap size, whichever is larger.  The resulting page
or pages form a large object that can satisfy the allocation.
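
The growth policy above can be sketched as a single function.  The name
`heap_growth_bytes` is hypothetical, and rounding the result up to whole
64-kB pages is an assumption based on WebAssembly memory growing in page
units.

```c
#include <stddef.h>

#define PAGE_SIZE 65536u

/* Bytes to grow the heap by: the larger of the request and half of the
   current heap size, rounded up to a whole number of pages. */
static size_t heap_growth_bytes(size_t request_bytes, size_t heap_bytes) {
  size_t half = heap_bytes / 2;
  size_t grow = request_bytes > half ? request_bytes : half;
  return (grow + PAGE_SIZE - 1) & ~(size_t)(PAGE_SIZE - 1);
}
```

Growing by at least half of the current heap means that a run of
requests that keep missing the freelist triggers progressively fewer
memory-growth operations.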

If the best object on the freelist has more than a chunk of space on the
end, it is split, and the tail put back on the freelist.  A chunk is 256
bytes.

```
+-------------+---------+---------+-----+-----------+
| page header | chunk 1 | chunk 2 | ... | chunk 255 |
+-------------+---------+---------+-----+-----------+
^ +0          ^ +256    ^ +512                      ^ +64 kB
```

As each page is 65536 bytes, and each chunk is 256 bytes, there are
256 chunks in a page.  The first chunk of any page that begins an
allocated object, large or small, is the page header chunk.  The page
header has a byte for each of the 256 chunks in the page.  The byte is
255 if the corresponding chunk starts a large object; otherwise the byte
indicates the size class for packed small-object allocations (see
below).
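
Because pages and chunks are both power-of-two sized, finding the header
byte for a pointer is a matter of masking and dividing.  The following
is a sketch under that assumption; the helper names are hypothetical,
and the page header is passed in as a plain byte array so the example
does not depend on a real wasm memory.

```c
#include <stdint.h>

#define PAGE_SIZE    65536u
#define CHUNK_SIZE   256u
#define LARGE_OBJECT 255u  /* header byte for a chunk that starts a large object */

/* Address of the page (and therefore of the page header) containing addr. */
static uintptr_t page_base(uintptr_t addr) {
  return addr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Index, from 0 to 255, of the chunk within its page that contains addr. */
static unsigned chunk_index(uintptr_t addr) {
  return (unsigned)((addr & (PAGE_SIZE - 1)) / CHUNK_SIZE);
}

/* Look up the header byte describing the chunk that contains addr. */
static uint8_t chunk_kind(const uint8_t *page_header, uintptr_t addr) {
  return page_header[chunk_index(addr)];
}
```

For the address `0x20345`, the page base is `0x20000` and the chunk
index is 3, so the byte at offset 3 of that page's header describes the
allocation.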

```
+-------------+---------+---------+----------+-----------+
| page header | large object 1    | large object 2 ...   |
+-------------+---------+---------+----------+-----------+
^ +0          ^ +256    ^ +512                           ^ +64 kB
```

When splitting large objects, we avoid starting a new large object on a
page header chunk.  A large object can only span where a page header
chunk would be if it includes the entire page.

Freeing a large object pushes it on the global freelist.  We know a
pointer is a large object by looking at the page header.  We know the
size of the allocation, because the large object header precedes the
allocation.  When the next large object allocation happens after a free,
the freelist will be compacted by merging adjacent large objects.

### Small objects

Small objects are allocated from segregated freelists.  The granule size
is 8 bytes.  Small object allocations are packed in a chunk of uniform
allocation size.  There are size classes for allocations of each size
from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all.
For example, an allocation of 12 granules will be satisfied from a
16-granule chunk.  Each size class has its own free list.

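```c
/* A free small object holds only the link to the next free object. */
struct small_object_freelist {
  struct small_object_freelist *next;
};
/* One freelist per size class. */
struct small_object_freelist small_object_freelists[10];
```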
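
The mapping from a request size to one of the 10 size classes can be
sketched as a short table lookup.  The table values come from the text
above; the names `size_classes` and `size_class_index` are hypothetical
and do not come from walloc.

```c
#include <stddef.h>

#define GRANULE_SIZE 8u

/* Size classes, in granules, as listed above. */
static const unsigned size_classes[10] = { 1, 2, 3, 4, 5, 6, 8, 10, 16, 32 };

/* Return the index of the smallest size class that can hold `bytes`, or
   -1 if the request is too big for a small object (more than 256 bytes). */
static int size_class_index(size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (int i = 0; i < 10; i++) {
    if (granules <= size_classes[i]) return i;
  }
  return -1;
}
```

A 96-byte request is 12 granules, which lands in the 16-granule class at
index 8; a 257-byte request returns -1 and would go to the large object
allocator instead.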

When allocating, if there is nothing on the corresponding freelist,
walloc will allocate a new large object, then change its chunk kind in
the page header to the size class.  It then goes through the fresh
chunk, threading the objects through each other onto a free list.

```
+-------------+---------+---------+------------+---------------------+
| page header | large object 1    | granules=4 | large object 2' ... |
+-------------+---------+---------+------------+---------------------+
^ +0          ^ +256    ^ +512    ^ +768       ^ +1024               ^ +64 kB
```

In this example, we imagine that the 4-granule freelist was empty, and
that the large object freelist contained only large object 2, running
all the way to the end of the page.  We allocate a new 4-granule chunk
by splitting the first chunk off the large object and pushing the newly
trimmed large object back onto the large object freelist, updating the
page header appropriately.  We then thread the 4-granule (32-byte)
allocations in the fresh chunk together (the chunk has room for 8 of
them), treating them as if they were instances of
`struct small_object_freelist`, and push them onto the global freelist
for 4-granule allocations.

```
           in fresh chunk, next link for object N points to object N+1
                                 /--------\
                                 |        |
            +------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
            +------------------+----------+-----+----------+
                               ^ 4-granule freelist now points here
```
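
The threading step pictured above might be written as in the sketch
below.  It is not walloc's own code: the function name `thread_chunk` is
hypothetical, and it assumes that the object size is a multiple of the
granule size and at least as big as a pointer.

```c
#include <stddef.h>

#define CHUNK_SIZE 256u

struct small_object_freelist {
  struct small_object_freelist *next;
};

/* Carve a fresh chunk into objects of object_size bytes and push them
   onto *head.  Any leftover bytes become padding at the start of the
   chunk.  Returns the number of objects added. */
static size_t thread_chunk(void *chunk, size_t object_size,
                           struct small_object_freelist **head) {
  size_t count = CHUNK_SIZE / object_size;
  char *first = (char *)chunk + (CHUNK_SIZE - count * object_size);
  /* Push from last to first, so that object N links to object N+1 and
     the freelist ends up pointing at object 0. */
  for (size_t i = count; i-- > 0; ) {
    struct small_object_freelist *obj =
        (struct small_object_freelist *)(first + i * object_size);
    obj->next = *head;
    *head = obj;
  }
  return count;
}
```

With 32-byte objects the chunk yields 8 entries and no padding; with
80-byte objects (10 granules) it yields 3 entries after 16 bytes of
padding.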

The size classes were chosen so that any wasted space (padding) is less
than the size class.

Freeing a small object pushes it back on its size class's free list.
Given a pointer, we know its size class by looking at the chunk kind in
the page header.

## License

`walloc` is available under a permissive MIT-style license.  See
[LICENSE.md](./LICENSE.md) for full details.