{"id":21363636,"url":"https://github.com/fredericbonnet/colibri","last_synced_at":"2025-10-07T03:53:27.359Z","repository":{"id":40798506,"uuid":"192409194","full_name":"fredericbonnet/colibri","owner":"fredericbonnet","description":"Colibri is a fast and lightweight garbage-collected datatype library written in C","archived":false,"fork":false,"pushed_at":"2025-07-25T09:01:05.000Z","size":2825,"stargazers_count":1,"open_issues_count":7,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-07-25T15:19:17.061Z","etag":null,"topics":["abstract-data-structures","abstract-data-types","c","data-structures","datatypes","garbage-collection","garbage-collector","memory-allocation"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fredericbonnet.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-17T19:49:40.000Z","updated_at":"2022-08-25T20:44:50.000Z","dependencies_parsed_at":"2023-01-21T16:00:45.072Z","dependency_job_id":null,"html_url":"https://github.com/fredericbonnet/colibri","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/fredericbonnet/colibri","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredericbonnet%2Fcolibri","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredericbonnet%2Fcolibri/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredericbonnet%2Fcolibri/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredericbonnet%2Fcol
ibri/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fredericbonnet","download_url":"https://codeload.github.com/fredericbonnet/colibri/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredericbonnet%2Fcolibri/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278717451,"owners_count":26033542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-data-structures","abstract-data-types","c","data-structures","datatypes","garbage-collection","garbage-collector","memory-allocation"],"created_at":"2024-11-22T06:20:28.782Z","updated_at":"2025-10-07T03:53:27.343Z","avatar_url":"https://github.com/fredericbonnet.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Colibri: A fast and lightweight garbage-collected datatype library\n\nColibri is an abstract datatype/in-memory storage infrastructure for the C\nlanguage. It features:\n\n- An extensible data type system dubbed \"words\"\n- A fast and efficient cell-based allocator\n- An exact (AKA accurate or precise), generational, copying, mark-and-sweep\n  garbage collector for automatic memory management\n\nColibri is written in plain C and is free of any dependency besides system\nlibraries. 
The compiled binary DLL on Windows is about 85kB. The source code is\nheavily commented using the [Doxygen] syntax.\n\n## License\n\nColibri is released under the terms of the 3-Clause BSD License:\n\nhttps://opensource.org/licenses/BSD-3-Clause\n\n## History\n\nI began working on Colibri around 2008 while thinking about the future of the\n[Tcl] language. Tcl is a string-based language on the\noutside, and although it uses more elaborate data types on the inside for\nperformance, its string implementation still relies on flat arrays of\ncharacters. Colibri started as an experiment with ropes, a string data structure\nbased on self-balancing binary trees, along with automatic memory management\nthanks to a garbage collector. Nowadays Colibri supports a wide variety of\nprimitive data types and structures for general purpose application development,\nwith emphasis on performance, frugality, and simplicity.\n\n## What are colibris?\n\nColibris, known in English as hummingbirds, are a family of birds known for\ntheir small size and high wing speed. The bee hummingbird (_Mellisuga helenae_),\nfound in Cuba, is the smallest of all birds, with a mass of 1.8g and a total\nlength of about 5cm. They are renowned for their stationary and backward flight\nabilities on par with flying insects, which allow them to feed on the nectar of\nplants and flowers in-flight.\n\n[![mellisuga_helenae]][mellisuga_helenae_link]\n\nI've chosen this name for this project because its goal is to be fast and\nlightweight, and also to follow the feather and bird theme shared with Tcl and\nmany related projects.\n\n## Features\n\n### Words\n\nWords are a generic abstract datatype framework. 
Colibri supports the following\nword types:\n\n- Immutable primitives:\n\n  - Nil: the nil singleton\n  - Booleans: true or false singletons\n  - Integer numbers\n  - Floating point numbers\n  - Unicode characters\n\n- Immutable Unicode strings:\n\n  - Regular strings: flat arrays of characters using 1/2/4-byte fixed width or\n    UTF-8/16 variable width encodings\n  - [Ropes]: self-balancing binary trees of strings\n\n- String buffers for efficient dynamic string building\n\n- Linear collections:\n\n  - Immutable vectors: fixed-length, flat arrays of words\n  - Mutable vectors: flat arrays of words with a preallocated capacity\n  - Immutable lists: self-balancing binary trees of vectors; lists can be cyclic\n  - Mutable lists: modifiable lists with efficient copy-on-write semantics and\n    conversion to immutable form; large mutable lists can be sparsely allocated\n  - Custom lists: collections implemented with custom code\n\n- Mutable associative arrays:\n\n  - Hash maps: randomized associative arrays with integer, string or custom keys\n  - Trie maps: self-sorting associative arrays with integer, string or custom\n    keys\n  - Custom maps: associative arrays implemented with custom code\n\n- Custom word types: dynamic datatypes implemented with custom code, with all\n  the benefits of automatic memory management\n\nStrings, linear collections and associative arrays come with generic iterators.\n\n### Word synonyms\n\nIn [duck-typed][duck-typing] languages like Tcl, the type of a value depends on\nthe way it is used, which can vary over time. For example, the literal `123` can\nrepresent:\n\n- the string \"123\"\n- the integer 123, in its various binary forms (8, 16, 32 bits...)\n- the floating point number 123.0\n- some internal type (e.g. a bucket index in a hash map)\n- etc.\n\nHigh-level language implementations typically associate such values with an\ninternal structure that represents the underlying type at a given time. 
This\ntechnique avoids repeated data conversions, a typically expensive operation, and\nhence gives a performance boost when a value is used repeatedly (in a loop for\nexample). However, in some cases the type may alternate between two or more\ninternal representations; for example the above string may switch several times\nbetween integer and float representations. In the Tcl world this phenomenon is\nknown as \"shimmering\". In such occurrences, the previous internal representation\nis lost and will have to be computed again should the value be used with the same\napparent type later on.\n\nShimmering happens when an external representation can only have one internal\nrepresentation. However, in Colibri any word can have an arbitrary number of\nsynonyms, other words that share common characteristics but use different\ndatatypes: they could for example convey the same semantics or simply have the\nsame external representation (a string, like in the above example). Words can\nform circular chains of synonyms; circular structures are notoriously\ntroublesome when it comes to memory management, causing memory leaks or\nexcessive bookkeeping, but fortunately a garbage collector is an elegant way to\navoid this class of problems altogether.\n\n### Cell-based pool allocator\n\nWords are either immediate or allocated.\n\nMost primitives are represented as immediate values whenever possible: the value\nis stored directly in the pointer rather than in an allocated structure. In\neffect, this means that collections of such words need no extra storage space\nbesides the collection itself.\n\nAllocated words are stored in cells. Each cell is made of 4 machine words, i.e.\n16 bytes on 32-bit architectures, and 32 bytes on 64-bit architectures. Predefined datatypes\nmake the most use of single-cell storage to minimize overhead. 
Custom types can\nstore a minimum of 2 machine words with no upper limit.\n\nLow-level memory allocation uses system pages, usually 4 kilobytes, themselves\ndivided into logical pages. The cell allocation algorithm is a simple pool\nallocator over logical pages. The overhead is very small: only 2 bits per cell.\n\nAllocation is performed on a per-thread basis for maximum performance and\nminimum contention. Page-based pool allocation also improves locality of\nreference and cache use.\n\n### Garbage collector\n\nMemory management relies on an exact (AKA accurate or precise), generational,\ncopying, mark-and-sweep garbage collector.\n\n#### Exact\n\nContrary to conservative garbage collectors such as the [Boehm GC][boehm-gc], an\nexact garbage collector doesn't rely on heuristics to know when and where to\nfollow a pointer to reachable memory. This ensures that no memory will leak or\nbe freed accidentally.\n\nColibri words are designed with that goal in mind. All predefined word types are\nautomatically garbage-collected. Custom types can define a finalizer that is\ncalled at deletion time for cleanup, and can declare nested structures as well.\nThat way, it is perfectly acceptable to mix words with regular malloc'ed memory\nblocks, or to not use the predefined word type system at all and still get all\nthe benefits of the garbage collector.\n\n#### Generational\n\nGenerational GC is a form of incremental GC where memory areas are gathered in\nmemory pools depending on their age. 
Colibri implements the following policy:\n\n- Cell allocation is done in the 'eden' memory pool\n- GC occurs on a memory pool when its size reaches a given threshold\n- GC is performed in the generational order, from newer to older pools,\n  following a logarithmic frequency: newer pools are collected more frequently\n  than older pools\n- Surviving cells are promoted to the next generation\n\nThis policy ensures that long-lived cells are collected less often as they get\npromoted to older generations, whereas short-lived cells are less likely to\nsurvive the next GC.\n\n#### Copying\n\nIn the general case, memory promotion is done by moving whole pages from newer\nto older memory pools. This minimizes CPU overhead but may lead to memory\nfragmentation over time. So when fragmentation exceeds a certain threshold,\ncells are instead moved to a new, compact page in the target pool. This improves\nlocality of reference and cache use over time.\n\n#### Mark-and-sweep\n\nColibri uses a mark-and-sweep algorithm to mark reachable cells and eventually\nsweep unreachable ones. The marking phase starts at root cells and follows all\nreferences recursively; at the end of this phase, marked cells are promoted, and\nunmarked cells are discarded.\n\nRoots must be explicitly declared by the application using a simple API.\n\nMutations in older generations are detected automatically thanks to write\nbarriers. Modified cells from uncollected pools will be traversed during the\nmarking phase in case they refer to cells that belong to a collected pool.\n\n#### Threading model\n\nThe GC process is fully controllable (pause/resume) so that applications don't\nget interrupted unexpectedly. 
To allocate cells, the client code must pause the\nGC first (enter a GC-protected section), and must resume the GC when done (leave\nthe GC-protected section).\n\nColibri supports several threading models:\n\n- **Single**: Strict apartment, single-threaded model + stop-the-world GC.\n  Allocated memory is thread-local. GC is performed synchronously in the calling\n  thread when it resumes the GC.\n\n- **Asynchronous**: Strict apartment, single-threaded model + asynchronous GC.\n  Allocated memory is thread-local, but GC is performed in a separate thread so\n  that the main thread can perform other tasks in the meantime (I/O, event\n  handling...).\n\n- **Shared**: Multithreaded model + asynchronous GC. Each thread has its own\n  eden pool for better performance, however memory is shared by all threads of\n  the group. GC is performed in a separate thread when no working thread is\n  paused.\n\nA thread can only belong to one thread group of the above models. However there\ncan be several distinct groups in the same process.\n\nReferences cannot cross group boundaries; a word from one group cannot reference\na word from another group. Among the predefined word types, immutable words are\nthread-safe, but mutable words are not, and thus cannot be used concurrently by\nseveral threads of the same group without explicit synchronization. Custom words\ncan implement their own thread-safe data structures though.\n\n## Portability\n\nThe code is fairly portable on 32-bit and 64-bit systems: all public and\ninternal types are based on portable C99 types such as `intptr_t`.\n\nThe only parts that need platform-specific code are low-level page allocation,\nmemory protection and related exception/signal handling, and GC-related\nsynchronization. Colibri needs system calls that allocate boundary-aligned\npages, as well as synchronization primitives such as mutexes and condition\nvariables, and write barriers for change detection. 
At present both Windows and\nUnix (Linux) versions are provided, the latter using `mmap`. Porting to other\nsystems should require minimal effort as long as they provide the necessary\nfeatures; the platform-specific code is limited to a handful of functions\ngathered in dedicated source subtrees. Platform-specific peculiarities should\nnot impact the overall architecture. Indeed, Windows and Unix platforms are\ndifferent enough to be confident on this point.\n\nA medium-term goal is to support the [WebAssembly] target. For now there remain\nsome roadblocks that prevent Colibri from being compiled to WASM using the usual\nmethods ([clang] or [Emscripten]).\n\n## Build \u0026 install\n\nThe build process depends on [CMake]. You also have to install a toolchain for\nyour platform if needed.\n\n### Windows\n\nThe easiest and recommended way to build Colibri on Windows is to install the\nMicrosoft Visual Studio build tools. You can either download and install them\nmanually, or use the package manager [Chocolatey]:\n\n```bat\nchoco install visualstudio2017buildtools\n```\n\nCMake will then find and select the toolchain automatically.\n\nNote that you can install CMake with Chocolatey as well:\n\n```bat\nchoco install cmake\n```\n\nTo build the release version of Colibri, first go to the source directory and\ngenerate the build system:\n\n```bat\ncmake -S . -B build\n```\n\nYou can then launch the build process:\n\n```bat\ncmake --build build --config release\n```\n\nThis should build the binaries in the `build\\Release` directory.\n\n### Unix\n\nTo build Colibri on Unix systems with the default toolchain, go to the source\ndirectory and generate the build system:\n\n```sh\ncmake -S . -B build\n```\n\nYou can then launch the build process:\n\n```sh\ncmake --build build --config release\n```\n\nThis should build the binaries in the `build` directory.\n\n## Tests\n\nColibri requires [PicoTest] for testing. 
The test suite is enabled when CMake\ndetects the PicoTest package. The simplest way to do so is to give CMake the\npath to PicoTest when generating the build system, like so:\n\n```sh\ncmake -S . -B build -DCMAKE_MODULE_PATH=\u003c/path/to/picotest\u003e\n```\n\nTo run the test suite, simply run `ctest` from within the `build` directory.\n\n## Documentation\n\nThe complete documentation is available here:\n\nhttps://fredericbonnet.github.io/colibri\n\nThe documentation site was built using these great tools:\n\n- [Doxygen] extracts the documentation from the source code as both HTML and XML\n  formats\n- [seaborg] converts the XML files to Markdown (full disclosure: I'm the author\n  of this tool!)\n- [docsify] generates the documentation site from the Markdown files\n\nTo rebuild the documentation you'll need the following tools:\n\n- [Doxygen] to process the provided `Doxyfile` and parse the source code\n- [Node.js] to run the build scripts:\n\n```sh\nnpm run docs\n```\n\nIf you want to serve the documentation locally you can use the provided script:\n\n```sh\nnpm run docsify\n```\n\n[docs]: https://fredericbonnet.github.io/colibri\n[htmldocs]: https://fredericbonnet.github.io/colibri/public/html/index.html\n[doxygen]: http://www.stack.nl/~dimitri/doxygen/\n[seaborg]: https://github.com/fredericbonnet/seaborg\n[docsify]: https://docsify.js.org/\n[node.js]: https://nodejs.org/\n[picotest]: https://github.com/fredericbonnet/picotest\n[chocolatey]: https://chocolatey.org\n[cmake]: https://cmake.org\n[webassembly]: https://webassembly.org\n[clang]: https://clang.llvm.org\n[emscripten]: https://emscripten.org/\n[tcl]: https://www.tcl-lang.org\n[boehm-gc]: https://en.wikipedia.org/wiki/Boehm_garbage_collector\n[ropes]: https://en.wikipedia.org/wiki/Rope_(data_structure)\n[duck-typing]: https://en.wikipedia.org/wiki/Duck_typing\n[mellisuga_helenae_link]: https://commons.wikimedia.org/wiki/File:Bee_hummingbird_(Mellisuga_helenae)_adult_male_in_flight.jpg 
\":target=_blank\"\n[mellisuga_helenae]: https://upload.wikimedia.org/wikipedia/commons/thumb/c/ca/Bee_hummingbird_%28Mellisuga_helenae%29_adult_male_in_flight.jpg/1024px-Bee_hummingbird_%28Mellisuga_helenae%29_adult_male_in_flight.jpg \"Bee hummingbird (Melisuga helenae) adult male, Cuba\"\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffredericbonnet%2Fcolibri","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffredericbonnet%2Fcolibri","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffredericbonnet%2Fcolibri/lists"}