{"id":16828206,"url":"https://github.com/cowtowncoder/low-gc-membuffers","last_synced_at":"2025-03-17T04:31:46.603Z","repository":{"id":57719236,"uuid":"1567760","full_name":"cowtowncoder/low-gc-membuffers","owner":"cowtowncoder","description":"Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead","archived":false,"fork":false,"pushed_at":"2022-05-31T18:29:06.000Z","size":1045,"stargazers_count":136,"open_issues_count":6,"forks_count":18,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-16T08:11:22.420Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cowtowncoder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2011-04-04T16:04:20.000Z","updated_at":"2025-03-02T18:45:26.000Z","dependencies_parsed_at":"2022-08-26T09:42:12.494Z","dependency_job_id":null,"html_url":"https://github.com/cowtowncoder/low-gc-membuffers","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cowtowncoder%2Flow-gc-membuffers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cowtowncoder%2Flow-gc-membuffers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cowtowncoder%2Flow-gc-membuffers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cowtowncoder%2Flow-gc-membuffers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cowtowncoder","download_url":"https://codeload.github.c
om/cowtowncoder/low-gc-membuffers/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243974303,"owners_count":20377338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T11:25:26.047Z","updated_at":"2025-03-17T04:31:46.252Z","avatar_url":"https://github.com/cowtowncoder.png","language":"Java","funding_links":[],"categories":["Memory and concurrency"],"sub_categories":[],"readme":"# Overview\n\nThis project aims to provide a simple, efficient building block for \"Big Data\" libraries, applications and frameworks: something that can be used as an in-memory, bounded queue with opaque values (sequences of JDK primitive values), supporting insertions at the tail, removals from the head and single-entry peeks, and that has minimal garbage collection overhead. Insertions and removals operate on individual entries, which are sub-sequences of the full buffer.\n\nGC overhead is minimized by using direct `ByteBuffer`s (memory allocated outside of the GC-prone heap); boundedness is achieved by only supporting storage of simple primitive value (`byte`, `long`) sequences whose size is explicitly known.\n\nConceptually, memory buffers are just simple circular buffers (ring buffers) that hold a sequence of primitive values, a bit like arrays, but in a way that allows dynamic automatic resizing of the underlying storage.\nThe library supports efficient reuse and sharing of underlying segments for sets of buffers, although for many use cases a single buffer suffices.\n\nBuffers vary in two dimensions:\n\n1. 
Type of primitive value contained: currently `byte` and `long` variants are implemented, but others (like `int` or `char`) will be easy to add as needed\n2. Whether sequences are \"chunky\" -- sequences consist of 'chunks' created by distinct `appendEntry()` calls (and retrieved in chunks of exactly the same size with `getNextEntry()`) -- or \"streamy\", meaning that values are coalesced and form a logical stream (so multiple `appendEntry()` calls may be coalesced into just one entry returned by `getNextEntry()`).\n\nSince Java has no support for \"generic primitives\", there are separate classes for all combinations.\nThis means that there are currently 4 flavors of buffers:\n\n* for `byte` (using `MemBuffersForBytes`)\n * `ChunkyBytesMemBuffer`\n * `StreamyBytesMemBuffer`\n* for `long` (using `MemBuffersForLongs`)\n * `ChunkyLongsMemBuffer`\n * `StreamyLongsMemBuffer`\n\nAnother thing that can vary is the way underlying segments are allocated; the default is to use native (\"direct\") `ByteBuffer`s. 
But more on this later on.\n\n## Licensing\n\nStandard [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0.html) license.\n\n## Fancier stuff: multiple buffers\n\nAlthough having individual buffers is useful as is, this is just the beginning.\nConceptually, the library supports \"buffer groups\": sets of similarly-valued buffer instances owned by a single factory (like `MemBuffersForBytes`) that share the same segment allocator (`com.fasterxml.util.membuf.SegmentAllocator`).\nThis makes it possible to share a set of reusable underlying `ByteBuffer` instances among buffers in the same group.\n\nThis ability to share underlying segments between buffers, with strict memory bounds, makes it possible to use the library as a basic buffer manager; for example to buffer input and/or output of a web server (byte-based \"streamy\" buffers), or as simplistic event queues (usually using \"chunky\" buffers).\n\nTo have multiple buffer groups, simply construct multiple factory instances.\n\n## Thread-safety\n\nAll pieces are designed to be used by multiple threads (often just 2, producer/consumer), so all access is properly synchronized.\n\nIn addition, locking is done using buffer instances, so it may occasionally make sense to synchronize on a buffer instance, since this allows you to create atomic sequences of operations, like so:\n\n    MemBuffersForBytes factory = new MemBuffersForBytes(...);\n    ChunkyBytesMemBuffer buffer = factory.createChunkyBuffer(...);\n    synchronized (buffer) {\n      // read latest, add right back:\n      byte[] msg = buffer.getNextEntry();\n      buffer.appendEntry(msg);\n    }\n\nor similarly if you need to read a sequence of entries as an atomic unit.\n\n# Status\n\nProject has been used by multiple production systems (by multiple companies) since 2012,\nand by now has proven stable and performant for expected use cases.\nAs such it is considered production ready: the official 1.0 version was released in October 2013.\n\nThe first accessible project that uses it is 
[Arecibo](https://github.com/ning/Arecibo),\na metrics collection, aggregation and visualization system.\n\nCompanies that use this library for production systems include:\n\n* [Mode Media](http://www.modemediacorp.com/) (nee Glam Media)\n* [Salesforce](http://www.salesforce.com/)\n\n# Usage\n\n## Getting it\n\nTo use with Maven, add:\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003ecom.fasterxml.util\u003c/groupId\u003e\n  \u003cartifactId\u003elow-gc-membuffers\u003c/artifactId\u003e\n  \u003cversion\u003e1.1.1\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nFor downloads and Javadocs, check out the [Wiki](https://github.com/cowtowncoder/low-gc-membuffers/wiki).\n\n## Start with a factory\n\nThe exact factory to use depends on the value type: here we assume you are looking\nfor byte-based buffers. If so, you will use `MemBuffersForBytes`.\nThis object can be viewed as a container and factory of actual buffers\n(`ChunkyBytesMemBuffer` or `StreamyBytesMemBuffer`).\nTo construct one, you need to specify the amount of memory to use, as well as how memory should be sliced. For example:\n\n```java\nMemBuffersForBytes factory = new MemBuffersForBytes(30 * 1024, 2, 11);\n```\n\nwould create an instance that allocates at least 2 (and at most 11) segments (which wrap direct `ByteBuffer` instances), each 30 kB in size: that is, it has memory usage between 60 and 330 kilobytes.\nThe segments are then used by actual buffer instances (more on this in a bit).\n\nSo how do you choose parameters? The smaller the segments, the more granular the memory allocation, which can mean more efficient memory use (since overhead is bounded to at most 1 segment-full per active buffer). 
But it also increases the number of segment instances, possibly increasing fragmentation and adding overhead.\n\nNote that you can create multiple instances of `MemBuffers[Type]` factories, if you want to have more control over how the pool of segments is allocated amongst individual buffers.\n\n### Detour: allocating underlying storage segments\n\nBy default segments are allocated as `ByteBuffer`s (or typed sub-types for `long`s and so on). But this behavior can be changed by passing alternate\n`SegmentAllocator` instances.\n\nFor example, if you instead wanted to use in-heap segments stored as basic\nbyte arrays (`byte[]`), you could do this by:\n\n```java\nMemBuffersForBytes factory = new MemBuffersForBytes(\n  ArrayBytesSegment.allocator(30 * 1024, 2, 11));\n```\n\nor to use non-direct `ByteBuffer`s:\n\n```java\nMemBuffersForBytes factory = new MemBuffersForBytes(\n  ByteBufferBytesSegment.allocator(30 * 1024, 2, 11, false));\n```\n\nNote that `SegmentAllocator` instances are implemented as inner classes of the\nmatching segment type, that is, as `ArrayBytesSegment.Allocator` and\n`ByteBufferBytesSegment.Allocator`.\n\nAlso note that neither `Allocator`s nor `MemBuffers` keep track of underlying\nsegments. What this means is that buffers MUST be closed (explicitly, or indirectly by using wrappers) to make sure segments are released for reuse.\n\n## Create individual buffers, `MemBuffer`\n\nActual buffers are then allocated using\n\n```java\nChunkyBytesMemBuffer items = factory.createChunkyBuffer(2, 5);\n```\n\nwhich would indicate that this buffer will hold on to at least 2 segments (i.e. about 60kB raw storage) and use at most 5 (so max usage of 150kB).\nDue to the circular buffer style of allocation, at least 'segments - 1' amount of memory will be available for the actual queue (i.e. 
guaranteed space of 120kB); that is, up to one segment may be temporarily unavailable depending on the pattern of append/remove operations.\n\n## And start buffering/unbuffering\n\nTo append entries, you use:\n\n```java\nbyte[] dataEntry = ...; // serialize from, say, JSON\nitems.appendEntry(dataEntry);\n```\n\nor, if you don't want an exception if there is no more room:\n\n```java\nif (!items.tryAppendEntry(dataEntry)) {\n   // recover? Drop entry? Log?\n}\n```\n\nand to pop entries:\n\n```java\nbyte[] next = items.getNextEntry(); // blocks if nothing available\n// or:\nnext = items.getNextEntryIfAvailable();\nif (next == null) { // nothing yet available\n    //...\n}\n// or:\nnext = items.getNextEntry(1000L); // block for at most 1 second before giving up\n```\n\n## And make sure that...\n\nYou **always close buffers** when you are done with them -- otherwise underlying segments may be leaked. This is because buffers are the only objects that keep track of segments, and nothing keeps track of the `MemBuffer` instances created -- this is intentional, as the synchronization otherwise needed is very expensive from a concurrency perspective.\n\nNote that version 0.9.1 allows use of `MemBufferDecorator` instances, which makes it possible to build wrappers that can implement simple auto-closing of buffers.\n\n\n## Statistics, anyone?\n\nFinally, you can also obtain various statistics of buffer instances:\n\n```java\nint entries = items.getEntryCount(); // how many available for getting?\nint segmentsInUse = items.getSegmentCount(); // nr of internal segments\nlong maxFree = items.getMaximumAvailableSpace(); // approximate free space\nlong payload = items.getTotalPayloadLength(); // how much used by data?\n```\n\n# Download\n\nCheck out [Wiki](https://github.com/cowtowncoder/low-gc-membuffers/wiki) for downloads, Javadocs etc.\n\n# Known/potential problems\n\nDefault (and currently only) buffer implementation uses direct `ByteBuffer`s, and amount of memory that can be allocated is limited 
by JVM option `-XX:MaxDirectMemorySize`, which by default has a relatively low value of 64 MB.\nTo increase this setting, add a setting like:\n\n    -XX:MaxDirectMemorySize=512m\n\notherwise you are likely to hit an `OutOfMemoryError` when using larger buffers.\n\n# Future ideas\n\nHere are some improvement ideas:\n\n* \"Slab\" allocation (issue #14): allow initial allocation of a longer off-heap memory segment, with size of `N` segments: this \"slab\" will be fixed and NOT dynamically allocated or freed; segments will be sub-allocated as needed.\n    * Main benefit is reduced need for actual memory management (no per-operation `malloc` or `free`)\n    * Adds fixed overhead: slab size needs to be balanced with costs\n    * Segments from slabs are allocated before dynamic segments, as they do not incur additional allocation or memory usage cost (due to fixed default overhead)\n* Expose streamy byte buffers as `InputStream` (issue #19).\n    * Would need to choose what happens at end-of-input: snapshot (expose current content, then EOF) vs blocking (works like a pipe)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcowtowncoder%2Flow-gc-membuffers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcowtowncoder%2Flow-gc-membuffers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcowtowncoder%2Flow-gc-membuffers/lists"}