{"id":13411662,"url":"https://github.com/embano1/memlog","last_synced_at":"2025-04-13T00:48:06.545Z","repository":{"id":43041129,"uuid":"444025083","full_name":"embano1/memlog","owner":"embano1","description":"A Kafka log inspired in-memory and append-only data structure","archived":false,"fork":false,"pushed_at":"2025-04-07T06:51:48.000Z","size":154,"stargazers_count":132,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-13T00:48:02.271Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/embano1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-03T10:44:56.000Z","updated_at":"2025-04-09T14:31:18.000Z","dependencies_parsed_at":"2023-12-17T10:25:19.863Z","dependency_job_id":"163f334d-fed6-487a-b044-195cd40ebf33","html_url":"https://github.com/embano1/memlog","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embano1%2Fmemlog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embano1%2Fmemlog/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embano1%2Fmemlog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/embano1%2Fmemlog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/embano1","download_url":"https://codeload.github.com/embano1/memlog/tar.gz/refs
/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650437,"owners_count":21139672,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:15.521Z","updated_at":"2025-04-13T00:48:06.522Z","avatar_url":"https://github.com/embano1.png","language":"Go","funding_links":[],"categories":["数据结构与算法","Data Structures and Algorithms","Generators","Data Integration Frameworks","Uncategorized"],"sub_categories":["队列","Queues"],"readme":"[![Go\nReference](https://pkg.go.dev/badge/github.com/embano1/memlog.svg)](https://pkg.go.dev/github.com/embano1/memlog)\n[![Tests](https://github.com/embano1/memlog/actions/workflows/tests.yaml/badge.svg)](https://github.com/embano1/memlog/actions/workflows/tests.yaml)\n[![Latest\nRelease](https://img.shields.io/github/release/embano1/memlog.svg?logo=github\u0026style=flat-square)](https://github.com/embano1/memlog/releases/latest)\n[![Go Report\nCard](https://goreportcard.com/badge/github.com/embano1/memlog)](https://goreportcard.com/report/github.com/embano1/memlog)\n[![codecov](https://codecov.io/gh/embano1/memlog/branch/main/graph/badge.svg?token=TC7MW723JO)](https://codecov.io/gh/embano1/memlog)\n[![go.mod Go\nversion](https://img.shields.io/github/go-mod/go-version/embano1/memlog)](https://github.com/embano1/memlog)\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)  \n\n# About\n\n## tl;dr\n\nAn easy to use, lightweight, thread-safe and append-only in-memory data\nstructure modeled as a *Log*.\n\nThe `Log` also 
serves as an abstraction and building block. See\n[`sharded.Log`](./sharded/README.md) for an implementation of a *sharded*\nvariant of `memlog.Log`.\n\n❌ Note: this package is not about providing an in-memory `logging` library. To\nread more about the ideas behind `memlog` please see [\"The Log: What every\nsoftware engineer should know about real-time data's unifying\nabstraction\"](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying).\n\n## Motivation\n\nI keep hitting the same user story (use case) over and over again: one or more\nclients connected to my application wanting to read an **immutable** stream of\ndata, e.g. events or sensor data, **in-order**, **concurrently** (thread-safe)\nand **asynchronously** (at their own pace) and in a resource (memory)\n**efficient** way.\n\nThere are many solutions to this problem, e.g. exposing some sort of streaming API\n(*gRPC*, HTTP/REST long-polling) based on custom logic using Go channels or an\ninternal [ring buffer](https://pkg.go.dev/container/ring), or putting data into\nan external platform like [Kafka](https://kafka.apache.org/), [Redis\nStreams](https://redis.io/topics/streams-intro) or [RabbitMQ\nStreams](https://blog.rabbitmq.com/posts/2021/07/rabbitmq-streams-overview).\n\nThe challenges I faced with these solutions were that they were either too\n**complex** (or simply **overkill**) for my problem. 
Or, the system I had to\nintegrate with and read data from did not have a nice streaming API or Go SDK,\nso I kept rewriting complex internal caching, buffering and\nconcurrency handling logic for the client APIs.\n\nI looked around and could not find a simple and easy to use Go library for this\nproblem, so I created `memlog`: an **easy to use, lightweight (in-memory),\nthread-safe, append-only log** inspired by popular streaming systems with a\n**minimal API** using Go's **standard library** primitives 🤩\n\n💡 For an end-to-end API modernization example using `memlog` see the\n`vsphere-event-streaming`\n[project](https://github.com/embano1/vsphere-event-streaming), which transforms\na SOAP-based events API into an HTTP/REST streaming API.\n\n# Usage\n\n```go\n\tml, _ := memlog.New(ctx) // create log\n\toffset, _ := ml.Write(ctx, []byte(\"Hello World\")) // write some data\n\trecord, _ := ml.Read(ctx, offset) // read back data\n\tfmt.Print(string(record.Data)) // prints \"Hello World\"\n```\n\nThe `memlog` API is intentionally kept minimal. A new `Log` is constructed with `memlog.New(ctx, options...)`. Data as\n`[]byte` is written to the log with `Log.Write(ctx, data)`.\n\nThe first write to the `Log` using *default* `Options` starts at position\n(`Offset`) `0`. Every write creates an immutable `Record` in the `Log`.\n`Records` are purged from the `Log` when the *history* `segment` is replaced\n(see notes below).\n\nThe *earliest* and *latest* `Offset` available in a `Log` can be retrieved with\n`Log.Range(ctx)`.\n\nA specified `Record` can be read with `Log.Read(ctx, offset)`.\n\n💡 Instead of manually polling the `Log` for new `Records`, the *streaming* API\n`Log.Stream(ctx, startOffset)` should be used.\n\n## (Not) one `Log` to rule them all\n\nYou are not constrained to creating just **one** `Log`. For certain use cases,\ncreating multiple `Logs` might be useful. 
For example:\n\n- Manage completely different data sets/sizes in the same process\n- Setting different `Log` sizes (i.e. retention times), e.g. premium users will\n  have access to a larger *history* of `Records`\n- Partitioning input data by type or *key*\n\n💡 For use cases where you want to order the log by `key(s)`, consider using the\nspecialised [`sharded.Log`](sharded/README.md).\n\n## Full Example\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\t\"os\"\n\n\t\"github.com/embano1/memlog\"\n)\n\nfunc main() {\n\tctx := context.Background()\n\tl, err := memlog.New(ctx)\n\tif err != nil {\n\t\tfmt.Printf(\"create log: %v\", err)\n\t\tos.Exit(1)\n\t}\n\n\toffset, err := l.Write(ctx, []byte(\"Hello World\"))\n\tif err != nil {\n\t\tfmt.Printf(\"write: %v\", err)\n\t\tos.Exit(1)\n\t}\n\n\tfmt.Printf(\"reading record at offset %d\\n\", offset)\n\trecord, err := l.Read(ctx, offset)\n\tif err != nil {\n\t\tfmt.Printf(\"read: %v\", err)\n\t\tos.Exit(1)\n\t}\n\n\tfmt.Printf(\"data says: %s\", record.Data)\n\n\t// reading record at offset 0\n\t// data says: Hello World\n}\n```\n\n## Purging the `Log`\n\nThe `Log` is divided into an *active* and *history* `segment`. When the *active*\n`segment` is full (configurable via `WithMaxSegmentSize()`), it is *sealed*\n(i.e. read-only) and becomes the *history* `segment`. A new empty *active*\n`segment` is created for writes. If there is an existing *history*, it is\nreplaced, i.e. all `Records` are purged from the *history*.\n\nSee [pkg.go.dev](https://pkg.go.dev/github.com/embano1/memlog) for the API\nreference and examples.\n\n## A stateless Log? You gotta be kidding!\n\nTrue, it sounds like an oxymoron. Why would someone use (build) an *in-memory*\nappend-only log that is not durable?\n\nI'm glad you asked 😀\n\nThis library certainly is not intended to replace messaging, queuing or\nstreaming systems. It was built for use cases where there exists a *durable\ndata/event source*, e.g. 
a legacy system, REST API, database, etc. that can't\n(or should not) be changed. The requirement, however, is that the (source) data\nshould be made available over a streaming-like API, e.g. *gRPC*, or processed by\na Go application which requires the properties of a `Log`.\n\n`memlog` helps bridge these different APIs and use cases\nas a *building block* to extract and store data `Records` from an external\nsystem into an *in-memory* `Log` (think ordered cache).\n\nThese `Records` can then be internally processed (lightweight ETL) or served\nasynchronously, in-order (`Offset`-based) and concurrently over a *modern\nstreaming API*, e.g. *gRPC* or HTTP/REST (chunked encoding via long polling), to\nremote clients.\n\nAs another example of such an in-memory log-structured design,\n[`DDlog`](https://github.com/vmware/differential-datalog) follows a similar\napproach, where a `DDlog` program is used in conjunction with a persistent\ndatabase, with database records being fed to DDlog as ground facts.\n\n### Checkpointing\n\nGiven the data source needs to be durable in this design, one can optionally\nbuild periodic checkpointing logic using the `Record` `Offset` as the checkpoint\nvalue. \n\n💡 When running in Kubernetes,\n[`kvstore`](https://github.com/knative/pkg/tree/main/kvstore) provides a nice\nabstraction on top of a `ConfigMap` for such requirements. \n\nIf the `memlog` process crashes, it can then resume from the last checkpointed\n`Offset`, load the changes since then from the source and resume streaming. \n\n💡 This approach is quite similar to the Kubernetes `ListerWatcher()`\n[pattern](https://youtu.be/YIBQrP1grPE?t=1132). See\n[`memlog_test.go`](./memlog_test.go) for some inspiration.\n\n# Benchmark\n\nI haven't done any extensive benchmarking or code optimization. Feel free to\nchime in and provide meaningful feedback/optimizations. 
\n\nOne could argue whether using two *slices* (*active* and *history* `data\n[]Record` as part of the individual `segments`) is a good engineering choice,\ne.g. over using a growable slice as an alternative. \n\nThe reason I went for two `segments` was that for me dividing the `Log` into\nmultiple `segments` with fixed *size* (and *capacity*) was easier to reason\nabout in the code (and I followed my intuition from how log-structured data\nplatforms do it). I did not inspect the Go compiler optimizations, e.g. it might\nactually be smart and create one growable slice under the hood. 🤓\n\nThese are some results on my MacBook using a log size of `1,000` (records),\ni.e. where the `Log` history is constantly purged and new `segments` (*slices*)\nare created.\n\n```console\ngo test -v -run=none -bench=. -cpu 1,2,4,8,16 -benchmem\ngoos: darwin\ngoarch: amd64\npkg: github.com/embano1/memlog\ncpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz\nBenchmarkLog_write\nBenchmarkLog_write              11107804               103.0 ns/op            89 B/op          1 allocs/op\nBenchmarkLog_write-2            11115896               107.1 ns/op            89 B/op          1 allocs/op\nBenchmarkLog_write-4            11419497               105.7 ns/op            89 B/op          1 allocs/op\nBenchmarkLog_write-8            10253677               109.6 ns/op            89 B/op          1 allocs/op\nBenchmarkLog_write-16           10865994               107.7 ns/op            89 B/op          1 allocs/op\nBenchmarkLog_read\nBenchmarkLog_read               24461548                49.49 ns/op           32 B/op          1 allocs/op\nBenchmarkLog_read-2             25002574                46.63 ns/op           32 B/op          1 allocs/op\nBenchmarkLog_read-4             23829378                47.47 ns/op           32 B/op          1 allocs/op\nBenchmarkLog_read-8             22936821                47.47 ns/op           32 B/op          1 allocs/op\nBenchmarkLog_read-16            
24121807                48.25 ns/op           32 B/op          1 allocs/op\nPASS\nok      github.com/embano1/memlog       12.541s\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembano1%2Fmemlog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembano1%2Fmemlog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembano1%2Fmemlog/lists"}