{"id":13813499,"url":"https://github.com/sahib/timeq","last_synced_at":"2025-10-12T18:31:39.845Z","repository":{"id":192081626,"uuid":"685893400","full_name":"sahib/timeq","owner":"sahib","description":"A fast file-based priority queue","archived":false,"fork":false,"pushed_at":"2024-04-20T09:42:32.000Z","size":553,"stargazers_count":58,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-26T01:04:01.898Z","etag":null,"topics":["golang","nats","priority-queue","queue"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/sahib/timeq#section-readme","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sahib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-01T08:52:01.000Z","updated_at":"2025-02-10T16:39:33.000Z","dependencies_parsed_at":"2024-01-01T15:23:17.884Z","dependency_job_id":"312d8603-b10d-4cb3-88a1-381c73af3758","html_url":"https://github.com/sahib/timeq","commit_stats":null,"previous_names":["sahib/timeq"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/sahib/timeq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahib%2Ftimeq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahib%2Ftimeq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahib%2Ftimeq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahib%2Ftimeq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/
sahib","download_url":"https://codeload.github.com/sahib/timeq/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sahib%2Ftimeq/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267621597,"owners_count":24116900,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","nats","priority-queue","queue"],"created_at":"2024-08-04T04:01:19.623Z","updated_at":"2025-10-12T18:31:34.791Z","avatar_url":"https://github.com/sahib.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# ``timeq``\n\n[![GoDoc](https://godoc.org/github.com/sahib/timeq?status.svg)](https://godoc.org/github.com/sahib/timeq)\n![Build status](https://github.com/sahib/timeq/actions/workflows/go.yml/badge.svg)\n\nA file-based priority queue in Go.\n\nGenerally speaking, `timeq` can be used to implement these and more:\n\n- A streaming platform like [NATS](https://nats.io) or message brokers similar to [Mosquitto](https://mosquitto.org).\n- A file-backed job queue with different priorities.\n- A telemetry pipeline for IoT devices to buffer offline data.\n- Wherever you would use a regular file-based queue.\n\n## Features\n\n- Clean and well-tested code base based on Go 1.22\n- High throughput thanks to batch processing and 
`mmap()`\n- Tiny memory footprint that does not depend on the number of items in the queue.\n- Simple interface with classic `Push()` and `Read()` and only a few other functions.\n- Sane default settings, with some knobs that can be tuned for your use case.\n- Consuming end can be efficiently and easily forked into several consumers.\n\nThis implementation should be generally useful, despite the ``time`` in the\nname. However, the initial design had timestamps as priority keys in mind. For\nbest performance the following assumptions were made:\n\n- Your OS supports `mmap()` and `mremap()` (i.e. Linux/FreeBSD)\n- Seeking in files during reading is cheap (i.e. no HDD)\n- The priority key ideally increases without many duplicates (like timestamps, see [FAQ](#FAQ)).\n- You push and pop your data in, ideally, big batches.\n- The underlying storage has a low risk of write errors or bit flips.\n- You trust your data to some random dude's code on the internet (don't we all?).\n\nIf some of those assumptions do not fit your use case and you still managed to make it work,\nI would be happy for some feedback or even pull requests to improve the general usability.\n\nSee the [API documentation here](https://godoc.org/github.com/sahib/timeq) for\nexamples and the actual documentation.\n\n## Use cases\n\nMy primary use case was an embedded Linux device that has different services that generate\na stream of data that needs to be sent to the cloud. For this the data was required to be\nin ascending order (sorted by time) and also needed to be buffered with tight memory boundaries.\n\nA previous attempt based on ``sqlite3`` did work kinda well but was much slower\nthan it had to be (partly also due to the heavy cost of ``cgo``). 
This motivated me to\nwrite this queue implementation.\n\n## Usage\n\nTo download the library, just do this in your project:\n\n```bash\n# Use latest or a specific tag as you like\n$ go get github.com/sahib/timeq@latest\n```\n\nWe also ship a rudimentary command-line client that can be used for experiments.\nYou can install it like this:\n\n```bash\n$ go install github.com/sahib/timeq/cmd@latest\n```\n\n## Benchmarks\n\nThe [included benchmark](https://github.com/sahib/timeq/blob/main/bench_test.go#L15) pushes 2000 items with a payload of 40 bytes per operation.\n\n```\n$ make bench\ngoos: linux\ngoarch: amd64\npkg: github.com/sahib/timeq\ncpu: 12th Gen Intel(R) Core(TM) i7-1270P\nBenchmarkPopSyncNone-16      35924  33738 ns/op  240 B/op  5 allocs/op\nBenchmarkPopSyncData-16      35286  33938 ns/op  240 B/op  5 allocs/op\nBenchmarkPopSyncIndex-16     34030  34003 ns/op  240 B/op  5 allocs/op\nBenchmarkPopSyncFull-16      35170  33592 ns/op  240 B/op  5 allocs/op\nBenchmarkPushSyncNone-16     20336  56867 ns/op   72 B/op  2 allocs/op\nBenchmarkPushSyncData-16     20630  58613 ns/op   72 B/op  2 allocs/op\nBenchmarkPushSyncIndex-16    20684  58782 ns/op   72 B/op  2 allocs/op\nBenchmarkPushSyncFull-16     19994  59491 ns/op   72 B/op  2 allocs/op\n```\n\n## Multi Consumer\n\n`timeq` supports a `Fork()` operation that splits the consuming end of a queue\ninto two halves. You can then consume from each of the halves individually,\nwithout modifying the state of the other one. It's even possible to fork a fork\nagain, resulting in a consumer hierarchy. This is probably best explained by\nthis diagram:\n\n\u003cimg src=\"docs/forks.svg\" width=\"300\"\u003e\n\n1. The initial state of the queue with 8 items in it.\n2. We fork the queue by calling `Fork(\"foo\")`.\n3. We consume 3 items from the fork via `fork.Pop()`.\n4. 
Pushing new data will go to all existing forks.\n\nThis is implemented efficiently (see below) by just having duplicated indexes.\nIt opens up some interesting use cases:\n\n- For load-balancing purposes you could have several workers consuming data from `timeq`, each `Pop()`'ing\n  and working on different parts of the queue. Sometimes it would be nice to let workers work on the same\n  set of data (e.g. when they all transform the data in different ways). The latter is easily possible with forks.\n- Fork the queue and consume from it until some point as an experiment and remove the fork afterwards. The original\n  data is not affected by this.\n- Prevent data from getting lost by keeping a \"deadletter\" fork that keeps track of whatever you want. This way\n  you can implement something like a `max-age` for the queue's items.\n\n## Design\n\n* All data is divided into buckets by a user-defined function (»`BucketSplitConf`«).\n* Each bucket is its own priority queue, responsible for a part of the key space.\n* A push to a bucket writes the batch of data to a memory-mapped log\n  file on disk. The location of the batch is stored in an\n  in-memory index and in an index WAL.\n* On pop we select the bucket with the lowest key first and ask the index to give\n  us the location of the lowest batch. Once done the index is updated to mark the\n  items as popped. 
The data stays intact in the data log.\n* Once a bucket was completely drained it is removed from disk to reclaim space.\n\nSince the index is quite small (only one entry per batch) we can easily fit it in memory.\nOn the initial load all bucket indexes are loaded, but no memory is mapped yet.\n\n### Limits\n\n* Each item payload can be at most 64M.\n* Each bucket can be at most 2^63 bytes in size.\n* Using priority keys close to the integer limits is most certainly a bad idea.\n* When a bucket was created with a specific `BucketSplitConf` it cannot be changed later.\n  `timeq` will error out in this case and the queue needs to be migrated.\n  If this turns out to be a practical issue we could implement an automated migration path.\n\n### Data Layout\n\nThe data is stored on disk in two files per bucket:\n\n* ``dat.log``: Stores a single entry of a batch.\n* ``idx.log``: Stores the key and location of batches. Can be regenerated from ``dat.log``.\n\nThis graphic shows one entry of each:\n\n![Data Layout](docs/data_format.png)\n\nEach bucket lives in its own directory called `K\u003ckey\u003e`.\nExample: If you have two buckets, your data looks like this on disk:\n\n```\n/path/to/db/\n├── split.conf\n├── K00000000000000000001\n│   ├── dat.log\n│   ├── idx.log\n│   └── forkx.idx.log\n└── K00000000000000000002\n    ├── dat.log\n    ├── idx.log\n    └── forkx.idx.log\n```\n\nThe actual data is in `dat.log`. This is an append-only log that is\nmemory-mapped by `timeq`. All files that end with `idx.log` are indexes that\npoint to the currently reachable parts of `dat.log`. Each entry in `idx.log` is\na batch, so the log will only increase marginally if your batches are big\nenough. `forkx.idx.log` (and possibly more files like that) are index forks,\nwhich work the same way as `idx.log`, but track a different state of the respective bucket.\n\nNOTE: Buckets get cleaned up on open or when completely empty (i.e. all forks\nare empty) during consumption. 
Do not expect that the disk usage automatically\ndecreases whenever you pop something. It does decrease, but in batches.\n\n### Applied Optimizations\n\n* Data is pushed and popped as big batches and the index only tracks batches.\n  This greatly lowers the memory usage, if you use big batches.\n* The API is very friendly towards re-using memory internally. Data is directly\n  sliced from the memory map and given to the user in the read callback. Almost\n  no allocations are made during normal operation. If you need the data outside the callback,\n  you have the option to copy it.\n* Division into small, manageable buckets. Only the buckets that are accessed are actually loaded.\n* Both `dat.log` and `idx.log` are append-only, requiring no random seeking for best performance.\n* ``dat.log`` is memory mapped and resized using `mremap()` in big batches. The bigger the log, the bigger the pre-allocation.\n* Sorting into buckets during `Push()` uses binary search for fast sorting.\n* `Shovel()` can move whole bucket directories, if possible.\n* In general, the concept of »Mechanical Sympathy« was applied to some extent to make the code cache friendly.\n\n## FAQ:\n\n### Can timeq also be used with non-time based keys?\n\nThere are no notable places where the key of an item is actually assumed to be\na timestamp, except for the default `BucketSplitConf` (which can be configured). If you\nfind a good way to sort your data into buckets you should be good to go. Keep\nin mind that timestamps were the idea behind the original design, so your\nmileage may vary - always benchmark your individual use case. 
You can modify one\nof the existing benchmarks to test your assumptions.\n\n### Why should I care about buckets?\n\nMost importantly: Only buckets that are in use are loaded.\nThis allows a very small footprint, especially if the push input is already roughly sorted.\n\nThere are also some other reasons:\n\n* If one bucket becomes corrupt for some reason, you lose only the data in this bucket.\n* On ``Shovel()`` we can cheaply move buckets if they do not exist in the destination.\n* ...and some more optimizations.\n\n### How do I choose the right size of my buckets?\n\nIt depends on a few things. Answer the following questions in a worst case scenario:\n\n- How much memory do you have at hand?\n- How many items would you push to a single bucket?\n- How big is each item?\n- How many buckets should be open at the same time?\n\nAs `timeq` uses `mmap(2)` internally, only the pages that were accessed are\nactually mapped to physical memory. However, when pushing a lot of data, that data is\nmapped to physical memory, as all accessed pages of a bucket stay open (which is\ngood if you Pop immediately after). So you should be fine if this evaluates to true:\n\n`BytesPerItem * ItemsPerBucketInWorstCase * MaxOpenParallelBuckets \u003c BytesMemoryAvailable - WiggleRoom`.\n\nYou can lower the number of open buckets with `MaxOpenParallelBuckets`.\n\nKeep in mind that `timeq` is fast and can be memory-efficient if used correctly,\nbut it's not a magic device. In future I might introduce a feature that does not\nkeep the full bucket mapped if it's only being pushed to. The return on investment\nfor such an optimization would be rather small though.\n\n### Can I store more than one value per key?\n\nYes, no problem. The index may store more than one batch per key. There is a\nslight allocation overhead on ``Queue.Push()`` though. Since ``timeq`` was\nprimarily optimized for mostly-unique keys (e.g. timestamps) you might see better\nperformance with fewer duplicates. 
It should not be very significant though.\n\nIf you want to use priority keys that are in a very narrow range (thus many\nduplicates) then you can think about spreading the range a bit wider.\nFor example: You have priority keys from zero to ten for the tasks in your job\nqueue. Instead of using zero to ten as keys, you can add the job-id to the key\nand shift the priority: ``(prio \u003c\u003c 32) | jobID``.\n\n### How failsafe is ``timeq``?\n\nI use it on a big fleet of embedded devices in the field at\n[GermanBionic](https://germanbionic.com), so it's already quite a bit battle\ntested. Design wise, damaged index files can be regenerated from the data log.\nThere's no error correction code applied in the data log and no checksums are\ncurrently written. If you need this, I'm happy if a PR comes in that enables it\noptionally.\n\nFor durability, the design is built to survive crashes without data loss (Push,\nRead) but, in some cases, it might result in duplicated data (Shovel). My\nrecommendation is **designing your application logic in a way that allows\nduplicate items to be handled gracefully**.\n\nThis assumes a filesystem with full journaling (``data=journal`` for ext4) or\nsome other filesystem that gives you similar guarantees. We do properly call\n`msync()` and `fsync()` in the relevant cases. For now, crash safety was not\nyet tested a lot though. Help here is welcome.\n\nThe test suite is currently roughly as big as the codebase. The best protection\nagainst bugs is a small code base, so that's not too impressive yet. We're of\ncourse working on improving the test suite, which is a never ending task.\nAdditionally we have a bunch of benchmarks and fuzzing tests.\n\n### Is `timeq` safely usable from several goroutines?\n\nYes. 
There is currently no real speed benefit from doing so though,\nas the current locking strategy prohibits parallel pushes and reads.\nFuture releases might improve on this.\n\n## License\n\nSource code is available under the MIT [License](/LICENSE).\n\n## Contact\n\nChris Pahl [@sahib](https://github.com/sahib)\n\n## TODO List\n\n- [ ] Test crash safety in an automated way.\n- [ ] Check for integer overflows.\n- [ ] Have a locking strategy that allows more parallelism.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsahib%2Ftimeq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsahib%2Ftimeq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsahib%2Ftimeq/lists"}