{"id":13564206,"url":"https://github.com/muyo/sno","last_synced_at":"2026-01-11T03:36:47.823Z","repository":{"id":57484782,"uuid":"188736069","full_name":"muyo/sno","owner":"muyo","description":"Compact, sortable and fast unique IDs with embedded metadata.","archived":false,"fork":false,"pushed_at":"2021-11-12T01:59:41.000Z","size":242,"stargazers_count":91,"open_issues_count":1,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-04T17:47:13.449Z","etag":null,"topics":["id-generator","snowflake","unique-id","uuid"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/muyo/sno?tab=doc","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/muyo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-05-26T22:05:26.000Z","updated_at":"2024-09-26T02:13:30.000Z","dependencies_parsed_at":"2022-08-26T11:10:33.914Z","dependency_job_id":null,"html_url":"https://github.com/muyo/sno","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muyo%2Fsno","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muyo%2Fsno/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muyo%2Fsno/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/muyo%2Fsno/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/muyo","download_url":"https://codeload.github.com/muyo/sno/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_coun
t":247082838,"owners_count":20880727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["id-generator","snowflake","unique-id","uuid"],"created_at":"2024-08-01T13:01:27.986Z","updated_at":"2026-01-11T03:36:47.765Z","avatar_url":"https://github.com/muyo.png","language":"Go","readme":"\u003cimg src=\"./.github/logo_200x200.png\" alt=\"sno logo\" title=\"sno\" align=\"left\" height=\"200\" /\u003e\n\nA spec for **unique IDs in distributed systems** based on the Snowflake design, i.e. a coordination-based ID variant. \nIt aims to be friendly to both machines and humans, compact, *versatile* and fast.\n\nThis repository contains a **Go** package for generating such IDs. 
\n\n[![GoDoc](https://img.shields.io/badge/doc-reference-00a1fe.svg?style=flat-square)](https://pkg.go.dev/github.com/muyo/sno?tab=doc) \n[![Stable version](https://img.shields.io/github/v/release/muyo/sno?color=00a1fe\u0026label=stable\u0026sort=semver\u0026style=flat-square)](https://github.com/muyo/sno/releases) \n[![Travis build: master](https://img.shields.io/travis/muyo/sno/master.svg?logo=travis\u0026label=ci\u0026style=flat-square)](https://travis-ci.com/muyo/sno) \n[![Coverage](https://img.shields.io/codecov/c/github/muyo/sno.svg?logo=codecov\u0026logoColor=ffffff\u0026style=flat-square)](https://codecov.io/gh/muyo/sno)\n[![Go Report Card](https://goreportcard.com/badge/github.com/muyo/sno?style=flat-square)](https://goreportcard.com/report/github.com/muyo/sno) \n[![License](http://img.shields.io/badge/license-MIT-00a1fe.svg?style=flat-square)](https://raw.githubusercontent.com/muyo/sno/master/LICENSE) \n```bash\ngo get -u github.com/muyo/sno\n```\n\n### Features\n\n- **Compact** - **10 bytes** in its binary representation, canonically [encoded](#encoding) as **16 characters**.\n  \u003cbr /\u003eURL-safe and non-ambiguous encoding which also happens to be at the binary length of UUIDs - \n  **sno**s can be stored as UUIDs in your database of choice.\n- **K-sortable** in either representation.\n- **[Embedded timestamp](#time-and-sequence)** with a **4msec resolution**, bounded within the years **2010 - 2079**. 
\n  \u003cbr /\u003eHandles clock drifts gracefully, without waiting.\n- **[Embedded byte](#metabyte)** for arbitrary data.\n- **[Simple data layout](#layout)** - straightforward to inspect or encode/decode.\n- **[Optional and flexible](#usage)** configuration and coordination.\n- **[Fast](./benchmark#results)**, wait-free, safe for concurrent use.\n\u003cbr /\u003eClocks in at about 500 LoC, has no external dependencies and minimal dependencies on std.\n- A pool of **≥ 16,384,000** IDs per second.\n\u003cbr /\u003e 65,536 guaranteed unique IDs per 4msec per partition (65,536 combinations) per metabyte \n(256 combinations) per tick-tock (1 bit adjustment for clock drifts). \n**549,755,813,888,000** is the global pool **per second** when all components are taken into account.\n\n### Non-features / cons\n\n- True randomness. **sno**s embed a counter and have **no entropy**. They are not suitable in a context where \nunpredictability of IDs is a must. They still, however, meet the common requirement of keeping internal counts \n(e.g. total number of entities) unguessable and appear obfuscated;\n- Time precision. While *good enough* for many use cases, not quite there for others. The ➜ [Metabyte](#metabyte)\n  can be used to get around this limitation, however.\n- It's 10 bytes, not 8. This is suboptimal as far as memory alignment is concerned (platform dependent).\n\n\n\u003cbr /\u003e\n\n## Usage (➜ [API](https://pkg.go.dev/github.com/muyo/sno?tab=doc))\n\n**sno** comes with a package-level generator on top of letting you configure your own generators. \n\nGenerating a new ID using the defaults takes no more than importing the package and:\n\n```go\nid := sno.New(0)\n```\n\nWhere `0` is the ➜ [Metabyte](#metabyte).\u003cbr /\u003e\n\nThe global generator is immutable and private. It's therefore also not possible to restore it using a Snapshot. 
\nIts Partition is based on time and changes across restarts.\n\n### Partitions (➜ [doc](https://pkg.go.dev/github.com/muyo/sno?tab=doc#Partition))\n\nAs soon as you run more than 1 generator, you **should** start coordinating the creation of Generators to \nactually *guarantee* a collision-free ride. This applies to all specs of the Snowflake variant.\n\nPartitions are one of several friends you have to get you those guarantees. A Partition is 2 bytes. \nWhat they mean and how you define them is up to you.\n\n```go\ngenerator, err := sno.NewGenerator(\u0026sno.GeneratorSnapshot{\n\tPartition: sno.Partition{'A', 10},\n}, nil)\n```\n\nMultiple generators can share a partition by dividing the sequence pool between \nthem (➜ [Sequence sharding](#sequence-sharding)).\n\n### Snapshots (➜ [doc](https://pkg.go.dev/github.com/muyo/sno?tab=doc#GeneratorSnapshot))\n\nSnapshots happen to serve both as configuration and a means of saving and restoring generator data. They are \noptional - simply pass `nil` to `NewGenerator()` to get a Generator with sane defaults and a unique (in-process)\nPartition.\n\nSnapshots can be taken at runtime:\n\n```go\ns := generator.Snapshot()\n```\n\nThis exposes most of a Generator's internal bookkeeping data. In an ideal world where programmers are not lazy \nuntil their system runs into an edge case - you'd persist that snapshot across restarts and restore generators \ninstead of just creating them from scratch each time. This will keep you safe both if a large clock drift happens \nduring the restart -- or before it, when you come back online \"in the past\" relative to IDs that \nhad already been generated.\n\nA snapshot is a sample in time - it will very quickly get stale. 
Only take snapshots meant to be restored \nlater when generators are already offline - or for metrics purposes when online.\n\n\n\u003cbr /\u003e\n\n## Layout\n\nA **sno** is simply 80 bits comprised of two 40-bit blocks: the **timestamp** and the **payload**. The bytes are \nstored in **big-endian** order in all representations to retain their sortable property.\n![Layout](./.github/layout.png)\nBoth blocks can be inspected and mutated independently in either representation. Bits of the components in the binary \nrepresentation don't spill over into other bytes, which means no additional bit twiddling voodoo is necessary* to extract \nthem.\n\n\\*The tick-tock bit in the timestamp is the only exception (➜ [Time and sequence](#time-and-sequence)).\n\n\u003cbr /\u003e\n\n## Time and sequence\n\n### Time\n\n**sno**s embed a timestamp comprised of 39 bits holding **milliseconds since the epoch at a 4msec resolution** (floored, \nunsigned) and one bit, the LSB of the entire block - for the tick-tock toggle.\n\n### Epoch\n\nThe **epoch is custom** and **constant**. It is bounded within `2010-01-01 00:00:00 UTC` and \n`2079-09-07 15:47:35.548 UTC`. The lower bound is `1262304000` seconds relative to Unix. \n\nIf you *really* have to break out of the epoch - or want to store higher precision - the metabyte is your friend.\n\n### Precision\n\nHigher precision *is not necessarily* a good thing. Think in dataset and sorting terms, or in sampling rates. You \nwant to grab all requests with an error code of `403` in a given second, where the code may be encoded in the metabyte. \nAt a resolution of 1 second, you binary search for just one index and then proceed straight up linearly. \nThat's simple enough.\n\nAt a resolution of 1msec, however, you now need to find the corresponding 1000 potential starting offsets because \nyour `403` requests are interleaved with the `200` requests (potentially). At 4msec, this is 250 steps.\n\nEverything has tradeoffs. 
This was a compromise between precision, size, simple data layout -- and factors like that above.\n\n### Sequence\n\n**sno**s embed a sequence (2 bytes) that is **relative to time**. It does not overflow and resets on each new time \nunit (4msec). A higher sequence within a given timeframe **does not necessarily indicate order of creation**. \nIt is not advertised as monotonic because its monotonicity is dependent on usage. A single generator writing \nto a single partition, *ceteris paribus*, *will* result in monotonic increments and *will* represent order of creation. \n\nWith multiple writers in the same partition, increment order is *undefined*. If the generator moves back in time, \nthe order will still be monotonic but sorted either 2msec after or before IDs previously already written at that \ntime (see tick-tock).\n\n#### Sequence sharding\n\nThe sequence pool has a range of `[0..65535]` (inclusive). **sno** supports partition sharing out of the box \nby further sharding the sequence - that is, multiple writers (generators) in the same partition.\n\nThis is done by dividing the pool between all writers, via user-specified bounds.\n\nA generator will reset to its lower bound on each new time unit - and will never overflow its upper bound. \nCollisions are therefore guaranteed to be impossible unless bounds are misconfigured and overlap with those of another \n*currently online* generator. \n\n\n\u003cdetails\u003e\n\u003csummary\u003eStar Trek: Voyager mode, \u003cb\u003eHow to shard sequences\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\n\nThis can be useful when multiple containers on one physical machine are to write as a cluster to a partition \ndefined by the machine's ID (or simpler - multiple processes on one host). 
Or if multiple remote \nservices across the globe were to do that.\n\n```go\nvar PeoplePartition = sno.Partition{'P', 0}\n\n// In process/container/remote host #1\ngenerator1, err := sno.NewGenerator(\u0026sno.GeneratorSnapshot{\n\tPartition: PeoplePartition,\n\tSequenceMin: 0,\n\tSequenceMax: 32767, // 32768 - 1\n}, nil)\n\n// In process/container/remote host #2\ngenerator2, err := sno.NewGenerator(\u0026sno.GeneratorSnapshot{\n\tPartition: PeoplePartition,\n\tSequenceMin: 32768,\n\tSequenceMax: 65535, // 65536 - 1\n}, nil)\n```\n\nYou will notice that we have simply divided our total pool of 65,536 into 2 even and **non-overlapping** \nsectors. In the first snapshot `SequenceMin` could be omitted - and `SequenceMax` in the second, as those are the \ndefaults used when they are not defined. You will get an error when trying to set limits above the capacity of \ngenerators, but since the library is oblivious to your setup - it cannot warn you about overlaps and cannot \nresize on its own either. \n\nThe pools can be defined arbitrarily - as long as you make sure they don't overlap across *currently online* \ngenerators. \n\nIt is safe for a range previously used by another generator to be assigned to a different generator under the\nfollowing conditions:\n- it happens in a different timeframe *in the future*, i.e. no sooner than after 4msec have passed (no orchestrator \n  is fast enough to get a new container online to replace a dead one for this to be a worry);\n- if you can guarantee the new Generator won't regress into a time the previous Generator was running in.\n\nIf you create the new Generator using a Snapshot of the former as it went offline, you do not need to worry about those\nconditions and can resume writing to the same range immediately - the obvious tradeoff being the need to coordinate \nthe exchange of Snapshots.\n\nIf your clusters are always fixed size - reserving ranges is straightforward. 
With dynamic sizes, a potential simple \nscheme is to reserve the lower byte of the partition for scaling. Divide your sequence pool by, say, 8, keep \nassigning higher ranges until you hit your divider. When you do, increment partition by 1, start assigning \nranges from scratch. This gives you 2048 identifiable origins using just one byte of the partition.\n\nThat said, the partition pool available is large enough that the likelihood you'll ever *need* \nthis is slim to none. Suffice to know you *can* if you want to. \n\nBesides guaranteeing a collision-free ride, this approach can also be used to attach more semantic meaning to \npartitions themselves, as they are placed higher in the sort order. \nIn other words - with it, the origin of an ID can be determined by inspecting the sequence \nalone, which frees up the partition for another meaning.\n\nHow about...\n\n```go\nvar requestIDGenerator, _ = sno.NewGenerator(\u0026sno.GeneratorSnapshot{\n    SequenceMax: 32767,\n}, nil)\n\ntype Service byte\ntype Call byte\n\nconst (\n    UsersSvc   Service = 1\n    UserList   Call    = 1\n    UserCreate Call    = 2\n    UserDelete Call    = 3\n)\n\nfunc genRequestID(svc Service, methodID Call) sno.ID {\n    id := requestIDGenerator.New(byte(svc))\n    // Overwrites the upper byte of the fixed partition. \n    // In our case - we didn't define it but gave a non-nil snapshot, so it is {0, 0}.\n    id[6] = byte(methodID)\n\n    return id\n}\n```\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n#### Sequence overflow\n\nRemember that limiting the sequence pool also limits max throughput of each generator. 
For an explanation on what \nhappens when you're running at or over capacity, see the details below or take a look at ➜ [Benchmarks](#benchmarks) \nwhich explains the numbers involved.\n\n\u003cdetails\u003e\n\u003csummary\u003eStar Trek: Voyager mode, \u003cb\u003eBehaviour on sequence overflow\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\n\nThe sequence never overflows and the generator is designed with a single-return `New()` method that returns neither \nerrors nor invalid IDs. *Realistically* the default generator will never overflow simply because you won't saturate \nthe capacity.\n\nBut since you can set bounds yourself, the capacity could shrink to `4` per 4msec (smallest allowed). \nNow that's more likely. So when you start overflowing, the generator will *stall* and *pray* for a \nreduction in throughput sometime in the near future. \n\nFrom **sno**'s perspective, requesting more IDs than it can safely give you **immediately** is not an error - but \nit *may* require correcting on *your end*. And you should know about that. Therefore, if \nyou want to know when it happens - simply give **sno** a channel along with its configuration snapshot.\n\nWhen a thread requests an ID and gets stalled, **once** per time unit, you will get a `SequenceOverflowNotification` \non that channel.\n\n```go\ntype SequenceOverflowNotification struct {\n    Now   time.Time // Time of tick.\n    Count uint32    // Number of currently overflowing generation calls.\n    Ticks uint32    // For how many ticks in total we've already been dealing with the *current* overflow.\n}\n```\nKeep track of the counter. If it keeps increasing, you're no longer bursting - you're simply over capacity \nand *eventually* need to slow down or you'll *eventually* starve your system. The `Ticks` count lets you estimate\nhow long the generator has already been overflowing without keeping track of time yourself. A tick is *roughly* 1ms.\n\nThe order of generation when stalling occurs is `undefined`. 
It is not a FIFO queue, it's a race. Previously stalled \ngoroutines get woken up alongside inflight goroutines which have not yet been stalled, where the order of the former is \nhandled by the runtime. A livelock is therefore possible if demand doesn't decrease. This behaviour *may* change and \ninflight goroutines *may* get thrown onto the stalling wait list if one is up and running, but this requires careful \ninspection. And since this is considered an unrealistic scenario which can be avoided with simple configuration, \nit's not a priority.\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n#### Clock drift and the tick-tock toggle\n\nJust like all other specs that rely on clock times to resolve ambiguity, **sno**s are prone to clock drifts. But\nunlike all those other specs, **sno** adjusts itself to the new time - instead of waiting (blocking), it tick-tocks.\n\n**The tl;dr** applying to any system, really: ensure your deployments use properly synchronized system clocks \n(via NTP) to mitigate the *size* of drifts. Ideally, use an NTP server pool that applies \na gradual [smear for leap seconds](https://developers.google.com/time/smear). 
Despite the original Snowflake spec \nsuggesting otherwise, using NTP in slew mode (to avoid regressions entirely) \n[is not always a good idea](https://www.redhat.com/en/blog/avoiding-clock-drift-vms).\n\nAlso remember that containers tend to get *paused* meaning their clocks are paused with them.\n\nAs far as **sno**, collisions and performance are concerned, in typical scenarios you can enjoy a wait-free ride  \nwithout requiring slew mode nor having to worry about even large drifts.\n\n\u003cdetails\u003e\n\u003csummary\u003eStar Trek: Voyager mode, \u003cb\u003eHow tick-tocking works\u003c/b\u003e\u003c/summary\u003e\n\u003cp\u003e\n\n**sno** attempts to eliminate the issue *entirely* - both despite and because of its small pool of bits to work with.\n\nThe approach it takes is simple - each generator keeps track of the highest wall clock time it got from the OS\\*, \neach time it generates a new timestamp. If we get a time that is lower than the one we recorded, i.e. the clock \ndrifted backwards and we'd risk generating colliding IDs, we toggle a bit - stored from here on out in \neach **sno** generated *until the next regression*. Rinse, repeat - tick, tock.\n\n(\\*IDs created with a user-defined time are exempt from this mechanism as their time is arbitrary. The means \nto *bring your own time* are provided to make porting old IDs simpler and is assumed to be done before an ID \nscheme goes online)\n\nIn practice this means that we switch back and forth between two alternating timelines. Remember how the pool \nwe've got is 16,384,000 IDs per second? When we tick or tock, we simply jump between two pools with the same \ncapacity.\n\nWhy not simply use that bit to store a higher resolution time fraction? True, we'd get twice the pool which \nseemingly boils down to the same - except it doesn't. That is due to how the sequence increases. 
Even if you \nhad a throughput of merely 1 ID per hour, while the chance would be astronomically low - if the clock drifted \nback that whole hour, you *could* get a collision. The higher your throughput, the bigger the probability. \nIDs of the Snowflake variant, **sno** being one of them, are about **guarantees - not probabilities**. \nSo this is a **sno-go**.\n\n(I will show myself out...)\n\nThe simplistic approach of tick-tocking *entirely eliminates* that collision chance - but with a rigorous assumption: \nregressions happen at most once into a specific period, i.e. from the highest recorded time into the past \nand never back into that particular timeframe (let alone even further into the past). \n\nThis *generally* is exactly the case, but oddities in time synchronization, bad clocks and NTP client \nbehaviour *do* happen. And in distributed systems, every edge case that can happen - *will* happen. What do?\n\n##### How others do it\n\n- [Sonyflake] goes to sleep until back at the wall clock time it was already at \npreviously. All goroutines attempting to generate are blocked.\n- [snowflake] hammers the OS with syscalls to get the current time until back \nat the time it was already at previously. All goroutines attempting to generate are blocked.\n- [xid] goes ¯\\\\_(ツ)_/¯ and does not tackle drifts at all.\n- Entropy-based specs (like UUID or KSUID) don't really need to care as they are generally not prone, even to \nextreme drifts - you run with a risk all the time.\n\nThe approach one library took was to keep generating, but timestamp all IDs with the highest time recorded instead. \nThis worked because it had a large entropy pool to work with, for one (so a potential large spike in IDs generated \nin the same timeframe wasn't much of a consideration). **sno** has none. But more importantly - it disagrees on the \nreasoning about time and clocks. 
If we moved backwards, it means that an *adjustment* happened and we are *now* \ncloser to the *correct* time from the perspective of a wider system.\n\n**sno** therefore keeps generating without waiting, using the time as reported by the system - in the \"past\" so to \nspeak, but with the tick-tock bit toggled.\n\n*If* another regression happens, into that timeframe or even further back, *only then* do we tell all contenders \nto wait. We get a wait-free fast path *most of the time* - and safety if things go southways.\n\n##### Tick-tocking obviously affects the sort order as it changes the timestamp\n\nEven though the toggle is *not* part of the milliseconds, you can think of it as if it were. Toggling is then like \nmoving two milliseconds back and forth, but since our milliseconds are floored to increments of 4msec, we never \nhit the range of a previous timeframe. Alternating timelines are as such sorted *as if* they were 2msec apart from \neach other, but as far as the actual stored time is considered - they are timestamped at exactly the same millisecond. \n\nThey won't sort in an interleaved fashion, but will be *right next* to the other timeline. Technically they *were* \ncreated at a different time, so being able to make that distinction is considered a plus by the author.\n\n\u003c/p\u003e\n\u003c/details\u003e\n\u003cbr /\u003e\u003cbr /\u003e\n\n## Metabyte\n\nThe **metabyte** is unique to **sno** across the specs the author researched, but the concept of embedding metadata \nin IDs is an ancient one. It's effectively just a *byte-of-whatever-you-want-it-to-be* - but perhaps \n*8-bits-of-whatever-you-want-them-to-be* does a better job of explaining its versatility.\n\n### `0` is a valid metabyte\n\n**sno** is agnostic as to what that byte represents and it is **optional**. 
None of the properties of **sno**s\nget violated if you simply pass a `0`.\n\nHowever, if you can't find use for it, then you may be better served using a different ID spec/library \naltogether (➜ [Alternatives](#alternatives)). You'd be wasting a byte that could give you benefits elsewhere.\n\n### Why?\n\nMany databases, especially embedded ones, are extremely efficient when all you need is the keys - not all \nthe data all those keys represent. None of the Snowflake-like specs would provide a means to do that without \nexcessive overrides (or too small a pool to work with), essentially a different format altogether, and so - **sno**.\n\n\u003cdetails\u003e\n\u003csummary\u003e\nAnd simple constants tend to do the trick.\n\u003c/summary\u003e\n\u003cp\u003e\n\nUntyped integers can pass as `uint8` (i.e. `byte`) in Go, so the following would work and keep things tidy:\n\n```go\nconst (\n\tPersonType = iota\n\tOtherType\n)\n\ntype Person struct {\n\tID   sno.ID\n\tName string\n}\n\nperson := Person{\n\tID:    sno.New(PersonType),\n\tName: \"A Special Snöflinga\",\n}\n```\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cbr /\u003e\n\n*Information that describes something* has the nice property of also helping to *identify* something across a sea \nof possibilities. It's a natural fit. \n\nDo everyone a favor, though, and **don't embed confidential information**. It will stop being confidential and \nbecome public knowledge the moment you do that. Let's stick to *nice* property, avoiding `PEBKAC`.\n\n### Sort order and placement\n\nThe metabyte follows the timestamp. 
This clusters IDs by the timestamp and then by the metabyte (for example - \nthe type of the entity), *before* the fixed partition.\n\nIf you were to use machine-ID based partitions across a cluster generating, say, `Person` entities, where `Person` \ncorresponds to a metabyte of `1` - this has the neat property of grouping all `People` generated across the entirety \nof your system in the given timeframe in a sortable manner. In database terms, you *could* think of the metabyte as \nidentifying a table that is sharded across many partitions - or as part of a compound key. But that's just one of \nmany ways it can be utilized.\n\nPlacement at the beginning of the second block allows the metabyte to either extend the timestamp \nblock or provide additional semantics to the payload block. Even if you always leave it empty, neither sort \norder nor sort/insert performance will be hampered.\n\n### But it's just a single byte!\n\nA single byte is plenty. \n\n\u003cdetails\u003e\n\u003csummary\u003eHere's a few \u003cem\u003eideas for things you did not know you wanted, yet\u003c/em\u003e.\u003c/summary\u003e\n\u003cp\u003e\n\n- IDs for requests in an HTTP context: 1 byte is enough to contain one of all possible standard HTTP status codes. \n*Et voila*, you've now got all requests that resulted in an error nicely sorted and clustered.\n\u003cbr /\u003eLimit yourself to the non-exotic status codes and you can store the HTTP verb along with the status code. \nIn that single byte. Suddenly even the partition (if it's tied to a machine/cluster) gains relevant semantics, \nas you've gained a timeseries of requests that started fail-cascading in the cluster. Constrain yourself even \nfurther to just one bit for `OK` or `ERROR` and you've made room to also store information about the operation that \nwas requested (think resource endpoint).\n\n- How about storing an (immutable) bitmask along with the ID? 
Save some 7 bytes of bools by doing so and have the \nflags readily available during an efficient sequential key traversal using your storage engine of choice.\n\n- Want to version-control a `Message`? Limit yourself to at most 256 versions and it becomes trivial. Take the ID \nof the last version created, increment its metabyte - and that's it. What you now have is effectively a simplistic \nversioning schema, where the IDs of all possible versions can be inferred without lookups, joins, indices and whatnot. \nAnd many databases will just store them *close* to each other. Locality is a thing.\n\u003cbr /\u003eHow? The only part that changed was the metabyte. All other components remained the same, but we ended up with \na new ID pointing to the most recent version. Admittedly, the timestamp loses its default semantics of \n*moment of creation* and instead becomes *moment of creation of first version*, but you'd store a `revisedAt` timestamp \nanyway, wouldn't you?\u003cbr /\u003eAnd if you *really* wanted to support more versions - the IDs have certain properties \nthat can be (ab)used for this. Increment this, decrement that...\n\n- Sometimes a single byte is all the data that you actually need to store, along with the time \n*when something happened*. Batch processing succeeded? `sno.New(0)`, done. Failed? `sno.New(1)`, done. You now \nhave a uniquely identifiable event, know *when* and *where* it happened, what the outcome was - and you still \nhave 7 spare bits (for higher precision time, maybe?)\n\n- Polymorphism has already been covered. Consider not just data storage, but also things like (un)marshaling \npolymorphic types efficiently. Take a JSON of `{id: \"aaaaaaaa55aaaaaa\", foo: \"bar\", baz: \"bar\"}`. \nThe 8th and 9th (0-indexed) characters of the ID contain the encoded bits of the metabyte. 
Decode that \n(use one of the utilities provided by the library) and you now know what internal type the data should unmarshal \nto without first unmarshaling into an intermediary structure (nor rolling out a custom decoder for this type). \nThere are many approaches to tackle this - an ID just happens to lend itself naturally to solve it and is easily \nportable.\n\n- 2 bytes for partitions not enough for your needs? Use a fixed byte as the metabyte -- you have extended the \nfixed partition to 3 bytes. Wrap a generator with a custom one to apply that metabyte for you each time you use it. \nThe metabyte is, after all, part of the partition. It's just separated out for semantic purposes but its actual \nsemantics are left to you.\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cbr /\u003e\n\n## Encoding\n\nThe encoding is a **custom base32** variant stemming from base32hex. Let's *not* call it *sno32*. \nA canonically encoded **sno** is a regexp of `[2-9a-x]{16}`.\n\nThe following alphabet is used:\n\n```\n23456789abcdefghijklmnopqrstuvwx\n```\n\nThis is 2 contiguous ASCII ranges: `50..57` (digits) and `97..120` (*strictly* lowercase letters).\n\nOn `amd64` encoding/decoding is vectorized and **[extremely fast](./benchmark#encodingdecoding)**.\n\n\u003cbr /\u003e\n\n## Alternatives\n\n| Name        | Binary (bytes) | Encoded (chars)* | Sortable  | Random**  | Metadata | nsec/ID\n|------------:|:--------------:|:----------------:|:---------:|:---------:|:--------:|--------:\n| [UUID]      |       16       |        36        |   ![no]   | ![yes]    | ![no]    | ≥36.3\n| [KSUID]     |       20       |        27        |   ![yes]  | ![yes]    | ![no]    | 206.0\n| [ULID]      |       16       |        26        |   ![yes]  | ![yes]    | ![no]    | ≥50.3\n| [Sandflake] |       16       |        26        |   ![yes]  | ![meh]    | ![no]    | 224.0\n| [cuid]      |     ![no]      |        25        |   ![yes]  | ![meh]    | ![no]    | 342.0\n| [xid]       |       12       
|        20        |   ![yes]  | ![no]     | ![no]    |  19.4\n| **sno**     |       10       |      **16**      |   ![yes]  | ![no]     | ![yes]   | **8.8**\n| [Snowflake] |      **8**     |       ≤20        |   ![yes]  | ![no]     | ![no]    |  28.9\n\n\n[UUID]: https://github.com/gofrs/uuid\n[KSUID]: https://github.com/segmentio/ksuid\n[cuid]: https://github.com/lucsky/cuid\n[Snowflake]: https://github.com/bwmarrin/snowflake\n[Sonyflake]: https://github.com/sony/sonyflake\n[Sandflake]: https://github.com/celrenheit/sandflake\n[ULID]: https://github.com/oklog/ulid\n[xid]: https://github.com/rs/xid\n\n[yes]: ./.github/ico-yes.svg\n[meh]: ./.github/ico-meh.svg\n[no]:  ./.github/ico-no.svg\n\n\\*  Using canonical encoding.\u003cbr /\u003e\n\\** When used with a proper CSPRNG. The more important aspect is the distinction between entropy-based and \ncoordination-based IDs. [Sandflake] and [cuid] do contain entropy, but not sufficient to rely on entropy\nalone to avoid collisions (3 bytes and 4 bytes respectively).\u003cbr /\u003e\n\nFor performance results see ➜ [Benchmark](./benchmark). 
`≥` values are given for libraries which provide more\nthan one variant, in which case the fastest one is listed.\n\n\n\u003cbr /\u003e\u003cbr /\u003e\n\n## Attributions\n\n**sno** is both based on and inspired by [xid] - more so than by the original Snowflake - but the changes it \nintroduces are unfortunately incompatible with xid's spec.\n\n## Further reading\n\n- [Original Snowflake implementation](https://github.com/twitter-archive/snowflake/tree/snowflake-2010) and \n  [related post](https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html)\n- [Mongo ObjectIds](https://docs.mongodb.com/manual/reference/method/ObjectId/)\n- [Instagram: Sharding \u0026 IDs at Instagram](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c)\n- [Flickr: Ticket Servers: Distributed Unique Primary Keys on the Cheap](http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/)\n- [Segment: A brief history of the UUID](https://segment.com/blog/a-brief-history-of-the-uuid/) - about KSUID and the shortcomings of UUIDs.\n- [Farfetch: Unique integer generation in distributed systems](https://www.farfetchtechblog.com/en/blog/post/unique-integer-generation-in-distributed-systems) - uint32 utilizing Cassandra to coordinate.\n\nAlso potentially of interest:\n- [Lamport timestamps](https://en.wikipedia.org/wiki/Lamport_timestamps) (vector/logical clocks)\n- [The Bloom Clock](https://arxiv.org/pdf/1905.13064.pdf) by Lum Ramabaja\n","funding_links":[],"categories":["UUID","Utility","UUID`UUID 生成和操作库`","Go"],"sub_categories":["Utility/Miscellaneous","查询语","Fail injection","HTTP Clients","实用程序/Miscellaneous"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuyo%2Fsno","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmuyo%2Fsno","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmuyo%2Fsno/lists"}