{"id":19007972,"url":"https://github.com/jbee/lusid","last_synced_at":"2025-04-22T19:25:46.673Z","repository":{"id":223153594,"uuid":"759444590","full_name":"jbee/lusid","owner":"jbee","description":"Locally Unique Short Identifiers","archived":false,"fork":false,"pushed_at":"2024-02-23T16:00:02.000Z","size":94,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T18:11:17.580Z","etag":null,"topics":["ids","unique-id","unique-id-generator","unique-identifier"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jbee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-18T16:04:21.000Z","updated_at":"2024-04-11T07:42:17.000Z","dependencies_parsed_at":"2024-02-23T12:26:59.766Z","dependency_job_id":"84956e36-2dd4-494b-a75e-a4789913cafe","html_url":"https://github.com/jbee/lusid","commit_stats":null,"previous_names":["jbee/lusid"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbee%2Flusid","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbee%2Flusid/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbee%2Flusid/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jbee%2Flusid/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jbee","download_url":"https://codeload.github.com/jbee/lusid/tar.gz/refs/heads/
main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249297739,"owners_count":21246468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ids","unique-id","unique-id-generator","unique-identifier"],"created_at":"2024-11-08T18:40:00.978Z","updated_at":"2025-04-17T01:31:34.748Z","avatar_url":"https://github.com/jbee.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Locally Unique Short Identifiers\n\nThis is a small library that generates unique string IDs from numbers. \nThese IDs are short, unique, URL-safe, and very unlikely to include words.\nEach number corresponds to a unique string which can be decoded back into the original number.\n\nHere _Locally_ refers to the fact that the encoded ID is unique \nfor a given input number in combination with the secret and mode (character mapping) used.\n\n## Usage\nEncoding and decoding is done via a `Coder` instance:\n\n```java\nCoder coder = Coder.of(6, Mode.MIXED); // 6 characters, upper and lower case\nString id = coder.encodeLong(42L);     // = \"lR7wZ8\" (depends on secret)\nlong value = coder.decodeLong(id);     // = 42\n```\n\nA `Coder` has three configuration properties\n* a minimum target length (1-20)\n* a `Mode` (a configuration of the used characters)\n* a 64 bit secret\n\nThe secret is either passed directly to `Coder.of` or, when omitted (like above), \nit is loaded from the `lusid.secret` system property or environment variable.\nThe name of an alternative property can also be passed as argument.\n\n## Properties\n* 🆔 Generates 
unique IDs for any `long`, `int`, `double`, `float` number(s)\n* 🆔 Generates unique IDs for `enum` names (upper case letters + `_`)\n* ⚡ Low overhead, algorithm is fast and mostly allocation free\n* 🤬 Very unlikely to result in an actual (3+ letter) word (of any language) \n* 🔧 Easy to create custom modes for specific target patterns\n* 📏 integer numbers require less or as many characters as the number written in decimal\n* ⛔ High likelihood of identifying ID input errors (typos)\n* 📢 The character mapping (mode) can be public without compromising security\n* 🛡️ No amount of encoded IDs will help to disclose the original numbers or secret \n \n\n## API \nThe `Coder` API methods to encode and decode different types of values:\n```java\nCoder coder = Coder.of(8); // minimum 8 characters mixed case\n\n// single numbers\nlong lvalue = coder.decodeLong(coder.encodeLong(42L));        // = 42\nint ivalue = coder.decodeInt(coder.encodeInt(13));            // = 13\ndouble dvalue = coder.decodeDouble(coder.encodeDouble(0.5d)); // = 0.5\nfloat fvalue = coder.decodeFloat(coder.encodeFloat(33.3f));   // = 33.3\n\n// multiple numbers\nlong[] lvalues = coder.decodeLongs(coder.encodeLongs(1L,2L));              // = [1,2]\nint[] ivalues = coder.decodeInts(coder.encodeInts(3,6,9));                 // = [3,6,9]\ndouble[] dvalues = coder.decodeDoubles(coder.encodeDoubles(0.5d,55.789d)); // = [0.5,55.789]\n\n// (enum) names (upper letters and _ only)\nString name = coder.decodeName(coder.encodeName(\"RUNTIME\")); // = \"RUNTIME\" \n\n// any text\nString text = coder.decodeText(coder.encodeText(\"🥳\"));      // = \"🥳\"\n```\n\nWhen constructing a `Coder` instance the secret can be passed explicitly or when omitted the\n`lusid.secret` system property or environment variable is used. 
Default mode is `Mode.MIXED`.\n\n```java\n// explicit custom secret property\nSystem.setProperty(\"my.secret.property\", \"42\");\nCoder c1 = Coder.of(\"my.secret.property\", 1);   // minimum length 1, mode MIXED\n\n// explicit secret value, explicit mode\nCoder c2 = Coder.of(42L, 2, Mode.UPPER);        // minimum length 2, mode UPPER\n\n// implicit secret from lusid.secret\nCoder c3 = Coder.of(3);                         // minimum length 3, mode MIXED\n```\n\n\n## 🔠 Modes\nFive standard modes are included:\n\n* `MIXED`: uses upper and lower case letters and digits\n* `UPPER`: uses upper case letters and digits\n* `LOWER`: uses lower case letters and digits\n* `XSAFE`: uses upper and lower consonants and digits (safest when trying to avoid 🤬 words)\n* `SHAPE`: uses upper and lower letters and digits that cannot be easily confused visually\n\nIt is also easy to create further user defined modes.\n\nThe below table demonstrates IDs using the different `Mode`s with minimum length 6:\n```\n Value   MIXED   UPPER   LOWER   XSAFE   SHAPE\n \n      1  VJH5h8  HH5VJ8  55hjv8  VVjvJr  VvJ6Er\n     12  XH6r8k  R5XK86  6hkx8r  XjxKrk  XJ7Wrk\n    123  k4eS8s  6TCO8S  rgn38f  ktCSrs  k5eSrs\n   1234  w8NcsC  L8E1SC  z81efn  LrNcsC  UrNcsC\n  12345  8KW5uS  8R7VGO  86wjt3  rXlvGS  rxL6AS\n 123456  5R9wZj  V69LW5  jr9z7h  vkhLZj  6KhUZj\n1234567 GR2oSct U6PFO1T 4rds3eg TkpFSct tK3uSct\n```\n\n## 🛡️ Security\n\u003e [!Important]\n\u003e **TLDR;** Do not expose an API to retrieve an encoded ID for a known input (original) value.\n\u003e Both the secret and original values must stay \"unknown\" to maintain information hiding.\n\nThis is not a typical \"encryption\" library. 
\nMeaning, the algorithm is easily reversible.\nThe information is hidden and disclosed using a simple XOR with a 64 bit secret.\nThis might appear awfully simplistic. However, as long as an attacker \ndoes not know the secret or any original value together with its corresponding \nencoded value, there is no way of telling if an assumed secret is correct, as any\nsecret will result in a number, just not the original one. \n\nThe security comes from removing the possibility to recognise if the\noriginal value is found when reversing the algorithm. \nThis means a brute force attack is fairly meaningless. \nIt might be used to find some bits of the secret \nbased on assumptions like - \"most original numbers are small positive numbers\" -\nbut the lower bits of the secret (which are used most) are also \nthe hardest to extract this way, as any secret results in a\nset of small numbers (if the original set was indeed a set of small numbers)\nbut the numbers are only the correct ones with the correct secret.\n\nIt is also not possible to tell from the ID what type of value \n(`long`, `int`, `float`, `double`, `String`) it has been generated from.\nWhen the `Mode` is known one could only tell if an ID contains one or more values.\n\n\n## ⏱️ Performance\n\n\u003e [!Tip]\n\u003e **TLDR;** The takeaway here is encoding and decoding is very cheap.  
\n\u003e It literally can be done millions of times per second on any HW around.\n\nSome rough numbers for encoding and decoding all values between \n-1 million and +1 million with a minimal length of 8 in `MIXED` mode.\nThis means padding was used all the time (worst case scenario; \nencoding/decoding without padding would be noticeably faster).\n```\nBenchmark                       Mode  Cnt    Score    Error  Units\nCoderAvgBenchmark.decodeDouble  avgt    3  183.560 ± 35.838  ns/op\nCoderAvgBenchmark.decodeFloat   avgt    3  150.219 ± 30.255  ns/op\nCoderAvgBenchmark.decodeInt     avgt    3  110.799 ± 19.256  ns/op\nCoderAvgBenchmark.decodeLong    avgt    3  110.073 ± 32.172  ns/op\nCoderAvgBenchmark.encodeDouble  avgt    3  149.111 ± 26.844  ns/op\nCoderAvgBenchmark.encodeFloat   avgt    3   75.705 ±  5.836  ns/op\nCoderAvgBenchmark.encodeInt     avgt    3   61.169 ±  5.636  ns/op\nCoderAvgBenchmark.encodeLong    avgt    3   58.463 ±  7.764  ns/op\nCoderAvgBenchmark.recodeDouble  avgt    3  351.010 ± 38.346  ns/op\nCoderAvgBenchmark.recodeFloat   avgt    3  234.458 ± 62.205  ns/op\nCoderAvgBenchmark.recodeInt     avgt    3  184.654 ± 26.260  ns/op\nCoderAvgBenchmark.recodeLong    avgt    3  182.452 ± 21.883  ns/op\n```\n(decode = op is just decoding, encode = op is just encoding, recode = op is encoding and decoding) \n\nThis wasn't a very accurate run as it lacks iterations and forks (I just don't have the patience 😂)\nbut rerunning the benchmark a bunch has shown that if anything a more accurate Score is smaller (faster).\n\nAlso, any attempt to do better isn't worth much as this was running on 2018 laptop HW while having \ndevelopment tools running as well.\nThe intent here is to have a ballpark number; encoding is around 100ns, decoding around 150ns,\ndouble is worst, long is best, closely followed by int and float.\nThese numbers all make sense considering the work done in the different scenarios.\nThe algorithm is built for `long` so it is expected to 
do best (for the same length).\n`double` also does worse since it needs 20 characters most of the time.\nVery large long values will move towards the double score but never quite get as high (slow).\n\nIn comparison to the popular [Sqids](https://github.com/sqids/sqids-java) library _Lusid_\nis almost 2 orders of magnitude faster. For variable length IDs it is around 64 times faster,\nfor minimum length 8 IDs it is around 72 times faster than Sqids. \nFor comparison _Lusid_ requires around 5.5 times the time it takes to convert a number to a \nstring and parse it back to the number whereas Sqids takes around 350 times the time.\nAgain, not measured under ideal conditions but the picture is pretty clear. \n```\nBenchmark                                Mode  Cnt      Score     Error  Units\nCoderVsSqidsBenchmark.parseLongToString  avgt    3     23.339 ±   6.054  ns/op\nCoderVsSqidsBenchmark.recodeLongLusid    avgt    3    128.457 ±   3.356  ns/op\nCoderVsSqidsBenchmark.recodeLongSqids    avgt    3   8199.822 ± 395.992  ns/op\n\nCoderVsSqidsBenchmark.recodeLongLusid8   avgt    3    174.865 ±  14.733  ns/op\nCoderVsSqidsBenchmark.recodeLongSqids8   avgt    3  12689.196 ±  13.887  ns/op\n```\n\n## 🧮 Algorithm\nThe algorithm works on the bit level using `long`s. 
\nThe 64 bits of a `long` value are split into a high `int` and a low `int` value,\nwhich are encoded the same way, concatenating the results but ignoring leading zeros.\n\nThe 32 bits of each `int` are encoded in 10 groups of 3 and 1 group of 2:\n```\ncharacter        9   8   7   6   5   4   3   2   1   0  off\nvalue           000 000 000 101 011 001 100 010 001 110 10 \nsecret          100 101 001 100 110 010 001 110 000 101 00\nXOR             --- --- --- 001 101 011 101 100 001 011 10\ncharacter index              =1  =5  =3  =5  =4  =1  =3 --\ntable index                  0   3   2   1   0   3   2  (2)\nUPPER character              C   H   3   V   G   E   3             \n```\nEach 3 bit group is XORed with the secret to get the character index on the mapping table.\nHence, each character mapping table (`Mode.tables`) must have 8 distinct characters to encode 3 bits, 0-7. \nLeading zeros in the value are not encoded. \nTables are cycled through right to left starting with the offset given by the lowest 2 bits, 0-3. \nTherefore, there must be at least 4 tables to cycle through. \nHere we assume that 4 tables are used.\n\n#### Padding\n\nIf the resulting character sequence is shorter than the target minimum length\npadding is added on the left. 
For a single missing character the `Mode.pad1` is added.\nFor 2 or more missing characters `Mode.padN` is added leftmost, followed by\na length character that encodes the number of additional padding characters following it.\n\nAssuming the example from above should be padded to different minimum lengths, again using `Mode.UPPER`.\n\n```\ncharacter        9   8   7   6   5   4   3   2   1   0  off\nUPPER character              C   H   3   V   G   E   3             \nsecret          100 101 001 100 110 010 001 110 000 101 00\ncharacter index              =1  =5  =3  =5  =4  =1  =3 --\ntable index          2   1   0   3   2   1   0   3   2  (2)\npadding 1                9   C   H   3   V   G   E   3\npadding 2            8   T   C   H   3   V   G   E   3\npadding 3        8   1   S   C   H   3   V   G   E   3\n```\nThe `T` encodes zero additional padding characters (that follow the `T`).\nIt is encoded as `T` because the count (0) is XORed with the lowest 3 bits of the 32 bit (high/low int) secret (4).\nThe table used continues the cycle, so `T` encodes the character at index 4 in the table at index 1.\n\nIf there are padding characters to the right of the padding count, this cycles through the existing\nresults for value XOR secret right to left but encodes the value with the table belonging to\nthe padding character's position. In the example we get the `S` because the rightmost XORed\nvalue was 3, the table for the padding character is 1, so the character at index 3 in the table at index 1.\n\nTo not always lead padded IDs with the padding indicator (`Mode.pad1` or `Mode.padN`) the character\nfinally switches places with the character at the index resulting from \nbit-count of value XOR secret (full 32bit) modulo the ID target length.\n\n#### Flipping\nAs negative numbers usually have leading 1s (not leading zeros) it is often preferable to do a bitwise flip and\nencode the flipped number instead. This always takes place before any of the above. 
If the number is flipped\nthe `Mode.flip` marker character is prepended. After the encoding is done the marker is then also swapped to another\nposition based on the bit-count of the original value (64 bit) modulo the ID length.\nIf padding is available the flip marker takes one of the padding places. Otherwise, it is \"extra\".\n\n#### Joining\nWhen multiple numbers are encoded each number is encoded as described above.\nThe parts are then joined (or separated) by the `Mode.join` character.\nThe padding to reach the target length is distributed equally across the individual numbers.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbee%2Flusid","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjbee%2Flusid","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjbee%2Flusid/lists"}