{"id":48078833,"url":"https://github.com/athalhammer/erdi8-py","last_synced_at":"2026-04-04T14:52:27.579Z","repository":{"id":44624474,"uuid":"383881579","full_name":"athalhammer/erdi8-py","owner":"athalhammer","description":"identifier generator","archived":false,"fork":false,"pushed_at":"2026-02-10T21:25:30.000Z","size":165,"stargazers_count":8,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-11T00:25:13.353Z","etag":null,"topics":["accession","base33","collision-free","generator","identifier","identifiers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/athalhammer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-07-07T17:50:13.000Z","updated_at":"2026-02-10T21:25:34.000Z","dependencies_parsed_at":"2023-01-31T19:00:37.448Z","dependency_job_id":"9b91b4ae-a967-41f3-a0ff-65ba17159a31","html_url":"https://github.com/athalhammer/erdi8-py","commit_stats":null,"previous_names":["athalhammer/erdi8"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/athalhammer/erdi8-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/athalhammer%2Ferdi8-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/athalhammer%2Ferdi8-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/athalhammer%2Ferdi8-py/releases","manifests_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/athalhammer%2Ferdi8-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/athalhammer","download_url":"https://codeload.github.com/athalhammer/erdi8-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/athalhammer%2Ferdi8-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31403780,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accession","base33","collision-free","generator","identifier","identifiers"],"created_at":"2026-04-04T14:52:26.818Z","updated_at":"2026-04-04T14:52:27.566Z","avatar_url":"https://github.com/athalhammer.png","language":"Python","funding_links":["https://www.buymeacoffee.com/thalhamm"],"categories":[],"sub_categories":[],"readme":"[![example workflow](https://github.com/athalhammer/erdi8/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/athalhammer/erdi8-py/actions/workflows/unit_tests.yml)\n[![PyPI](https://img.shields.io/pypi/v/erdi8)](https://pypi.org/project/erdi8)\n[![GitHub 
license](https://img.shields.io/github/license/athalhammer/erdi8-py.svg)](https://github.com/athalhammer/erdi8-py/blob/master/LICENSE)\n[![Downloads](https://static.pepy.tech/badge/erdi8)](https://pepy.tech/project/erdi8)\n\n# erdi8\n\n\u003ca href=\"https://www.buymeacoffee.com/thalhamm\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/default-orange.png\" alt=\"Buy Me A Coffee\" height=\"41\" width=\"174\"\u003e\u003c/a\u003e\n\nerdi8 is a [unique identifier](https://www.wikidata.org/wiki/Q6545185) scheme as well as an identifier generator and transformer that operates on the following alphabet:\n\n```\n['2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', \n'i', 'j', 'k', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']\n```\n\nIt is essentially a base36 alphabet that intentionally omits the ambiguous characters `0`, `1`, and `l` and therefore shrinks to 33 characters. In addition, it ensures that no identifier starts with a digit by using an offset of 8 (the index of `a` in the alphabet). Zero is represented by `a`, 24 by `z`, and 25 by `a2`. With three characters or fewer, one can create 28'075 (25 + 25 * 33 + 25 * 33 * 33) different identifiers; with six characters or fewer, there are 1'008'959'350 options. In a traditional identifier scheme, one would use a prefix (e.g., `M`) followed by an integer, which gives only 100k identifiers (`M0` to `M99999`) with up to six characters. Because the scheme is based on consecutive counting, it is free of collisions. 
Note, however, that it is __not__ a method for creating secret identifiers.\n\nA TypeScript implementation is available at [erdi8-ts](https://github.com/athalhammer/erdi8-ts).\n\n## Usage\n\nInstall with `pip install erdi8`\n\n### Basic (counting)\n```\n$ python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e e8 = Erdi8()\n\u003e\u003e\u003e e8.increment(\"erdi8\")\n'erdi9'\n\u003e\u003e\u003e e8.decode_int(\"erdi8\")\n6545185\n\u003e\u003e\u003e e8.encode_int(6545185)\n'erdi8'\n```\n\n### Advanced (still counting)\nFixed-length \"fancy\" identifiers with `safe=True`:\n\n```\n$ python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e safe = True\n\u003e\u003e\u003e start = 'b222222222'\n\u003e\u003e\u003e stride = 30321718760514\n\u003e\u003e\u003e e8 = Erdi8(safe)\n\u003e\u003e\u003e e8.increment_fancy(start, stride)\n'fmzz7cwc43'\n\u003e\u003e\u003e current = e8.increment_fancy('fmzz7cwc43', stride)\n\u003e\u003e\u003e print(current)\nk7zydqrp64\n\n# reverse-engineer the stride from two consecutive identifiers\n\u003e\u003e\u003e e8.compute_stride('fmzz7cwc43', current)\n{'stride_effective': 30321718760517, 'stride_other_candidates': [30321718760516, 30321718760515, 30321718760514]}\n\n# split the mod space into six approximately equal-sized parts that can be used as individual start values. This helps to scale erdi8 identifier generation horizontally across multiple, geographically distributed services.\n\u003e\u003e\u003e e8.split_fancy_space(10, stride, number_chunks = 6)\n['b222222222', 'mtccmwqwzc', 'xmpq7sfsyp', 'jf22tp5pxz', 'v7cdfjvkxb', 'fzpr2fkgwn']\n\n# increment the first identifier of the third chunk and show that the result also lies in the third chunk (index 2)\n\u003e\u003e\u003e e8.increment_fancy('xmpq7sfsyp', stride)\n'c7ppf5b52q'\n\u003e\u003e\u003e e8.fancy_split_index('c7ppf5b52q', stride, 6)\n2\n```\n\n**NOTE**\n\n0. These sequences may look \"fancy\", but __they are not random__. 
They are perfectly predictable and are designed to \"fill up the whole mod space\" before previously coined identifiers start re-appearing.\n1. The `safe=True` option helps you avoid unintended words (it removes the characters `[aeiou]` from the alphabet).\n2. The fancy increment works with fixed lengths. With a length of 10 (as above), you have `20 * 28^9 = 211'569'119'068'160` options with `safe=True`: 20 vowel-free letters for the first position and 28 characters for each of the remaining nine. If you expect to have more things to identify at some point, you have two options: a) start directly with more characters, or b) watch for the start value (in this case `b222222222`) to re-appear, as it will be the first identifier to \"show up twice\".\n3. Store the following three values in a safe place: a) the `safe` parameter, b) the `start` value, and c) the `stride` value. In addition, keep careful track of the `current` value.\n\n\n### Advanced (random)\nAlso see the documentation of Python's built-in [`random`](https://docs.python.org/3/library/random.html) and [`secrets`](https://docs.python.org/3/library/secrets.html) modules, in particular the warning for `random`: \"The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the `secrets` module\". 
In any case, you should know what you are doing.\n\n`random` module:\n\n```\n$ python3\n\n\u003e\u003e\u003e import random\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e e8 = Erdi8()\n\n# get a random erdi8 identifier of length 10\n\u003e\u003e\u003e mini, maxi, _ = e8.mod_space(10)\n\u003e\u003e\u003e e8.encode_int(random.randint(mini, maxi))\n'vvctyx7c6o'\n```\n\n`secrets` module:\n\n```\n$ python3\n\n\u003e\u003e\u003e import secrets\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e e8 = Erdi8()\n\n\u003e\u003e\u003e e8.encode_int(int.from_bytes(secrets.token_bytes(), \"big\"))\n'jtx3i83pii8wo98wzuucu7uag6khrfpabrdn3qrqrxdxauvcgjg'\n\n\u003e\u003e\u003e e8.encode_int(secrets.randbits(256))\n'a53mpn3xntywcbdcvfa932ub34evne9oha8pzoy6ii3ur2e364z'\n```\n\n### Advanced (hash functions)\nerdi8 is compatible with the most common hash functions, whose digests are typically output in hexadecimal format. Also refer to Python's built-in [`hashlib`](https://docs.python.org/3/library/hashlib.html) module. 
In addition, consider other [hash functions](https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed).\n\n```\n$ python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e import hashlib\n\n# prepare the item to be hashed and display the digests for sha256 and md5\n\u003e\u003e\u003e s = \"asdf\".encode(\"ascii\")\n\u003e\u003e\u003e hashlib.sha256(s).hexdigest()\n'f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b'\n\u003e\u003e\u003e hashlib.md5(s).hexdigest()\n'912ec803b2ce49e4a541068d495ab570'\n\n# encode the respective digests with erdi8\n\u003e\u003e\u003e e8 = Erdi8()\n\u003e\u003e\u003e e8.encode_int(int.from_bytes(hashlib.sha256(s).digest(), \"big\"))\n'n6vz5j427zw66qx9n4jk9sw7otrvu38gdteehsocbke3xocvqok'\n\u003e\u003e\u003e e8.encode_int(int.from_bytes(hashlib.md5(s).digest(), \"big\"))\n'bcmhm477p7poz6sv8jpr4cqu4h'\n\n# same as above but with safe=True\n\u003e\u003e\u003e e9 = Erdi8(safe=True)\n\u003e\u003e\u003e e9.encode_int(int.from_bytes(hashlib.sha256(s).digest(), \"big\"))\n'cg8644xv4txkj49sfzcwn49h3hvsqb8xm2pqxxfxxg7mpz3nwsmhnf'\n\u003e\u003e\u003e e9.encode_int(int.from_bytes(hashlib.md5(s).digest(), \"big\"))\n'fv3y2y9mgbr4xs85z5qb6bp4dxm'\n\n# re-establish the hexdigests (note that hex() omits leading zeros)\n\u003e\u003e\u003e hex(e8.decode_int('n6vz5j427zw66qx9n4jk9sw7otrvu38gdteehsocbke3xocvqok'))\n'0xf0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b'\n\u003e\u003e\u003e hex(e8.decode_int('bcmhm477p7poz6sv8jpr4cqu4h'))\n'0x912ec803b2ce49e4a541068d495ab570'\n\n# re-establish the hexdigests with safe=True\n\u003e\u003e\u003e hex(e9.decode_int('cg8644xv4txkj49sfzcwn49h3hvsqb8xm2pqxxfxxg7mpz3nwsmhnf'))\n'0xf0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b'\n\u003e\u003e\u003e hex(e9.decode_int('fv3y2y9mgbr4xs85z5qb6bp4dxm'))\n'0x912ec803b2ce49e4a541068d495ab570'\n\n```\n\n### Advanced (UUID)\nAlso see the documentation of Python's built-in [`uuid`](https://docs.python.org/3/library/uuid.html) module.\n\n```\n$ 
python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e import uuid\n\u003e\u003e\u003e e8 = Erdi8()\n\u003e\u003e\u003e e9 = Erdi8(safe=True)\n\n\u003e\u003e\u003e a = uuid.uuid4()\n\u003e\u003e\u003e a\nUUID('6e8f578c-577c-4f48-b6ac-bf135c310dc4')\n\u003e\u003e\u003e b = e8.encode_int(a.int)\n\n# the UUID encoded as an erdi8 string, 10 characters shorter than the 36-character UUID string\n\u003e\u003e\u003e b\n'au3jqjghpb7dqfejdanskzoaik'\n\n# same as above but with safe=True\n\u003e\u003e\u003e c = e9.encode_int(a.int)\n\u003e\u003e\u003e c\n'drmhy438mjhqdsbxhzn6v27b8n6'\n\n# reverse\n\u003e\u003e\u003e uuid.UUID(int=e8.decode_int(b))\nUUID('6e8f578c-577c-4f48-b6ac-bf135c310dc4')\n\n# reverse with safe=True\n\u003e\u003e\u003e uuid.UUID(int=e9.decode_int(c))\nUUID('6e8f578c-577c-4f48-b6ac-bf135c310dc4')\n\n```\n\n**Note**: The erdi8 encoding will never start with a digit or produce \"number-only\" strings.\n\n### Advanced (xid)\nSee also [`xid`](https://github.com/rs/xid). With `erdi8` encoding, you gain the following properties:\n\n- the problematic characters `[0, 1, l]` are omitted, and with the `safe=True` option also `[a, e, i, o, u]` (to avoid \"bad\" words, see the FAQ below);\n- the length shrinks to 19 characters (at least until 2065, when it switches to 20), or stays at 20 characters with `safe=True` (until 2081, after which it switches to 21);\n- the string always starts with a letter (in fact, current and future xids do as well).\n\nThe k-sortedness of xids is maintained among strings of the same length. For example, after 2065 you should not compare 19- and 20-character xid+erdi8 strings without modification; you could add a leading `0`, which is not in the erdi8 alphabet, as padding. 
The properties of `xid`s are otherwise preserved, as the transformation via the integer value of an xid's 12 bytes is bijective.\n\n```\n$ python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e from xid import Xid\n\n\u003e\u003e\u003e x = Xid()\n\n# or, if you want to reproduce the output below:\n\u003e\u003e\u003e x = Xid([100, 144, 152, 133, 98, 39, 69, 106, 189, 98, 39, 93])\n\n\u003e\u003e\u003e x.string()\n'ci89h1b24t2mlfb24teg'\n\n\u003e\u003e\u003e x.value\n[100, 144, 152, 133, 98, 39, 69, 106, 189, 98, 39, 93]\n\n\u003e\u003e\u003e e8 = Erdi8()\n\u003e\u003e\u003e e = e8.encode_int(int.from_bytes(x.value, \"big\"))\n\u003e\u003e\u003e e\n'op34e9rackpsch39few'\n\n\u003e\u003e\u003e y = Xid(e8.decode_int('op34e9rackpsch39few').to_bytes(12, \"big\"))\n\u003e\u003e\u003e y.string()\n'ci89h1b24t2mlfb24teg'\n\n\u003e\u003e\u003e e9 = Erdi8(safe=True)\n\u003e\u003e\u003e f = e9.encode_int(int.from_bytes(x.value, \"big\"))\n\u003e\u003e\u003e f\n'n7dsv982t6dxymy4z5t3'\n\u003e\u003e\u003e z = Xid(e9.decode_int('n7dsv982t6dxymy4z5t3').to_bytes(12, \"big\"))\n\u003e\u003e\u003e z.string()\n'ci89h1b24t2mlfb24teg'\n\n```\n\n### Advanced (encode bytes)\nBy default, `erdi8` works with integer representations. In particular, it represents any larger sequence of bytes as an integer. This rests on two main assumptions: 1) the integers are usually small, as one of the goals is concise identifiers; 2) the data is static, i.e., we are *not* considering streams of data (where, at the time of encoding the beginning, the end is not yet known). However, these assumptions may not hold for your use case. Therefore, we offer a method that encodes four bytes at a time as erdi8. It produces chunks of `erdi8` identifiers of length seven that can be concatenated if needed. 
The respective functions are called `encode_four_bytes` and `decode_four_bytes`.\n\n```\n$ python3\n\n\u003e\u003e\u003e from erdi8 import Erdi8\n\u003e\u003e\u003e e8 = Erdi8()\n\u003e\u003e\u003e e8.encode_four_bytes(bytes(\"erdi\", \"ascii\"))\n'bci7jr2'\n\n\u003e\u003e\u003e e8.decode_four_bytes('bci7jr2')\nb'erdi'\n\n\u003e\u003e\u003e e9 = Erdi8(True)\n\u003e\u003e\u003e e9.encode_four_bytes(bytes(\"erdi\", \"ascii\"))\n'fjx2mt3'\n\u003e\u003e\u003e e9.decode_four_bytes('fjx2mt3')\nb'erdi'\n```\n\n**NOTE**: These two methods are not compatible with the other `erdi8` functions. The integers behind the four-byte chunks are altered to ensure that each chunk always results in an `erdi8` identifier of exactly seven characters.\n\n### Even more advanced\nRun a lightweight erdi8 identifier service via [fasterid](https://github.com/athalhammer/fasterid).\n\n\n## Test cases\n\n```\n$ python3 -m unittest test/erdi8_test.py \n```\n\n## FAQ\n\n__Why should I use `erdi8` instead of [`shortuuid`](https://github.com/skorokithakis/shortuuid)?__\n\n_There are several aspects to consider. With its default alphabet, `shortuuid` uses both upper- and lowercase characters, which `erdi8` avoids (see below). The alphabet of `shortuuid` can be customized, for example with the erdi8 alphabet; however, this leads to very long UUIDs. In this context, we find the following statement in the `shortuuid` README particularly troublesome: \"If 22 digits are too long for you, you can get shorter IDs by just truncating the string to the desired length.\" Truncation drops the beneficial stochastic properties of UUIDs, and you need to run careful checks for potential identifier duplication. Here, `erdi8` with its counting or \"mod space counting\" options has a significant advantage._\n\n__Why no uppercase characters?__\n\n_Because we don't want `erdi8` to be confused with `Erdi8`._\n\n__Why not start with a number?__\n\n_Because we want to avoid \"number-only\" identifiers. 
If identifiers could start with a number, we would get identifiers such as `42` and `322`, which could be mistaken for integers. A more complex scheme could instead exclude only the number-only combinations (and would therefore still allow ids like `2z`); this is yet to be investigated. Further, certain technologies such as XML don't support element names that start with a number. In particular, QNames such as `asdf:123` are not allowed. Finally, programs like Excel are quite creative when transforming input data, for example `08342 -\u003e 8342`, `12e34 -\u003e 12E+34`, `SEPT1 -\u003e Sep-01`, etc. erdi8 with the safe option enabled avoids the vast majority of these issues._\n\n__How about combinations that form actual (bad) words?__\n\n_This depends on the use case and the way erdi8 is used. We therefore recommend working with filter lists. In addition, an erdi8 object that avoids the characters `aeiou` can be created with `Erdi8(safe=True)`. This shrinks the available character space to 28, and the produced output is not compatible with `Erdi8(safe=False)` (the default). With this setting, the risk of unintentionally creating English words is lower. It is recommended for erdi8 identifiers longer than three characters, where filter lists start to become impractical._\n\n__How does this relate to binary-to-text encodings such as base32 and base64?__\n\n_erdi8 can be used as a binary-to-text encoding; the basic functions for this are `encode_int` and `decode_int`. However, its primary purpose is to provide a short counting scheme for identifiers._\n\n__What could be a drawback of using erdi8?__\n\n_It depends on how you use it. If you use it to re-encode integer representations of other byte-array-like objects (secret numbers, hash digests, UUIDs, xids), the length of the strings produced by erdi8 will likely vary. 
This variance may be predictable (for example with `xid`s) but can also cover larger ranges (secrets, hash digests, etc.). Minimum and maximum lengths can be calculated from the number of bytes and the chosen erdi8 option (`safe=True` vs. `safe=False`). At the moment, padding is not supported as a built-in function; whether it is necessary depends on the use case._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fathalhammer%2Ferdi8-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fathalhammer%2Ferdi8-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fathalhammer%2Ferdi8-py/lists"}