{"id":34193476,"url":"https://github.com/petersmagnusson/base62","last_synced_at":"2026-03-11T07:02:25.134Z","repository":{"id":216433739,"uuid":"741296639","full_name":"petersmagnusson/base62","owner":"petersmagnusson","description":"base62. Arbitrary inputs, deterministic and close to optimal output. Typescript, Go, and Rust versions.","archived":false,"fork":false,"pushed_at":"2025-03-14T17:20:31.000Z","size":1894,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-14T18:28:43.377Z","etag":null,"topics":["base62","base62-decoding","base62-encoding","go","golang","javascript","rust","rust-lang","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/petersmagnusson.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-10T05:16:10.000Z","updated_at":"2025-03-14T17:20:34.000Z","dependencies_parsed_at":"2024-06-16T20:47:17.498Z","dependency_job_id":"5e2339d2-adfd-4126-a75c-4d924787693c","html_url":"https://github.com/petersmagnusson/base62","commit_stats":null,"previous_names":["petersmagnusson/base62"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/petersmagnusson/base62","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petersmagnusson%2Fbase62","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petersmagnusson%2Fbase62/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/petersmagnusson%2Fbase62/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petersmagnusson%2Fbase62/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/petersmagnusson","download_url":"https://codeload.github.com/petersmagnusson/base62/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/petersmagnusson%2Fbase62/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30373509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-11T06:09:32.197Z","status":"ssl_error","status_checked_at":"2026-03-11T06:09:17.086Z","response_time":84,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["base62","base62-decoding","base62-encoding","go","golang","javascript","rust","rust-lang","typescript"],"created_at":"2025-12-15T16:39:21.703Z","updated_at":"2026-03-11T07:02:25.128Z","avatar_url":"https://github.com/petersmagnusson.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# base62 reference implementation\n\nbase62 ``[A-Za-z0-9]`` encoding and decoding. Base62 remedies a few issues that\nwe've learned about base64 over the years (see below). 
\n\nReference implementation with extensive test suite in TypeScript 3.2 / ECMAScript 2015 (ES6) or later.\nPlease note that the Go and Rust versions are more or less direct AI translations\nof the TypeScript. Please open an issue if you need these improved or if you\nneed other programming language versions.\n\n    import { arrayBufferToBase62, base62ToArrayBuffer } from 'base62'\n    const encoded = arrayBufferToBase62((new TextEncoder).encode('Hello World!'))\n    const decoded = new TextDecoder().decode(base62ToArrayBuffer(encoded))\n    console.log(decoded)\n\nThis algorithm has no restrictions on the input size. The resulting length is\nonly a function of the length of the input (not the contents). It works\nwith whole bytes only, using integer modulus operations, in big-endian order,\nand default chunk size is 32 bytes. Default character set is base64\ncompatible ``[A-Za-z0-9]``. Both chunk size and choice of character set \ncan be configured.\n\nPerformance of base62 is generally worse than base64. This\nimplementation is fast as base62 systems go, but the focus has been on quality\nof encoded results, in particular for smaller sizes, and correctness.\n\nThe algorithm is close to theoretical optimum for base62 (eg if the entire\nbinary content were treated as a single integer). Notably, for several common\nsmaller sizes of bit strings, optimal base62 encoding results in the same\nlengths as base64.\n\n\n\n## Background\n\nIn contexts where we have a restricted set of characters to\nchoose from, base64 is suitable for large amounts of binary data, both for density\nand speed of encoding/decoding. 
However, the need to encode\nlarge amounts of binary data in 'printable character' format has\nbecome less of a concern over time, while we have had an increase in\nsituations where we need to encode small amounts of binary data that are\nthen directly handled by humans (copy-pasted, or printed, or outright memorized).\n\nThe largest set of printable characters that is\nalmost uniformly permitted is the 62 alphanumerics (``[A-Za-z0-9]``).\n\nThe algorithm encodes and decodes data using a base62 encoding scheme,\nwith a preferred chunk size of 32 bytes. Each chunk is\nfirst converted into a BigInt (eg up to 2^256-1 in size), and then\niteratively divided by 62 to encode it into a base62 string, zero-padded\nwith the character representing zero in our default base62 character set ('A'). We\nmaintain maps (M and invM) to correlate the length of byte sequences with\ntheir corresponding base62 string lengths, and vice versa. The algorithm\noperates in big-endian format. It includes checks to validate the correctness\nof the base62 strings, ensuring they are valid outputs of the same base62\nencoding process.\n\nBoth chunk size and choice of character set are easily modified (there are\na handful of incompatible choices for character set, see below).\n\n## Efficiency (briefly)\n\nGenerally for base62, each character represents log2(62) or about 5.9542 bits. In principle\nthis would require 0.8% more characters than base64, but in practice\nthere is often no difference, in particular in crypto contexts.\n\nNotably the resulting encoding lengths are the same for 128, 256, and 512 bits.\nBase64 has a \"sweet spot\" with 192 bits (and multiples thereof such as 384) since\nlog2(64) has 3 as a prime factor. 
But even then, for multiples of 192 up to 4x192, the\ndifference is only one character.\n\nIn fact, for bit lengths that are multiples of 32 (4 bytes),\nunless the bit length is also evenly divisible by 3 (in other words unless the total\namount of data in bits is divisible by 96), then b62 and b64 result\nin the same encoding lengths for all cases shorter than 352 bits.\n\nIf we look at a larger set of common key sizes (such as 128, 160, 192, 224, 256,\n320, 384, and 512) then unless they are a multiple of 192 bits (in the case of\nthis list 192 and 384), encoding lengths are the same.\n\nThis is because, curiously, 43xlog2(62) is 256.03, an inefficiency of only 1/8000,\nwhereas 43xlog2(64) is 258.00, an inefficiency of 1/128, allowing b62 to \"catch up\".\n\nHence the default chunking of 32 bytes. This dramatically improves performance\ncompared to larger chunks, with minimal impact on quality. In fact you would\nneed to go to chunk sizes well above 512 bytes to see much difference. Conversely,\nsmaller chunks lead to significantly worse encoding.\n\nGiven our chunking of 256 bits, if we compare with theoretically optimal b62\nencoding, and we as above restrict input sizes\nto be multiples of 4 bytes (32 bits), then for sizes up to 812 bytes\n(6496 bits) this algorithm is optimal, and for sizes up to 7052 bytes (56416 bits)\nit is behind optimum by at most one character.\n\n## Issues with Base64\n\nBase64 is the main standard for encoding binary data in printable format.\nThere are only 62 alphanumeric (A-Za-z0-9) characters, so any base64\ndesign needs to pick two symbols. For historical reasons (see the separate\n\"HISTORY.md\" document), Base64 uses '+' and '/', and '=' for padding.\n\nUnfortunately, these choices predate the World Wide Web. Uniform Resource\nIdentifiers (URIs) reserve all of those symbols. 
Base64 can work without '=',\nbut URIs reserve '+' for spaces and '/' for path separators.\n\nThis leads to Base64URL, which uses '-' and '_' instead of '+' and '/',\nallowing the encoding to be used in URIs. But this is not a standard,\nso for example, web APIs like atob() and btoa() in browsers do not support\nBase64URL. Conversely, JWT (JSON Web Tokens), which itself is a standard,\nuses Base64URL.\n\nThis in turn leads to things like encodeURIComponent() and decodeURIComponent()\nbeing applied to Base64 tokens - sometimes. This bridging between Base64\nand Base64URL depending on context, or wrapping one or the other, is a constant\nsource of bugs and confusion.\n\nHistorically, binary-data-as-readable-string was something that occurred\n\"internally\" in systems, for example to handle arbitrary binary attachments\nto email. But with the rise of various cryptographic features, items\nlike tokens and keys are often parts of \"text\" that an end-user is directly\nmanipulating - copying, pasting, etc.\n\nWhereas the earlier issues mostly impact developers, the\nsymbols '-' and '_' introduce issues for users. Especially on\nmobile devices, \"selecting\" parts of text can be difficult. The symbols '-' and '_'\nare treated differently: '-' is generally treated as a word separator,\nwhereas '_' is not. Thus, for example, a 256-bit value encoded as Base64URL\nmay or may not include '-', in fact it's about a 50/50 chance. So half the time\na user needs to copy-paste such a token, they can just double-tap to select\nall of the characters, and about half the time they can't.\n\nBase62 has always existed as an option, but since it amounts to encoding\nusing fractional bits, it has two challenges: it is likely to be much\nslower, and there are corner cases that would require general agreement.\n\nTwo things have changed in recent years. 
First, BigInt support is now\npervasive in programming languages, so can be viewed as a primitive\nin any common environment, and for the situations where end-users are\ndirectly involved, by definition the amount of data is small.\nSecondly, the universe of encodings has\nbeen so expanded that, increasingly, environments have ways of expressing\nwhat encoding is being used (e.g. https://github.com/multiformats/multibase).\n\nSo it would seem that the obvious approach is to use Base64 wherever it\ninvolves large amounts of data, and Base64URL wherever interoperability\nwith standards like JWT is required, and Base62 for anything that is\nexposed to end-users. \n\n## Other Implementations\n\nUnfortunately, there is no standard for base62, so various implementations\nthat are in circulation are often not compatible.\n\nDifferences include:\n\n* The character set used, or rather, the order of the characters. Unfortunately,\n  four variations are in circulation: Base64 ordering (A-Za-z0-9), lexicographic\n  (ASCII) order (0-9A-Za-z), \"BaseN\" (**) ordering (0-9a-zA-Z), and finally\n  but least commonly (a-zA-Z0-9). We chose 'Base64 ordering' to be aligned\n  with the base64 standard (i.e. A-Za-z0-9).\n\n* Many \"base62\" implementations only encode a number, not an arbitrary\n  binary object.\n\n* Some approaches lead to variable-length encoding, eg the length\n  of the result depends on the contents of the input (*). For various reasons,\n  in many cases this is not desirable - the length of the base62 output\n  should be predictable from the length of the (binary) input.\n\n* Few (if any) approaches appear to be close to the theoretical optimum\n  for base62 (at least from the limited testing we've done).\n\nImplementations we are currently looking at for comparison include\nthe below. This list will grow as we find more, then hopefully curated down\nto keep 'canonical' implementations for different approaches. Let us\nknow what we're missing. 
Principal programming language is in parentheses.\n\n* (Go) https://github.com/marksalpeter/token/tree/master/v2 - token/uint64 only\n\n* (Java) https://github.com/glowfall/base62 - base64 ordering, variable length\n  with non-optimal results. Some examples (results are formatted as bufferSize:min/avg/max):\n\n```\n     16:   22 /    22.01 /   23 (optimum is   22)\n     32:   43 /    43.5  /   45 (optimum is   43)\n     40:   54 /    54.24 /   56 (optimum is   54)\n     64:   86 /    86.65 /   89 (optimum is   86)\n   2048: 2751 /  2759.82 / 2769 (optimum is 2752) (*)\n   4096: 5507 /  5519.23 / 5534 (optimum is 5504)\n   6240: 8393 /  8408    / 8428 (optimum is 8385)\n   5280: 7101 /  7114.5  / 7129 (optimum is 7138)\n```\n\n* (Python) https://github.com/suminb/base62\n  Variable results, not guaranteed optimal. For example, byte length 32\n  results in 42, 43, or 44 characters.\n\n* (Go) https://github.com/keybase/saltpack/tree/master/encoding/basex\n\n* (Go) https://github.com/eknkc/basex\n  Generic base. A port of https://github.com/cryptocoinjs/base-x (from JavaScript),\n  which in turn is a derivation of bitcoin/src/base58.cpp (generalized for variable alphabets).\n  For base62 uses (0-9a-zA-Z).\n\n* (C# and Javascript) https://github.com/KvanTTT/BaseNcoding\n\n* (Java) https://github.com/seruco/base62\n\n* (Go) https://github.com/jxskiss/base62\n  Inspired by glowfall. 
Variadic length encoding, not optimal but avoids bigint.\n\n* (Rust) https://github.com/fbernier/base62\n\n\n\n## Footnotes\n  \n(*) Variable length output means that it's possible some inputs will\nresult in encodings that are shorter than what is 'theoretically possible'.\nSince binary data in these encoding contexts are typically 'random',\ntaking any sort of compression approach for base62 doesn't lead to\nany benefits on efficiency (to the contrary), but allows for faster algorithms.\nA common approach is a sliding mask to decide on encoding either five\nbits or six bits at a time.\n\n(**) 'BaseN' approaches will prefer to pick 'less ambiguous' characters,\nand in that context, lowercase is considered preferable. Of course, base62\nis precisely the point on that continuum where both lower and upper case\nare included, and no additional symbols beyond alphanumeric. But to our\nknowledge, common 'baseN' implementations do not 'catch' this special case for base62.\nThe Base64 ordered version of Base62 is sometimes referred to as 'truncated\nbase64'. To (not) help clarify things, the Wikipedia article on base62 has chimed in\nwith different versions depending on the year. The first table\nadded to the 'base62' article was in 2020, and that showed A-Za-z0-9.\nThen in 2021 it was changed to 0-9A-Za-z, then some edit wars, it was\nchanged back and forth a few times, currently it's showing 0-9a-zA-Z.\nAt no point does the article appear to have mentioned that\nthere are in fact multiple versions (and no standard).\n\n# Appendix: A Short History of Encoding and Base64\n\n_Apologies for any remaining errors and omissions in this history section,\nplease let us know if we have missed influences, earlier important work, etc._\n\nA (very) brief history. Base64 as defined today (RFC 4648) traces back to\nPrivacy Enhanced Mail (PEM) in the early 1990s, which therefore predates\nthe web (eg URLs etc). 
PEM was designed to encode (encrypted) binary data as\nwell as cryptographic keys etc in a format that could be transmitted in\nemail messages. At the time, the required subset of US-ASCII was referred\nto as \"printable characters\" - as opposed to control characters, eg the\nC0 and C1 sets. US-ASCII in turn dates back to ISO/IEC 646 for 6-bit\ncharacter set ... which in turn dates to the 1960s. (Trivia: it also\nbecame ECMA-6, which in fact predates ECMA-9 and ECMA-10 ... which is FORTRAN\nand punched tape, respectively.)\n\nYou've probably seen things like this:\n\n```\n-----BEGIN ENCRYPTED PRIVATE KEY-----\nMIHNMEAGCSqGSIb3DQEFDTAzMBsGCSqGSIb3DQEFDDAOBAghhICA6T/51QICCAAw\nFAYIKoZIhvcNAwcECBCxDgvI59i9BIGIY3CAqlMNBgaSI5QiiWVNJ3IpfLnEiEsW\nZ0JIoHyRmKK/+cr9QPLnzxImm0TR9s4JrG3CilzTWvb0jIvbG3hu0zyFPraoMkap\n8eRzWsIvC5SVel+CSjoS2mVS87cyjlD+txrmrXOVYDE+eTgMLbrLmsWh3QkCTRtF\nQC7k0NNzUHTV9yGDwfqMbw==\n-----END ENCRYPTED PRIVATE KEY-----\n```\n\nThat's a PKCS#8 private key encoded in PEM format (RFC 7468); similarly\nfor things like X.509 (PKIX) and S/MIME (CMS) certificates, and so on.\n\nBase64 defines the encoding as being done in 3-byte chunks, which is\n24 bits, and with log2(64) being 6 bits that means 4 characters per chunk.\nPEM also dictates that the base64 encoding must be line-wrapped\nat 64 characters per line, whereas MIME allows lines of up to\n76 characters.\n\nRecapitulating some of these steps, we trace the \"constrained resource\"\nnature of 7-bit ASCII. ASCII \"control\" characters were defined as the\nfirst 32 characters, and the last character was DEL (127). The rest\nwere \"printable\" characters (32-126). 
The alphanumerics (0-9A-Za-z)\nare interspersed with a quickly diminishing supply of symbols:\n\n```\n !\"#$%\u0026'()*+,-./0123456789:;\u003c=\u003e?\n@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_\n`abcdefghijklmnopqrstuvwxyz{|}~\n```\n\nThe first (#32, \"space\") is excluded quickly from any encoding format,\nand the symbols '\"' (#34 or DQUOTE), '(' (#40), ')' (#41), ',' (#44),\n'.' (#46), ':' (#58), ';' (#59), '\u003c' (#60), '\u003e' (#62), '@' (#64),\n'[' (#91), '\\\\' (#92), and ']' (#93) are all grabbed by RFC 822/2822, for\na total of 14 absorbed back in the day when there seemed to be an infinite\nsupply of symbols. Not counting '\"' (DQUOTE), these are called \"specials\" in\nRFC 822/2822. Some of these can be, and are, resurrected.\n\nSQUOTE ''' and DQUOTE '\"' are excluded as they are used as string delimiters\nof various types.\n\nAt this point we need to mention non-English languages. Though the\nPhoenician alphabet is pretty common, the English language doesn't make\nuse of diacritics for \"meaning\", mostly for pronunciation hints - eg cooed\nvs coördinate. Other Latin-script languages may use them to distinguish\nbetween homonyms, eg French \"ou\" (or) vs \"où\" (where).\n\nThis meant that over time, and before the development of ASCII 'extensions',\ncharacters like '^' (#94), '`' (#96), and '~' (#126) became commonly used\nfor diacritics. This relates to \"deadkeys\" on typewriters, which were\nused to type diacritics. The \"deadkey\" was a key that didn't print anything\nby itself, but modified the next key pressed. For example, on a French\ntypewriter, the 'a' key would print 'a' by itself, but if the 'a' key was\npressed after the '^' deadkey, it would print 'â'. The 'a' key was also\nused for 'à' and 'ä', and the 'e' key was used for 'ê', 'é', 'è', and 'ë'\n(to get 'ä', you would press the 'a' key after the '\"' deadkey).\n\nAlong similar lines, whereas '$' (#36) was used for currency, it is the\ndollar. 
In some cases, '$' would instead become the local currency\nsymbol such as '£' (#163) for British pounds, but in other cases,\n'#' (#35) would be used (since US dollar was pretty universal).\n\nSo for these reasons, symbols like '^' (#94), '`' (#96), and '~' (#126)\nwere in practice excluded from \"reuse\". Similarly, '{' (#123), '|' (#124),\n'}' (#125), '[' (#91), '\\\\' (#92), and ']' (#93) were excluded from\n\"reuse\" as they were often used for 'national' characters, for example\nin Swedish keyboards, layouts would map \"åÅäÄöÖ\" to \"{}[]|\\\\\". So a Swedish\nprogrammer had to switch keyboard mode to go between programming and\nwriting emails.\n\nThis is all reflected in International Alphabet No. 5 (\"IA5\"), which\nwas later defined in ISO/IEC 646:1991. IA5 is a subset of 7-bit ASCII.\n\nBelow I'm using '#' to indicate all these \"national\" characters, either\nbecause they are undefined in IA5, or because they are used for\ndiacriticals. (Of course, '#' itself is one such character.)\n\n```\n    IA5 character set\n       0 1 2 3 4 5 6 7 8 9 A B C D E F\n    2x   ! # # # % \u0026 # ( ) * + # - . /\n    3x 0 1 2 3 4 5 6 7 8 9 : ; \u003c = \u003e ?\n    4x # A B C D E F G H I J K L M N O\n    5x P Q R S T U V W X Y Z # # # # #\n    6x # a b c d e f g h i j k l m n o\n    7x p q r s t u v w x y z # # # #\n```\n\nThe first (#32, \"space\") is excluded quickly from any encoding format,\nand the symbols '\"' (#34 or DQUOTE), '(' (#40), ')' (#41), ',' (#44),\n'.' (#46), ':' (#58), ';' (#59), '\u003c' (#60), '\u003e' (#62), '@' (#64),\n'[' (#91), '\\\\' (#92), and ']' (#93) are absorbed back in the day.\nNot counting '\"' (DQUOTE), these are called \"specials\" in\nRFC 822/2822. 
Some of these would be resurrected.\n\nSQUOTE ''' and DQUOTE '\"' are in any case excluded as they are used as\nstring delimiters of various types.\n\nIf we exclude RFC 822 \"specials\" from the characters that are encoded\nthe same in ASCII and IA5, we are left with just these symbols:\n\n```\n   ! % \u0026 * + - / = ?\n```\n\nPEM picked their 64-character subset from this - the alphanumerics are\nidentical in ASCII and IA5, and then they grabbed '/', '+', and '='.\nI have not been able to find documentation on why these choices in\nparticular. But dating back to at least ECMA-1 (1963, see references)\nand 6-bit character sets, the only unambiguous symbols were ''( ) * + , - /'',\nand '=' and '%' were interchangeable so either could serve as padding\n(back then). \n\nMoving on. With MIME (RFC 1341) we get \"tspecials\" which are the specials\nplus '/' (#47), '?' (#63), and '=' (#61).\n\nSo the Base64 standard chooses the same as PEM.\n\nNow comes World Wide Web, starting in 1990. RFC 1630 defines URLs, and\nRFC 1738 defines URLs in more detail. The URL syntax is based on RFC 822,\nand so inherits the specials and tspecials. We lose '%' (#37)\nfor escaping, '+' (#43) for spaces, and '#' (#35) for fragment identifiers.\nAnd '!' (#33) and '*' (#42) are reserved for use as having\n\"special significance\" in certain contexts.\n\nWith WWW comes HTML, as defined in RFC 1866 and based on SGML\n(ISO 8879:1986), and certain characters are treated as special due to their\nroles in markup syntax. SGML designates '\u003c' (#60), '\u003e' (#62), and '\u0026' (#38)\nas special characters for defining tags and entities. 
HTML, while\ninheriting these special characters from SGML, also commonly uses double\nquotes '\"' (DQUOTE) and single quotes ''' (SQUOTE) for delimiting attribute\nvalues within element tags, and '#' (#35) to precede numeric character\nreferences.\n\nThe language issues were of course intimately understood by the WWW pioneers,\nsince they were literally based in Geneva. In RFC 1630 the category 'national'\nthus includes '{', '}', '|' (VLINE), '[', ']', '\\\\', '^', and '~'. Though\nthey don't seem to have been money grubbers so they ignored dual use of '#'.\n\nSo to summarize at this point:\n\n```\n   symbols = \"!\" | \"%\" | \"\u0026\" | \"*\" | \"+\" | \"-\" | \"/\" | \"=\" | \"?\"\n   base64  =                         \"+\" |       \"/\" | \"=\"\n   URI     = \"!\" | \"%\" | \"\u0026\" | \"*\" | \"+\" |     | \"/\" | \"=\" | \"?\"\n```\n\nSo standards for \"content\" collide with standards for \"addressing\" (*).\n\nThis gives birth to Base64URL, which replaces '+' with '-', and '/' with '\\_'.\nNote that '\\_' is not ideal, since it was used as national character. But\nit was the least bad.\n\nThe final collision is with graphical user interfaces, in particular smaller\ndevices like phones and tablets. Double-clicking will select a \"word\", which\nby convention includes the underscore '\\_' but not '-', so double tapping on\na base64url would absorb one of the symbols, but not both. 
The origin for this\ndistinction, in turn, is that the underscore is used in programming languages\nas a valid character in identifiers, which thus make up \"words\" in the context\nof programming, whereas '-' is not, since that's an operator (minus) and a\nmath symbol, whereas '\\_' had no real corresponding convention in writing.\n\nBut of course standard-based base64 decoders do not accept 'base64url'.\nYet because of the above issues, newer standards have been forced to\nuse 'base64URL' instead of 'base64', eg JSON Web Token (JWT, RFC 7519).\nAnd notably in javascript and web pages, things like btoa(), atob(), and\nData URLs work with standard base64 not base64url. Instead, encodeURIComponent()\nprovides percentage escaping of all characters except:\n\n```\nA–Z a–z 0–9 - _ . ! ~ * ' ( )\n```\n\nWhich safeguards any string of characters ... except for a www form\nsubmission spaces should be \"+\" and \"%20\" needs to be post-recoded to \"+\".\n\nAnd of course IPv6 URI syntax reserves \"[\" and \"]\".\n\nAnd the newest URI standard (RFC 3986) reserves ''! ' ( ) *'' ...\n\nSo, yeah, base62 comes in handy.\n  \n## References\n\n* \u003chttps://ecma-international.org/wp-content/uploads/ECMA-1_1st_edition_march_1963.pdf\u003e April 1963, ECMA-1\n\n* \u003chttps://datatracker.ietf.org/doc/html/rfc822\u003e August 1982\n* \u003chttps://datatracker.ietf.org/doc/html/rfc1341\u003e June 1992\n* \u003chttps://datatracker.ietf.org/doc/html/rfc1630\u003e June 1994\n* \u003chttps://datatracker.ietf.org/doc/html/rfc1738\u003e December 1994, obsoleted by RFC 3986\n* \u003chttps://datatracker.ietf.org/doc/html/rfc1866\u003e November 1995\n* \u003chttps://datatracker.ietf.org/doc/html/rfc2822\u003e Obsoletes RFC 822 (get it? \"version 2\"? nerds.) 
April 2001\n* \u003chttps://datatracker.ietf.org/doc/html/rfc4627\u003e July 2006, obsoleted by RFC 7159\n* \u003chttps://datatracker.ietf.org/doc/html/rfc4648\u003e October 2006\n* \u003chttps://datatracker.ietf.org/doc/html/rfc8259\u003e December 2017, obsoletes RFC 7159\n\n* Standard Generalized Markup Language (ISO 8879:1986 SGML) now published under\n  \u003chttps://www.iso.org/standard/16387.html\u003e\n\n* \u003chttps://ecma-international.org/wp-content/uploads/ECMA-262_3rd_edition_december_1999.pdf\u003e December 1999\n\n* \u003chttps://datatracker.ietf.org/doc/html/rfc7519\u003e May 2015, JWT\n\n## Footnotes\n\n(*) Something analogous happened in the Middle Ages when arithmetic\nand geometry started to merge, and hence standards for \"distances\"\ncollided with standards for \"surface area\", which is the short answer\nto the question of why there are 5280 feet in a mile (the surface area\nstandards were much more important).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetersmagnusson%2Fbase62","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpetersmagnusson%2Fbase62","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpetersmagnusson%2Fbase62/lists"}