{"id":29419892,"url":"https://github.com/bytecodealliance/arf-strings","last_synced_at":"2025-07-12T01:13:21.561Z","repository":{"id":43151889,"uuid":"258657486","full_name":"bytecodealliance/arf-strings","owner":"bytecodealliance","description":"Encoding and decoding for ARF strings","archived":false,"fork":false,"pushed_at":"2025-03-10T15:58:07.000Z","size":69,"stargazers_count":14,"open_issues_count":1,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-07-12T00:44:55.170Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytecodealliance.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-25T00:55:27.000Z","updated_at":"2025-03-10T15:58:11.000Z","dependencies_parsed_at":"2023-01-21T12:32:52.304Z","dependency_job_id":null,"html_url":"https://github.com/bytecodealliance/arf-strings","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/bytecodealliance/arf-strings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytecodealliance%2Farf-strings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytecodealliance%2Farf-strings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytecodealliance%2Farf-strings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytecodealliance%2Farf-strings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytecodealliance","download_url":"https://codeload.github.com/by
tecodealliance/arf-strings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytecodealliance%2Farf-strings/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264922908,"owners_count":23683705,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-12T01:13:18.696Z","updated_at":"2025-07-12T01:13:21.549Z","avatar_url":"https://github.com/bytecodealliance.png","language":"C","readme":"# ARF strings\n\nARF is the Alternative Representation for Filenames, an encoding for\nrepresenting NUL-terminated non-UTF-8 strings as valid (and non-NUL-terminated)\n[UTF-8] strings. 
It's intended for use in environments that need a way to\nrepresent [POSIX-compatible] and Windows-compatible path names within UTF-8\nstring types.\n\nThis is an experiment, and the Windows encoding scheme is particularly\nexperimental.\n\n[UTF-8]: https://en.wikipedia.org/wiki/UTF-8\n[POSIX-compatible]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_271\n\n## Description\n\nARF strings have the following form:\n\n```\narf-string ::= U+FEFF lossy-portion U+0000 NUL-escaped-portion\n```\n\n`U+FEFF` is the Byte Order Mark (BOM) code point.\n\nThe `lossy-portion` consists of the original string data (excluding the\nterminating NUL) with any unencodable bytes replaced by `U+FFFD`, the Unicode\nreplacement character.\n\n`U+0000` is the NULL (NUL) code point.\n\nThe `NUL-escaped-portion` of an ARF string consists of the original string\ndata (again, excluding the terminating NUL) with any unencodable bytes replaced\nby `U+0000` followed by:\n - On POSIX-ish platforms, the invalid byte with the most significant bit set to 0.\n - On Windows, a Unicode scalar value between `U+0` and `U+7FF`, representing\n   the offset in the surrogate codepoint space (`U+D800` through `U+DFFF`).\n\n## Example\n\nThe ARF encoding of `\"foo\\xffbar\"` on POSIX-ish platforms is `\"\\xef\\xbb\\xbffoo\\xef\\xbf\\xbdbar\\x00foo\\x00\\x7fbar\"`:\n - `\"\\xef\\xbb\\xbf\"` is the UTF-8 encoding of `U+FEFF`.\n - `\"foo\\xef\\xbf\\xbdbar\"` is the string with the unencodable byte replaced by the UTF-8 encoding for `U+FFFD`.\n - `\"\\x00\"` is the UTF-8 encoding for `U+0000`.\n - `\"foo\\x00\\x7fbar\"` is the string with the unencodable byte replaced by a `NUL` followed by the invalid byte with the most significant bit set to 0.\n\n## Rationale\n\nUnencodable pathnames are very rare in practice, so this design doesn't attempt to\nmake them efficient. 
In the worst cases, ARF strings may be several times the size\nof the corresponding input strings (though they're still O(n)). The redundancy is\nused to protect against accidental misuse by code not aware of ARF strings.\n\nC and POSIX code represent paths as NUL-terminated strings. When given an ARF string,\nsuch code will only see the BOM and the lossy portion containing replacement characters.\nIn most cases, attempts to open such a pathname will produce `ENOENT` errors, since the\nleading UTF-8 BOM and UTF-8 replacement byte sequences are unlikely to appear in\nnon-UTF-8 filenames. Typical application error messages will include the pathname,\nwhere the replacement characters will serve as a hint as to the nature of the problem.\n\nConsequently, by default, ARF-unaware C and POSIX code will not be able to open\nunencodable pathnames. For many applications, this limitation is worth the advantage\nof being able to assume that all pathnames are UTF-8. Applications that wish to\nwork with unencodable pathnames can opt in by being explicitly aware of ARF strings,\noptionally with the help of the Rust and C libraries in this repository.\n\nAnother tricky case is code which modifies paths. ARF-unaware code may modify ARF\nstrings without being aware of the ARF encoding. Such code won't know to update the\nNUL-escaped portion of the ARF string, and the resulting ARF string will subsequently\nbe detected as invalid, leading to errors rather than surprising behavior.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytecodealliance%2Farf-strings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytecodealliance%2Farf-strings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytecodealliance%2Farf-strings/lists"}