{"id":15557724,"url":"https://github.com/fgasper/p5-sys-binmode","last_synced_at":"2025-04-11T19:11:59.526Z","repository":{"id":54166428,"uuid":"343534394","full_name":"FGasper/p5-Sys-Binmode","owner":"FGasper","description":"CPAN’s Sys::Binmode","archived":false,"fork":false,"pushed_at":"2023-07-20T15:49:45.000Z","size":132,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-25T15:06:41.659Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FGasper.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-01T19:35:20.000Z","updated_at":"2023-07-19T00:40:48.000Z","dependencies_parsed_at":"2024-12-09T01:24:02.502Z","dependency_job_id":"0f43564b-6224-468c-bd17-529b533f7a79","html_url":"https://github.com/FGasper/p5-Sys-Binmode","commit_stats":{"total_commits":109,"total_committers":2,"mean_commits":54.5,"dds":0.00917431192660545,"last_synced_commit":"4eed823ac934a7733e7377ed25845ea6aeff5489"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FGasper%2Fp5-Sys-Binmode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FGasper%2Fp5-Sys-Binmode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FGasper%2Fp5-Sys-Binmode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FGasper%2Fp5-Sys
-Binmode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FGasper","download_url":"https://codeload.github.com/FGasper/p5-Sys-Binmode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248465345,"owners_count":21108244,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T15:20:28.457Z","updated_at":"2025-04-11T19:11:59.513Z","avatar_url":"https://github.com/FGasper.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NAME\n\nSys::Binmode - A fix for Perl’s system call character encoding\n\n\u003cdiv\u003e\n    \u003ca href='https://coveralls.io/github/FGasper/p5-Sys-Binmode?branch=master'\u003e\u003cimg src='https://coveralls.io/repos/github/FGasper/p5-Sys-Binmode/badge.svg?branch=master' alt='Coverage Status' /\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n# SYNOPSIS\n\n    use Sys::Binmode;\n\n    my $foo = \"\\xff\";\n    $foo .= \"\\x{100}\";\n    chop $foo;\n\n    # Prints a single octet (0xFF) and a newline:\n    print $foo, $/;\n\n    # In Perl 5.32 this may print the same single octet, or it may\n    # print UTF-8-encoded U+00FF. With Sys::Binmode, though, it always\n    # gives the single octet, just like print:\n    exec 'echo', $foo;\n\n# DESCRIPTION\n\ntl;dr: Use this module in **all** new code.\n\n# BACKGROUND\n\nIdeally, a Perl application doesn’t need to know how the interpreter stores\na given string internally. 
Perl can thus store any Unicode code point while\nstill optimizing for size and speed when storing “bytes-compatible”\nstrings—i.e., strings whose code points all lie below 256. Perl’s\n“optimized” string storage format is faster and less memory-hungry, but it\ncan only store code points 0-255. The “unoptimized” format, on the other\nhand, can store any Unicode code point.\n\nOf course, Perl doesn’t _always_ optimize “bytes-compatible” strings;\nPerl can also, if\nit wants, store such strings “unoptimized” (i.e., in Perl’s internal\n“loose UTF-8” format), too. For code points 0-127 (ASCII printables,\ncontrols, and DEL) there’s actually no\ndifference between the two forms, but for 128-255 the formats differ. (cf.\n[\"The \"Unicode Bug\"\" in perlunicode](https://metacpan.org/pod/perlunicode#The-Unicode-Bug)) This means that anything that reads\nPerl’s internals **MUST** differentiate between the two forms in order to\nuse the string correctly.\n\nAlas, that differentiation doesn’t always happen. When it doesn’t, Perl\noutputs code points 128-255 differently depending on whether the\ncontaining string is “optimized” or not.\n\nRemember, though: Perl applications _should not care_ about\nPerl’s string storage internals like optimized/unoptimized. (This is why,\nfor example, the [bytes](https://metacpan.org/pod/bytes)\npragma is discouraged.) The catch, though, is that without that knowledge,\n**the application can’t know what it actually says\nto the outside world!**\n\nThus, applications must either monitor Perl’s string-storage internals\nor accept unpredictable behavior, both of which are categorically bad.\n\n(Perl’s documentation calls the “unoptimized” format “upgraded”, while\nit calls the “optimized” format “downgraded”. 
The rest of this document\nwill favor Perl’s terms.)\n\n# HOW THIS MODULE (PARTLY) FIXES THE PROBLEM\n\nThis module provides predictable behavior for Perl’s built-in functions by\ndowngrading all strings before giving them to the operating system. It’s\nequivalent to—but faster than!—prefixing your system calls with\n`utf8::downgrade()` (cf. [utf8](https://metacpan.org/pod/utf8)) on all arguments.\n\nPredictable behavior is **always** a good thing; ergo, you should\nuse this module in **all** new code.\n\n# CAVEAT: CHARACTER ENCODING\n\nIf you apply this module injudiciously to existing code you may see\nexceptions or character corruption where previously things worked fine.\n\nThis can\nhappen if you’ve neglected to encode one or more strings before\nsending them to the OS. Without Sys::Binmode, Perl sends upgraded\nstrings to the OS in UTF-8 encoding. In essence, it’s an implicit\nUTF-8 auto-encode, which is kind of nice, except that it depends on\nPerl’s internals, which are unpredictable. Sys::Binmode removes\nthat implicit UTF-8 auto-encode, which of course will break things\nthat need it.\n\nThe fix is to apply an explicit UTF-8 encode prior to the system call\nthat throws the error. This is what we should do _anyway_;\nSys::Binmode just enforces that better.\n\n## Example: The [utf8](https://metacpan.org/pod/utf8) Pragma\n\nThe widely-used [utf8](https://metacpan.org/pod/utf8) pragma particularly exemplifies this problem.\n\nIf you have code like this:\n\n    use utf8;\n\n    mkdir \"épée\";\n\n… then adding this module will change your program’s behavior in ways you’ll\nprobably dislike.\n\nConsider the string `épée`. Without the `utf8` pragma (but assuming that\nthe code _is_ actually written in UTF-8) this is 6\ncharacters because the two `é`s are 2 bytes each (so 2 + 1 + 2 + 1),\nand without the `utf8` pragma each byte in a string constant becomes its own\ncharacter, even if multiple bytes make up a single UTF-8 character. 
Since\nnothing _probably_ upgrades that string on its way to\n`mkdir()`, the OS will receive the intended 6 bytes and create a directory\nwith a UTF-8-encoded name.\n\n_With_ `utf8`, though, `épée` is **4** characters, not 6, because\nthis string is now UTF-8-decoded. Those 4 characters all lie beneath 256,\nso the string is still bytes-compatible. Thus, if you `print()` that string\nyou’ll get 4 bytes of Latin-1, which probably **isn’t** what you want.\n\n`mkdir()`, though, _probably_ still creates a directory with a 6-byte (UTF-8)\nname. This happens when Perl itself stores `épée` in upgraded (i.e.,\n“unoptimized”) form. If that’s the case, that means Perl’s _internal_ buffer\nof `épée` is still the 6 bytes of UTF-8, even though to the Perl\n_application_ it’s a 4-character string. Perl’s `mkdir()` doesn’t care\nabout characters, though; it just gives Perl’s internal buffer to the\nOS’s create-directory function. So by violating its own abstraction, Perl\nhappens to achieve something that is _sometimes_ useful.\n\nThere are still two problems, though:\n\n- 1. Inconsistency: `print()` sends 4 bytes to the OS while\n`mkdir()` (again, _probably_) outputs 6.\n- 2. Uncertainty: `épée` _could_ be stored downgraded rather than\nupgraded, which would cause `mkdir()` to send 4 bytes instead.\n\n`print()`’s outputting of 4 bytes here is actually the **correct** behavior\nbecause it doesn’t depend on whether Perl stores the string upgraded or\ndowngraded. Sys::Binmode extends that correct behavior to `mkdir()` and\nother such Perl commands.\n\nOf course, in the end, we want `mkdir()` to receive 6 bytes of UTF-8, not\n4 bytes of Latin-1. 
To achieve that, just do as you normally do with\n`print()`: encode your string before you give it to the OS.\n\n    use utf8;\n    use Encode;\n\n    mkdir encode(\"UTF-8\", \"épée\");\n\nThis is what your code should look like, regardless of Sys::Binmode;\nthe omitted encoding step was a bug that Perl’s own abstraction-violation\nbug _might_ have obscured for you. Sys::Binmode fixes Perl’s bug,\nwhich makes you fix your own bug, too.\n\n## Non-POSIX Operating Systems (e.g., Windows)\n\nIn a POSIX operating system, an application’s communication with the\nOS happens entirely through byte strings. Thus, treating all\nOS-destined strings as byte strings is good and natural.\n\nIn Windows, though, things are weirder. For example, Windows\nexposes multiple APIs for creating a directory, and the one Perl uses (as of\n5.32, anyway) only accepts code points 0-255. In this context Sys::Binmode\ndoesn’t _break_ anything, but it does reinforce one of Perl’s unfortunate\nlimitations on Windows.\n\nSys::Binmode is a good idea anywhere that Perl sends byte strings to the OS.\nFor now, as far as I know, that’s everywhere that Perl runs. If that’s not\ntrue, please file a bug.\n\n# WHERE ELSE THIS PROBLEM CAN APPEAR\n\nThe unpredictable-behavior problem that this module fixes in core Perl is\nalso common in [CPAN](http://cpan.org)’s XS modules due to rampant\nuse of [the SvPV macro](https://perldoc.perl.org/perlapi#SvPV) and\nvariants. SvPV is basically Perl’s [bytes](https://metacpan.org/pod/bytes) pragma in C: it gives\nyou the string’s\ninternal bytes with no regard for what those bytes represent. This, of course,\nis problematic for the same reason why the [bytes](https://metacpan.org/pod/bytes) pragma is. 
XS authors\n_generally_ should prefer\n[SvPVbyte](https://perldoc.perl.org/perlapi#SvPVbyte)\nor [SvPVutf8](https://perldoc.perl.org/perlapi#SvPVutf8) in lieu of\nSvPV unless the C code in question handles Perl’s encoding abstraction.\n\nNote in particular that, as of Perl 5.32, the default XS typemap converts\nscalars to C `char *` and `const char *` via an SvPV variant. This means\nthat any module that uses that conversion logic also has this problem.\nSo XS authors should also avoid the default typemap for such conversions.\n(Again, though, use of the default typemap in this context is regrettably\ncommonplace.)\n\nBefore Perl 5.18 this problem also affected %ENV. 5.18 introduced\nan auto-downgrade when setting %ENV similar to what this module does.\n\n# LEXICAL SCOPING\n\nIf, for some reason, you _want_ Perl’s unpredictable default behavior,\nyou can disable this module for a given block via\n`no Sys::Binmode`, thus:\n\n    use Sys::Binmode;\n\n    system 'echo', $foo;        # predictable/sane/happy\n\n    {\n\n        # You should probably explain here why you’re doing this.\n        no Sys::Binmode;\n\n        system 'echo', $foo;    # nasal demons\n    }\n\n# AFFECTED BUILT-INS\n\n- `exec`, `system`, and `readpipe`\n- `do` and `require`\n- File tests (e.g., `-e`) and the following:\n`chdir`, `chmod`, `chown`, `chroot`, `ioctl`,\n`link`, `lstat`, `mkdir`, `open`, `opendir`, `readlink`, `rename`,\n`rmdir`, `stat`, `symlink`, `sysopen`, `truncate`,\n`unlink`, `utime`\n- `bind`, `connect`, `setsockopt`, and `send` (last argument)\n- `syscall`\n\n## Omissions\n\n- `crypt` already does as Sys::Binmode would make it do.\n- `select` (the 4-argument one) has the bug that Sys::Binmode fixes,\nbut since it’s a performance-sensitive call where upgraded strings are\nunlikely, this library doesn’t wrap it.\n\n# KNOWN ISSUES\n\n[autodie](https://metacpan.org/pod/autodie) creates functions named, e.g., `chmod` in the\nnamespace of the module that `import()`s it. 
Those functions lack\nthe compiler “hint” that tells Sys::Binmode to do its work; thus,\n[autodie “clobbers” Sys::Binmode](https://github.com/pjf/autodie/issues/113).\n`CORE::*` functions will still have Sys::Binmode, but of course they won’t\nthrow exceptions.\n\n# TODO\n\n- `dbmopen` and the System V IPC functions aren’t covered here.\nIf you’d like them, ask.\n- There’s room for optimization, if that’s gainful.\n- Ideally this behavior should be in Perl’s core distribution.\n- Even more ideally, Perl should adopt this behavior as _default_.\nMaybe someday!\n\n# ACKNOWLEDGEMENTS\n\nThanks to Leon Timmermans (LEONT) and Paul Evans (PEVANS) for some\ndebugging and design help.\n\n# LICENSE \u0026 COPYRIGHT\n\nCopyright 2021 Gasper Software Consulting. All rights reserved.\n\nThis library is licensed under the same license as Perl.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffgasper%2Fp5-sys-binmode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffgasper%2Fp5-sys-binmode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffgasper%2Fp5-sys-binmode/lists"}