{"id":21866886,"url":"https://github.com/rootmos/h","last_synced_at":"2026-03-14T04:09:52.687Z","repository":{"id":61582505,"uuid":"549735876","full_name":"rootmos/h","owner":"rootmos","description":"Hardened script host programs","archived":false,"fork":false,"pushed_at":"2025-03-21T08:03:38.000Z","size":494,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-28T10:21:21.152Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rootmos.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-11T16:48:21.000Z","updated_at":"2025-03-21T08:03:41.000Z","dependencies_parsed_at":"2024-06-22T03:47:45.871Z","dependency_job_id":"7ce301bc-2af4-44cb-a3ff-7094bce6200c","html_url":"https://github.com/rootmos/h","commit_stats":null,"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootmos%2Fh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootmos%2Fh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootmos%2Fh/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootmos%2Fh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rootmos","download_url":"https://codeload.github.com/rootmos/h/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","r
epositories_count":248969272,"owners_count":21191226,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-28T05:07:47.560Z","updated_at":"2026-03-14T04:09:52.674Z","avatar_url":"https://github.com/rootmos.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# *h*ardened script host programs\n[![Build and test](https://github.com/rootmos/h/actions/workflows/build-test.yaml/badge.svg)](https://github.com/rootmos/h/actions/workflows/build-test.yaml)\n\nUnprivileged sandboxing of popular script languages.\n\n|Unprivileged|sandboxing|script languages|\n|------------|----------|----------------|\n|[Alice: \"Why not Docker?\"\u003cbr\u003eBob: \"Because `sudo docker`\".](#frequently-unasked-questions)|[landlock](https://www.kernel.org/doc/html/latest/userspace-api/landlock.html) \u003cbr\u003e [seccomp](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)|[lua](http://lua.org/) -\u003e [hlua](hlua) \u003cbr\u003e [python](https://python.org/) -\u003e [hpython](hpython) \u003cbr\u003e [node](https://nodejs.org/) -\u003e [hnode](hnode) \u003cbr\u003e [bash](https://www.gnu.org/software/bash/) -\u003e [hsh](hsh)|\n\n## DISCLAIMER\nThis project is a work in progress and has not been audited by security\nexperts.\n\nHowever I think it remains useful for educational purposes regarding Linux's\nsometimes daunting security features and using `strace` to illustrate how a\nprogram written in a high level language is translated into syscalls to obtain\nits desired (or undesired) effects.\n\n... 
and it is certainly better than nothing as I will try to exemplify in the\nfollowing section. But as always, remember that sandboxing and containerization\nonly limit the extent of a successful attack,\nand don't give you carte blanche to willy-nilly execute untrusted code.\n\n## Raison d'être\nSo given that disclaimer, why did I write this?\nShowcasing Linux's security features is only a secondary goal; my primary goal\nis for the reader to add `strace` to her list of favorite tools.\n\n### Alice's [game](https://love2d.org/)\nAssume Alice is a game designer with malicious intent and you are her intended\nvictim.\nBeing a fan of indie games you of course accept to be a beta-tester for her\nlatest creation.\nShe sends you the `fun.lua` game and hidden within is the statement:\n```lua\nos.execute(\"sudo rm -rf --no-preserve-root /\")\n```\n(or she'll try `sudo --askpass` if the credentials aren't cached).\nA diligent code-reviewer might catch such an obviously malicious\nstatement, but it can be surprisingly easy to miss in a hurried\nglance; try to allow yourself only a few seconds to read the following:\n```lua\nfunction run(cmdline)\n    local s = os.getenv(\"SUDO\")\n    if not s then\n        cmdline = \"sudo -A \" .. 
cmdline\n    end\n    os.execute(cmdline)\nend\n\nfunction clean_cache()\n    local project = os.getenv(\"PROJECT_ROOT\") or \"..\"\n    local cache = os.getenv(\"CACHE\") or \"/tmp\"\n    run(string.format(\"rm -r %s/%s\", cache, project))\nend\n```\nDid you spot the malicious or unintended transposition?\nThis is the hardship presented to us by the PR-culture, and can provide a false\nsense of security.\nThere are also programming languages\n[designed to be difficult to read](https://esolangs.org/wiki/Esoteric_programming_language#Obfuscation).\nAnd speaking of programming languages:\n\"the greatest thing about Lua is that you don't have to write Lua.\"\nMeaning that it's very feasible to bundle a compiler for another language,\nhowever non-esoteric (check out: [fennel](https://fennel-lang.org) and\n[Amulet](https://amulet.works/)).\nBut Lua (as well as Python, Node.js, C and many many more) are:\n[any-effect-at-any-time languages](https://www.youtube.com/watch?v=iSmkqocn0oQ).\nThis in contrast with [Haskell](https://www.haskell.org)\n(check out [Learn You a Haskell for Great Good!](http://www.learnyouahaskell.com))\nor maybe [eff](https://www.eff-lang.org/) if you're feeling adventurous.\nThat means that an expected pure/side-effect free operation such as compiling a\npiece of source code can include an obfuscated `os.execute`-attack or worse\nif the attacker has a more insidious mind.\n\nConsidering that compilers are usually quite extensive pieces of software\nthey provide ample forestry to hide a malicious tree.\nAlice, I suggest you split your malicious code into several commits and PR:s\n(preferably large ones close to a deadline).\nFor the victim, I recommend [Ken Thompson's \"Reflections on Trusting Trust\"](https://dl.acm.org/doi/10.1145/358198.358210),\nwhich if you haven't read I expect will shatter any trust you might have\nimagined you had in *any* binary executable\n(going all the way back to punchcards and the 
[PDP-1](https://en.wikipedia.org/wiki/PDP-1)).\nThis may seem ridiculous, but [OCaml](https://ocaml.org/)\n(my yardstick language of languages)\nstill bundles a [bootstrapping binary compiler](https://github.com/ocaml/ocaml/blob/trunk/boot/ocamlc)\nto build subsequent compilers: this is very much\n[\"trusting trust\"](https://dl.acm.org/doi/10.1145/358198.358210).\nEven more so since [`Coq`](https://en.wikipedia.org/wiki/Coq) is implemented\nin and thus compiled by OCaml; now your trust stack ends in a binary blob:\ndo you trust it? And are you up for the incredibly time-consuming task of\nverifying no malicious opcodes hide within?\n\nSo the world is a scary and unsatisfactory environment; let's consider\nmitigating the consequences of malicious and/or incompetently written\ncode.\n\n### Enter [no new privileges](https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html)\nAlice's `sudo`-based `rm`-attack can be mitigated by a one-liner:\n[`prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)`](https://man.archlinux.org/man/prctl.2#PR_SET_NO_NEW_PRIVS).\nThis call is not expected to fail, but being a conscientious developer it never\nhurts to crash-don't-thrash and I present a\n[copy-pastable snippet](https://github.com/rootmos/libr/blob/master/modules/no_new_privs.c):\n```c\n#include \u003csys/prctl.h\u003e\n#include \u003cstdlib.h\u003e\n\nvoid no_new_privs(void)\n{\n    if(0 != prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {\n        abort();\n    }\n}\n```\nYou might prefer [`exit`](https://man.archlinux.org/man/exit.3a).\nI don't: [libc](https://libc.org):s commonly provide\n[`atexit`](https://man.archlinux.org/man/atexit.3)\nwhich in my opinion is contrary to a fail-early/crash-don't-thrash philosophy:\nthe operating system already has to assume the responsibility of cleaning up\nafter a failing process.\n(Ever noticed that C coders don't free their allocations when exiting?)\nUsing `exit` and `atexit` reminds me of languages with exceptions and the\nnightmare when 
exception handlers raise exceptions ad infinitum.\nInstead consider programming models where failure-is-always-an-option thinking\nis prevalent, such as the actor model:\nwhere the non-delivery of a message is a scenario\nbrought to the forefront (with the real-world scenario of the fallibility of\nnetwork connections).\nIf you are curious I recommend [Erlang](https://www.erlang.org/)\n(check out [Learn You Some Erlang for great good!](https://learnyousomeerlang.com/)).\n(Erlang does not implement the Actor model in a strict way, but provide a very\nenjoyable way to explore its concepts while writing highly concurrent\napplications.)\n\nBack to mitigating Alice's attacks: the above\n[`no_new_privs`](https://github.com/rootmos/libr/blob/master/modules/no_new_privs.c)\ncall is so simple it should always, *always*, be used:\nunless *explicitly necessary* to gain new privileges.\nThis is the [Principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege):\nif the functionality you intend to provide does not require privileges your\nprocess should not have any privileges, and this is the red thread in this\nattempted raison d'être.\nBut in the [RealWorld](https://hackage.haskell.org/package/base-4.13.0.0/docs/Control-Monad-ST-Safe.html#t:RealWorld),\nprocesses inherit quite a handful of privileges that Alice can still abuse,\nas we shall see.\n\nSo Alice can't `sudo` anymore thanks to `PR_SET_NO_NEW_PRIVS`,\nbut even a sneaky `os.execute(\"rm -r ~\")` would still\nbe a [mayor](https://www.youtube.com/watch?v=fmAWIDI4ZgY) buzzkill.\n\nThe naive Lua specific mitigation is to `os.execute = nil` before running the\nentrypoint of Alice's game.\nWell, that may be good enough\n(I haven't figured out a way around that mitigation, but I'm reasonably sure\nthere is an exploit and would be interested in seeing it).\nContinuing this idea we can tweak it into at least making this first naive\nmitigation useful:\n```lua\nos.execute = function() error(\"not 
allowed\") end\n```\nThis is especially useful since the mitigations I suggest below do not even allow the\nprogram to try to provide a user-friendly error message.\n\n### Enter [seccomp](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)\nSeccomp is Linux's way of filtering syscalls and so limiting the exposed\nkernel surface.\n\nFancy words aside, this means that when you receive in your email inbox the\nnotification of a new vulnerability you can feel certain that you are not\naffected if the vulnerable syscalls are rejected by your filter.\nIf you aren't subscribed to any [CVE](https://en.wikipedia.org/wiki/Common_Vulnerabilities_and_Exposures)\nmailing lists I recommend:\n- [Arch Linux's](https://lists.archlinux.org/mailman3/lists/arch-security.lists.archlinux.org/),\n- [Ubuntu's](https://lists.ubuntu.com/mailman/listinfo/ubuntu-security-announce) or\n- [OpenBSD's](https://www.openbsd.org/mail.html) mailing lists.\n\nThe simplest seccomp filters are essentially accept/reject lists, but they can\ndo more complex things.\nBut as always when it comes to security:\neasily understandable code is always preferred.\n\nBack to Alice's `os.execute`-based attacks:\nwith seccomp enabled with a filter that forbids `exec`:s,\nthe kernel will politely kill your process and suggest to the rest of the\nsystem that you received a `SIGSYS` signal.\nIn practice this means that your process immediately vanishes, so without a\nsyscall inspection tool such as `strace` one is reduced to debugging by:\nthou shalt `printf(\"are we nearly here yet?\");`\n\n### Enter [strace](https://man.archlinux.org/man/strace.1)\nIf you haven't invoked strace before, or you are curious what syscalls are\nbeing used by a program, then try:\n```shell\nstrace lua -e 'print(\"hello\")'\nstrace python -c 'print(\"hello\")'\n```\nThe output of `strace` can be quite extensive (and therefore `strace` provides\nsophisticated ways to filter what is traced).\nFor our [hello 
world](https://rosettacode.org/wiki/Hello_world/Text) example\nthe interesting syscall can be found towards the end:\n```\nwrite(1, \"hello\\n\", 6)                  = 6\n```\nOther interesting syscalls to look for are memory allocating syscalls such as\n[`brk`](https://man.archlinux.org/man/brk.2) and\n[`mmap`](https://man.archlinux.org/man/mmap.2):\n```\nmmap(NULL, 151552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55a8b74000\n```\n\nContinuing our investigation of Alice's `os.execute`-attack, we can trace an\nalmost as trivial piece of code:\n```shell\nstrace -f lua -e 'os.execute(\"echo hello\")'\nstrace -f python -c 'import os; os.system(\"echo hello\")'\n```\nNote the `-f` (`--follow-forks`) option that tells `strace` to continue tracing\nspawned child processes.\nAnd now we look for\n[`clone`](https://man.archlinux.org/man/clone.2) (the syscall that implements\n[`fork`](https://man.archlinux.org/man/fork.2))\nand [`execve`](https://man.archlinux.org/man/execve.2).\nFrom the trace in the parent:\n```\nclone3({flags=CLONE_VM|CLONE_VFORK, exit_signal=SIGCHLD, stack=0x7f8160953000, stack_size=0x9000}, 88) = 2304236\n```\nand in the child:\n```\nexecve(\"/bin/sh\", [\"sh\", \"-c\", \"echo hello\"], 0x7ffebf549aa8 /* 52 vars */) = 0\n```\n\nSo why not reject `clone` as well?\nRemember, in Linux threads and processes are the same abstraction:\nessentially one with shared virtual memory space and the other without\n(but even a casual glance at clone's options makes the difference no longer so\nclear cut).\nNow with `clone` rejected: both threads as well as processes are no longer\nthings you need to reason about.\n\nSo how do we actually tell seccomp what to accept and what to reject?\n\n### Enter [Berkeley Packet Filters](https://www.kernel.org/doc/html/latest/bpf/index.html)\nSeccomp filters are expected to be binary representations of\n[cBPF](https://www.kernel.org/doc/Documentation/networking/filter.txt), the c\nstands for \"classic\" BPF (in 
contrast with\nextended BPF ([eBPF](https://www.kernel.org/doc/html/latest/bpf/index.html))).\n\ncBPF is not theoretically Turing complete\nbecause it lacks infinite memory: it is restricted to the scratch memory\n`uint32_t M[16]`.\nBut that only presents an interesting challenge:\nwhich [Project Euler](https://projecteuler.net) or\n[Advent of Code](https://adventofcode.com) problems can be solved in cBPF?\n\nTherefore working with seccomp can be somewhat of a challenge.\nSo you may want to use an\n[assembler](https://github.com/torvalds/linux/blob/master/tools/bpf/bpf_asm.c)\nand a [preprocessor](tools/pp) (I've bundled them together as [bpfc](tools/bpfc))\nthat can interpret the constants commonly used\nwhen making syscalls.\n\nI always start with a \"reject everything\" filter:\n```\nbad: ret #$SECCOMP_RET_KILL_THREAD\ngood: ret #$SECCOMP_RET_ALLOW\n```\nthen run a test under `strace`, look for `SIGSYS`, reason about the offending\nsyscall, reluctantly add it to the allowed list and iterate:\n```\nld [$$offsetof(struct seccomp_data, nr)$$]\n\njeq #$__NR_brk, good\n\nbad: ret #$SECCOMP_RET_KILL_THREAD\ngood: ret #$SECCOMP_RET_ALLOW\n```\nEventually when the test passes you have arrived at a list of syscalls\nliving up to the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).\n\nThe filter you produce might appear very long, but remember that Linux has a\n[*massive* amount of syscalls](https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/syscall.h.in?h=v1.2.3\u0026id=7a43f6fea9081bdd53d8a11cef9e9fab0348c53d).\nViewed from a security perspective even a moderately long filter is still\na huge reduction of the exposed kernel surface.\nBut a simple (but still very effective) yes/no approach to filtering\nsyscalls falls short when it encounters the\n\"functionality-grouping\" syscalls such as\n[`fcntl`](https://man.archlinux.org/man/fcntl.2),\n[`ioctl`](https://man.archlinux.org/man/ioctl.2) 
and\n[`prctl`](https://man.archlinux.org/man/prctl.2)\n(which [we encountered above](#enter-no-new-privileges)).\nFor these syscalls it becomes necessary to inspect the call arguments\n(from [`hlua`'s filter](hlua/filter.bpf)):\n```\njne #$__NR_fcntl, fcntl_end\nld [$$offsetof(struct seccomp_data, args[1])$$]\njeq #$F_GETFL, good\njmp bad\nfcntl_end:\n```\n\nSometimes it may be useful to \"tamper\" with a syscall instead of rejecting it\noutright:\nreturn `-1` and set `errno` to `EPERM` or `ENOSYS` to allow a child to recover:\nsee for example the `prlimit` check in\n[`hnode`'s seccomp filter](https://github.com/rootmos/h/blob/6ed41b19839291fe4ca404cb5c315223a0f72ec2/hnode/filter.bpf#L76).\n\nDoing the \"test-n-strace\" dance for a non-trivial test-case you quickly end up\nwith a filter usually including the\n`read`, `write` and `close` syscalls.\n(Unsurprisingly these have syscall numbers:\n[`0`, `1` and `3`](https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/syscall.h.in?h=v1.2.3\u0026id=7a43f6fea9081bdd53d8a11cef9e9fab0348c53d#n1).)\n\n`write` is particularly fun to think about: without it\n[how can you communicate the result of any computation](https://youtu.be/iSmkqocn0oQ?t=195)\nin an \"everything is a file\" system?\nThe syscall filtering way of expressing this is\n[seccomp's strict mode](https://man.archlinux.org/man/seccomp.2#SECCOMP_SET_MODE_STRICT):\nonly allow `read`, `write`, `_exit` and `sigreturn`. The reasoning is that you are only allowed\nto `read` from and `write` to *already opened* file descriptors (since in this setting\n`open` is forbidden, or more accurately not explicitly allowed).\n\nBut even moderately interesting Lua applications enjoy using\n[`require`](https://www.lua.org/manual/5.4/manual.html#pdf-require).\nSo it's not unreasonable to allow Lua to `open` files (which fills in the\nnumber `2` syscall numbering slot).\nBut then Alice changes her `fun.lua` game to include (obfuscated of course):\n```lua\nio.open(os.getenv(\"HOME\") .. 
\"/.aws/credentials\", \"r\"):read(\"*a\")\n```\nNow Alice has to get this information back to her, but maybe it's a\nmultiplayer game? Or she obfuscates it in the game's log file and exclaims:\n\"Oh the game crashed, why don't you send me the logs?\"\n\nAlice's intentions might only go as far as\n[griefing](https://en.wikipedia.org/wiki/Griefing), and she will try to\n`os.remove` your access tokens.\nAlice, try removing Chrome/Firefox cookies as well.\nThis would definitely lose me my sunny disposition.\n\nRemoving files map to the [`unlink`](https://man.archlinux.org/man/unlink.2)\nsyscall.\nCertainly it commonly makes sense to reject it,\nbut a plausible scenario is using `unlink` is to remove intermediately\ncreated files (during compilation maybe).\n\nSo what do we do about Alice's intent to remove your\n`with-blood-sweat-and-tears.doc` file?\n\n### Enter [landlock](https://www.kernel.org/doc/html/latest/userspace-api/landlock.html)\nLandlock is a fairly recently added security feature, which is meant to\nrestrict filesystem access for unprivileged processes, in addition to the\nstandard UNIX file permissions.\n(I will argue landlock is fairly recent when its new syscalls have, at the time\nof writing: [the highest syscall number](https://git.musl-libc.org/cgit/musl/tree/arch/x86_64/bits/syscall.h.in?h=v1.2.3\u0026id=7a43f6fea9081bdd53d8a11cef9e9fab0348c53d#n355).)\n\nIn essence [landlock](https://man.archlinux.org/man/landlock.7)\ngrants or restricts rights to filesystem operations\non whole filesystem hierarchies. (Note that a single file is a trivial\nhierarchy.)\nSo we can grant read access to `/usr/lib` only and mitigate Alice's attack on\nyour access tokens in your home directory. And maybe allow both read and write\nto `/tmp`, and maybe allow removing (i.e. 
unlinking).\nUnless you allow `open`'s\n[`O_TMPFILE` flag](https://man.archlinux.org/man/open.2#O_TMPFILE)\nin your seccomp filter of course.\n\nThe reason this section is bare of example code is that I found, and hope you\nwill too, the concepts behind landlock easily understandable and yet very\npowerful.\nTherefore I will not include any sample code here since the\n[sample code provided with landlock](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c?h=v6.0.7\u0026id=3a2fa3c01fc7c2183eb3278bd912e5bcec20eb2a)\nis excellent, and is relatively\n[verbatim what I use](https://github.com/rootmos/libr/blob/master/modules/landlock.c).\nIn my experience the ratio of positive security impact versus time spent learning\nthe feature is huge.\n\nMy one criticism of the current implementation of landlock is the inability\nto hide files: that is, when landlock restricts access to a\nfile, for example `/etc/passwd`, `stat` (or similar) responds with\n`EACCES` instead of `ENOENT`.\nThe knowledge that a Linux installation has a `/etc/passwd` may be of limited\nvalue, but revealing that `~/.aws/credentials` exists\ncan enable an attacker to target her attack more effectively against the\ndiscovered files.\n\nFurthermore an attacker can, given enough access to\n(lots, lots and lots of)\nsystem time, enumerate your entire file tree.\nThis is of course a ridiculous endeavour;\n[`#define NAME_MAX 255`](https://git.musl-libc.org/cgit/musl/tree/include/limits.h?h=v1.2.3\u0026id=7a43f6fea9081bdd53d8a11cef9e9fab0348c53d#n45)\nand [`pp`](tools/pp) `('z'-'a')+('Z'-'A')+('9'-'0')` says that the number of\nfilenames to try is *bounded from below* by\n[`59^255`](https://www.wolframalpha.com/input?i=59%5E255):\nwhich evaluates to an integer that starts with `36920` and ends with `89299`\n(not mentioning the other 442 digits).\n\nThe counter-argument is that there are other, perhaps better, ways of achieving\nthis 
functionality\n([`chroot`](https://man.archlinux.org/man/chroot.2.en) maybe),\nreducing my criticism to a mere down-prioritized item on a wishlist.\n\nThe wrinkle in our concrete setting of providing script hosts is that sometimes\nthe interpreters want to dynamically load shared libraries which boast a\nnotorious elusiveness and never appear in the same place twice\n([which implies we at least know their velocity](https://en.wikipedia.org/wiki/Uncertainty_principle)).\nHence I have added a set of tools to, at compile time,\ntell [`hpython`](hpython)'s landlock rules to allow read access to the path\nwhere the embedded Python instance will look for, say, the `libz` library.\nThis is the functionality exercised in\n[`hpython`'s `import` test](hpython/test/import).\nThe journey starts with the [`paths`](tools/paths) utility:\n```shell\npaths --python-site -lz\n```\nwhich for my system suggest that these file system trees are of\nparticular interest:\n```\n/usr/lib/python3.10\n/usr/lib/libz.so.1\n```\nBehind the scenes [`dlinfo`](https://man.archlinux.org/man/dlinfo.3) is used to\nresolve the shared libraries.\nThe paths are then\n[inspected and converted into a relevant landlock rules code snippet](tools/landlockc)\nwhich is then included and applied in the main program.\n\nContinuing the same wrinkle into the [`hsh`](hsh) project\nwhich [executes a `bash`](https://man.archlinux.org/man/core/man-pages/fexecve.3.en)\nin the same security setting as the other script hosts programs\nproduce another complication.\nBeing a fully-fledged shell it desires to link quite a bit\nof dynamic libraries, which in turn desire even more of that shared binary\ngoodness.\n[`ldd /bin/bash`](https://man.archlinux.org/man/ldd.1) expose the extent of\ntheir desire:\n```\n     linux-vdso.so.1 (0x00007ffe3bed2000)\n     libreadline.so.8 =\u003e /usr/lib/libreadline.so.8 (0x00007f45e8bcc000)\n     libdl.so.2 =\u003e /usr/lib/libdl.so.2 (0x00007f45e8bc7000)\n     libc.so.6 =\u003e 
/usr/lib/libc.so.6 (0x00007f45e89e0000)\n     libncursesw.so.6 =\u003e /usr/lib/libncursesw.so.6 (0x00007f45e896c000)\n     /lib64/ld-linux-x86-64.so.2 =\u003e /usr/lib64/ld-linux-x86-64.so.2 (0x00007f45e8d44000)\n```\nThat indeed is a wrinkle to iron out given the\n[diversity of Linux distributions](https://en.wikipedia.org/wiki/Linux_distribution#/media/File:Linux_Distribution_Timeline_21_10_2021.svg).\nMy [`awk`-front-legs and `sed`-mid-legs twitch](https://en.wikipedia.org/wiki/The_Metamorphosis)\nbut there is a better approach using [`objdump`](https://man.archlinux.org/man/ldd.1#Security):\n```shell\nobjdump -p /path/to/program | grep NEEDED\n```\nwhich said insect has bundled into the [`poor_ldd`](tools/poor_ldd) utility\n(using the above mentioned [`dlinfo`](https://man.archlinux.org/man/dlinfo.3)\nbased [`lib`](tools/lib.c) utility).\nNow `poor_ldd /bin/bash` produces output similar to that of `ldd`:\n```\n/lib64/ld-linux-x86-64.so.2\n/usr/lib/libc.so.6\n/usr/lib/libdl.so.2\n/usr/lib/libncursesw.so.6\n/usr/lib/libreadline.so.8\n```\nwhich then can be handed off to [landlockc](tools/landlockc) to grant a very\nlimited set of read-access rules.\n\nAlice's set of attack vectors is now quite diminished, but we can do even better.\n\n### Enter [drop capabilities](https://man.archlinux.org/man/capabilities.7)\nI have included a [code snippet](build/capabilities.c) to drop capabilities.\nThis is a Linux feature I previously hadn't had the need to explore (so take\nthat code and what comes next with a grain of salt and always:\n\"trust, but verify\").\nThe classic selling point of capabilities is the scenario of allowing unprivileged\nusers to run `ping`.\nIn a pre-capabilities world one would have to obtain the full\npower of the privileged user (`root`) in order to use `ping`.\nOf course `setuid` reduces the mess of every user `su`:ing, but still\nprovides a nice potential attack vector on the `ping` binary.\nCapabilities are basically the idea of splitting `root` 
into separate, well,\ncapabilities that can be granted independently.\n(`ping` requires the [`CAP_NET_RAW`](https://man.archlinux.org/man/capabilities.7.en#CAP_NET_RAW)\ncapability).\n\nIn this project this scenario isn't really applicable\n(since we start out as unprivileged users.)\nBut what may be applicable is the functionality to relinquish granted\ncapabilities from the current process.\nMaybe this sounds convoluted, but in our current Dockerized world I would say\nits fairly common to see images invoke executables in a privileged mode\n(i.e. [not setting another user](https://docs.docker.com/engine/reference/builder/#user)).\n\nAnd a noteworthy configuration option of Linux is that you don't have to include\nthe bothersome userland. Here I imagine a barebones server setup: the kernel,\na single stand-alone server executable (serving as the\n[init](https://man.archlinux.org/man/init.1) process) and nothing else.\nIn that setting dropping capabilities could be useful.\n\nBut even with these restrictions Alice can cause quite a bother:\n\n### Enter [rlimits](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en)\nNow Alice is restricted to using a pre-approved set of syscalls\nand restricted to a pre-approved set of file-system operations on an\nequally pre-approved subset of the file-system tree.\n\nHer last-ditch effort is to execute a\n[Denial-of-service attack](https://en.wikipedia.org/wiki/Denial-of-service_attack).\nI suggest Alice tries to `while(1)` allocate at least one page of memory\n(`getconf PAGESIZE`: 4096 bytes),\nwrite a single [pseudo-randomly generated](https://en.wikipedia.org/wiki/Xorshift)\nbyte to each allocation: forcing the kernel to\n[copy-on-write](https://en.wikipedia.org/wiki/Copy-on-write).\nThis will quickly exhaust all available memory, and any unfortunate Linux user\nwill attest to the ensuing misery.\n\nThe mitigation is to apply strict\n[rlimits](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en).\nIn this 
attack [`RLIMIT_AS`](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en#RLIMIT_AS)\nmight be the most efficient mitigation.\nThe common way of applying `rlimits` is by using the shell's\n[`ulimit`](https://man.archlinux.org/man/ulimit.1p) command.\n\nAlice then tries a [fork bomb](https://en.wikipedia.org/wiki/Fork_bomb).\nRejecting the `clone` syscall will of course mitigate such an attack, but for\ninstance: [`node`](hnode) is determined to spawn worker threads making such a\nmitigation impractical.\nOnce more rlimits come to the rescue:\n[`RLIMIT_NPROC`](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en#RLIMIT_NPROC)\nrestricts the number of processes (threads included, of course)\nthat may exist for the real user ID of the calling process.\n\nAlice, your next attack vector should be to exhaust any available block-devices\nby creating huge files with your pseudo-random generator.\nBut again rlimits provide the mitigation:\n[`RLIMIT_FSIZE`](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en#RLIMIT_FSIZE).\n\nThe pattern should be obvious: restrict all available `rlimits` to the minimum\nrequired to make the intended functionality succeed.\nThe [code-snippet used](https://github.com/rootmos/libr/blob/master/modules/rlimit.c)\nto restrict the `rlimits` zeroes any resource limit not explicitly\nrequired to be non-zero.\nCheck the `#define RLIMIT_DEFAULT_`:s at the top of [hlua](hlua/main.c),\n[hpython](hpython/hpython.c) and [hnode](hnode/main.cpp).\nAgain we encounter the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).\n\nFor instance, this approach guarantees that a file-descriptor-exhaustion\nattack is no longer viable:\n[`RLIMIT_NOFILE`](https://man.archlinux.org/man/core/man-pages/setrlimit.2.en#RLIMIT_NOFILE).\n\n### Conclusion\nSo given Alice's game you're itching to play regardless of her malicious intent:\ndo you now feel safe enough to evaluate her code?\n- We can enforce a list of allowed syscalls and their 
arguments using\n  [seccomp](#enter-berkeley-packet-filters)\n- We can impose an additional layer of access restriction upon the file\n  system hierarchy using [landlock](#enter-landlock)\n- We can enforce [strict resource usage limits](#enter-rlimits) on:\n  memory usage, file-descriptor and thread/processes allocation\n\nYou might feel safe enough: but what\n[surreal thing will she think of next](https://cs.stanford.edu/~knuth/sn.html)‽\n\n## Frequently unasked questions\n- \"No, but seriously, why not `sudo docker`?\"\u003cbr\u003e\n  Yes, and seriously, no `sudo`, but yes Docker. My opinion is that Docker is\n  great (for me `Docker := cgroups+overlayfs` packaged into a sleek product,\n  but that's fine), especially in professional CI/Kubernetes/what-have-you\n  settings. With this project I want to showcase a Linux way of sandboxing\n  applications *unprivileged*: hence no `sudo`, but yes Docker.\n  Also my guiding principle (other than \"trust, but verify\" that is);\n  [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege):\n  why require privileges to do something that can be achieved without?\n- \"Why no binary packages?\"\u003cbr\u003e\n  Because each embedded script host may have a different license, and I do not\n  want to spend the time to study each of them and mess up anyway.\n  Also my aim for this project to be an educational showcase and a sandbox to\n  let users experiment and get hands on experience with the Linux security\n  features in a non-toy setting: reducing the value of a pre-built binary\n  package distributed without the sources and tools.\n  The laziness argument coupled with this argument guides me to only offer\n  source packages: mostly as a guide for users who do not feel comfortable to\n  jump into the deep end with `git clone` and `make`.\n\n## Installation and build instructions\n\n### Building from sources\nThis project is intended to be built using `Makefile`:s and common C build\ntools, 
except for the `bpf_asm` tool (found in the\n[Linux kernel sources](https://github.com/torvalds/linux/tree/master/tools/bpf)).\nArch Linux users can use the [`bpf`](https://archlinux.org/packages/community/x86_64/bpf/)\npackage, but other distributions might have to build their own copies.\nI have prepared a [build script](tools/bpf/build) which is used when `bpf_asm`\nis not found (the script is used in the Ubuntu workflow job).\n\nThe steps to build the project are then:\n```shell\nmake tools\nmake build\nmake check\n```\n\nIf these steps fail because of missing dependencies you may consult the\nfollowing table (derived from the packages installed during the\n[Build and test](.github/workflows/build-test.yaml) workflow).\n\n| | runtime | build | check |\n|-|---------|-------|-------|\n|Ubuntu 24.04| [`libcap2`](https://packages.ubuntu.com/noble/libcap2) [`lua5.4`](https://packages.ubuntu.com/noble/lua5.4) [`python3`](https://packages.ubuntu.com/noble/python3) [`libnode109`](https://packages.ubuntu.com/noble/libnode109) [`bash`](https://packages.ubuntu.com/noble/bash) | [`make`](https://packages.ubuntu.com/noble/make) [`pkg-config`](https://packages.ubuntu.com/noble/pkg-config) [`gcc`](https://packages.ubuntu.com/noble/gcc) [`libcap-dev`](https://packages.ubuntu.com/noble/libcap-dev) [`wget`](https://packages.ubuntu.com/noble/wget) [`ca-certificates`](https://packages.ubuntu.com/noble/ca-certificates) [`bison`](https://packages.ubuntu.com/noble/bison) [`flex`](https://packages.ubuntu.com/noble/flex) [`liblua5.4-dev`](https://packages.ubuntu.com/noble/liblua5.4-dev) [`python3`](https://packages.ubuntu.com/noble/python3) [`libpython3-dev`](https://packages.ubuntu.com/noble/libpython3-dev) [`libnode-dev`](https://packages.ubuntu.com/noble/libnode-dev) |  |\n|Ubuntu 22.04| [`libcap2`](https://packages.ubuntu.com/jammy/libcap2) [`lua5.4`](https://packages.ubuntu.com/jammy/lua5.4) [`python3`](https://packages.ubuntu.com/jammy/python3) 
[`libnode72`](https://packages.ubuntu.com/jammy/libnode72) [`bash`](https://packages.ubuntu.com/jammy/bash) | [`make`](https://packages.ubuntu.com/jammy/make) [`pkg-config`](https://packages.ubuntu.com/jammy/pkg-config) [`gcc`](https://packages.ubuntu.com/jammy/gcc) [`libcap-dev`](https://packages.ubuntu.com/jammy/libcap-dev) [`wget`](https://packages.ubuntu.com/jammy/wget) [`ca-certificates`](https://packages.ubuntu.com/jammy/ca-certificates) [`bison`](https://packages.ubuntu.com/jammy/bison) [`flex`](https://packages.ubuntu.com/jammy/flex) [`liblua5.4-dev`](https://packages.ubuntu.com/jammy/liblua5.4-dev) [`python3`](https://packages.ubuntu.com/jammy/python3) [`libpython3-dev`](https://packages.ubuntu.com/jammy/libpython3-dev) [`libnode-dev`](https://packages.ubuntu.com/jammy/libnode-dev) | [`python3-toml`](https://packages.ubuntu.com/jammy/python3-toml) |\n|Arch Linux| [`lua`](https://archlinux.org/packages/extra/x86_64/lua/) [`python`](https://archlinux.org/packages/core/x86_64/python/) [`nodejs`](https://archlinux.org/packages/extra/x86_64/nodejs/) [`bash`](https://archlinux.org/packages/core/x86_64/bash/) | [`bpf`](https://archlinux.org/packages/extra/x86_64/bpf/) |  |\n\n### Building from an Ubuntu source package\nPick a release and download the Ubuntu source package asset.\nIncluded within are the sources and two helper scripts:\n- `build-package` runs\n  [`dpkg-buildpackage`](https://manpages.ubuntu.com/manpages/jammy/en/man1/dpkg-buildpackage.1.html)\n  and checks for missing build-time dependencies\n- `install-package` installs the built package using `apt-get`, but note that\n  you can try out the built binaries without a system-wide installation\n\nThese scripts are intended to be run as an unprivileged user, but might\nneed `sudo` access to `apt-get` in order to install missing dependencies.\nBoth scripts accept an `-s` option for this case, or you can set the `SUDO`\nenvironment variable (e.g. 
`SUDO=\"sudo --askpass\"`).\n\n### Building from an Arch Linux PKGBUILD\nPick a release and download the Arch Linux\n[`PKGBUILD`](https://wiki.archlinux.org/title/PKGBUILD)\nasset, place it in a suitably empty directory and invoke\n[`makepkg`](https://wiki.archlinux.org/title/PKGBUILD),\npossibly with `--syncdeps` and/or `--install` options when desired.\nNote that you can try out the built binaries (found in the created `src`\nsubfolder) without a system-wide installation.\n\n## TODO\n- [ ] use [`strace` statistics](https://man.archlinux.org/man/strace.1#Statistics)\n  to sort seccomp filters with respect to number of calls\n- [ ] landlock ABI=2 (see the [sandbox example](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c#n233))\n- [ ] [readline](https://en.wikipedia.org/wiki/GNU_Readline) or naive\n  [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop)\n  with [rlwrap](https://github.com/hanslub42/rlwrap)\n- [ ] reference [OpenBSD's pledge(2)](https://man.openbsd.org/pledge.2)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frootmos%2Fh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frootmos%2Fh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frootmos%2Fh/lists"}