{"id":13797127,"url":"https://github.com/shellvm/shellvm","last_synced_at":"2025-04-06T00:09:48.798Z","repository":{"id":64012908,"uuid":"118095953","full_name":"SheLLVM/SheLLVM","owner":"SheLLVM","description":"A collection of LLVM transform and analysis passes to write shellcode in regular C","archived":false,"fork":false,"pushed_at":"2023-06-12T18:13:03.000Z","size":70,"stargazers_count":372,"open_issues_count":3,"forks_count":45,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-06T00:09:44.230Z","etag":null,"topics":["llvm","llvm-bitcode","llvm-ir","platform-independent","shellcode"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"ncsa","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SheLLVM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-01-19T08:03:41.000Z","updated_at":"2025-04-03T09:08:45.000Z","dependencies_parsed_at":"2024-01-07T06:23:24.469Z","dependency_job_id":"6acef7a5-3f57-4b50-bba2-9220c87bcc6e","html_url":"https://github.com/SheLLVM/SheLLVM","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SheLLVM%2FSheLLVM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SheLLVM%2FSheLLVM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SheLLVM%2FSheLLVM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SheLLVM%2FSheLLVM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SheLLVM","download_url":"https://codeload.github.com/SheLLVM/SheLLVM/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247415971,"owners_count":20935387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llvm","llvm-bitcode","llvm-ir","platform-independent","shellcode"],"created_at":"2024-08-03T23:01:22.972Z","updated_at":"2025-04-06T00:09:48.781Z","avatar_url":"https://github.com/SheLLVM.png","language":"C++","funding_links":[],"categories":["\u003ca id=\"8c5a692b5d26527ef346687e047c5c21\"\u003e\u003c/a\u003e收集"],"sub_categories":[],"readme":"SheLLVM\n=======\n\nSheLLVM (pronounced either \"shell-ell-ell-vee-em\" or \"shell-vee-em\" but never\nwith a long e as in \"she\") is a collection of LLVM analysis and transform passes\nto help developers compile lightly- to moderately-complex C(++) programs as\nposition-independent \"load anywhere and jump to the beginning\" machine code.\n\nWhile this project started as a toolkit for writing shellcode in plain C, it\ncan really apply to any situation where a developer needs a program compiled in\na platform-independent and position-independent way.\n\nExample usage\n-------------\n\n```\n// main.c\n#define WIN32_LEAN_AND_MEAN\n#include \u003cwindows.h\u003e\n\nextern void say_hello();\n\n__attribute__((annotate(\"shellvm-main\")))\nvoid shellcode()\n{\n\twhile(1) {\n\t\tSleep(1000);\n\t\tsay_hello();\n\t}\n}\n```\n\n```\n// hello.c\n#define WIN32_LEAN_AND_MEAN\n#include \u003cwindows.h\u003e\n\nvoid say_hello()\n{\n\tMessageBox(NULL, \"Hello, SheLLVM world!\", \"Hello\", 0);\n}\n```\n\n```\nclang -target i686-w64-mingw32 -c -emit-llvm -o main.bc main.c\nclang -target i686-w64-mingw32 -c -emit-llvm -o hello.bc hello.c\nllvm-link -o linked.bc main.bc hello.bc shellvm-built/winnt-{user,kernel}32.bc\nclang -load=shellvm-built/shellvm.so -O3 -shellvm -o shellcode.elf linked.bc\nobjcopy -O binary --only-section=.text shellcode.elf shellcode.bin\nmsfvenom -p - -a i386 --platform win32 -e x86/shikata_ga_nai \u003c shellcode.bin \u003e shellcode_encoded.bin\n```\n\nFeatures\n--------\n\n- **Portable**: SheLLVM makes no assumption about architecture. While this is\n  most heavily tested and developed on x86/amd64, the passes involved run on\n  pure LLVM IR and should work anywhere the LLVM ecosystem works.\n- **Flexible**: While there are some guides to writing shellcode in C on the\n  'net already, these usually impose a lot of restrictions on how the C code\n  can be written; for example, by forbidding the use of string constants and\n  global variables. SheLLVM attempts to provide a more \"conventional\" C\n  environment while producing comparably self-contained code.\n- **Platform-independent**: SheLLVM-generated code tries to keep to itself as\n  much as possible. It does not rely on OS API calls or make assumptions about\n  the memory layout of the target system. It does not need to unpack itself or\n  allocate additional memory; all it requires is that it be loaded into a\n  readable/executable segment and have the processor's stack initialized when\n  execution begins. This makes SheLLVM suitable for use even in deeply embedded\n  circumstances.\n- **Compatible**: While this is primarily focused on C compatibility, there's\n  no reason it can't work with other languages which use LLVM in the code\n  generation pipeline, such as Swift or Rust. Patches for those languages very\n  welcome!\n\nLimitations\n-----------\n\nBefore we talk about the limitations of SheLLVM, let's first talk about the\nlimitations of shellcode.\n\n1. **It must be position-independent.** While there are some circumstances\n   where shellcode might land at a fixed address in an injected program,\n   most of the time there's no guarantee whatsoever. Shellcode must rely on\n   relative addressing only.\n2. **It must fit in a single segment.** Shellcode does not get the luxury of\n   having a .text, .data, and .rdata segment. A single contiguous block must be\n   loaded somewhere and the program must operate entirely in this environment.\n3. **It does not get the benefit of the OS loader.** While this is the cause\n   for #1 and #2 above, it also means relying on OS dynamic libraries is\n   forbidden.\n4. **It's usually loaded into a non-writeable segment.** Back in the 90s,\n   before the importance of [W^X](https://en.wikipedia.org/wiki/W%5EX) was\n   widely understood, shellcode could reasonably expect to land in RWX\n   (readable, writeable, and executable) memory. Nowadays, shellcode may have\n   to execute in R-X memory (and sometimes even --X memory).\n5. **The only memory sure to be writeable is the stack.** This is a blessing\n   and a curse. On one hand, the stack is always there and most architectures\n   have a register pointing to it and everything. On the other hand, the stack\n   can overflow if not used judiciously.\n\nSo, let's look at how these translate into limitations when using SheLLVM:\n\n1. **All code must be compiled to LLVM bitcode.** These are LLVM passes. They\n   can only work on LLVM IR. The code must be presented to SheLLVM as a single\n   LLVM .bc file, which means heavy use of `llvm-link` and `clang -emit-llvm`.\n2. **No linker.** Or, more precisely, no _object code_ linker. Linking\n   functionality is still provided by LLVM, but this means all linking must be\n   done with LLVM modules and not object files. Importantly, this means...\n3. **No libraries.** Sorry, no `myfavoritelib.lib` or `libcoolthing.a`. If you\n   want to use a third-party library, you're going to have to compile it as\n   LLVM bitcode yourself and link it in with `llvm-link`. Most notably,\n   however, this means you cannot rely on the Win32 API or C standard library.\n   SheLLVM provides _loader stubs_ for dynamic libraries on certain platforms,\n   which provide the standard platform API without needing to write\n   symbol-hunting code by hand.\n4. **You must have a single main function.** No linker/loader means no\n   symbols. You can't export a collection of functions. The only function\n   callable from the outside world is your main function.\n5. **Your main function must have the longest lifetime.** Because the only\n   writeable memory is the stack (as explained above), the main function ends\n   up taking the responsibility for allocating all memory in its stack frame\n   and freeing it when done. Similarly, the main function handles all\n   constructors and destructors. As a consequence, the main function must be\n   the first thing to run and the last thing to exit. (This is only really an\n   issue for programs using threading and callbacks.)\n6. **Your code has amnesia.** Since all state resides on the stack, this means\n   your globals (and, by extension, static variables) have a lifetime only as\n   long as your main function is running. This means that, while your globals\n   (and static variables) will behave normally during a given run of your code,\n   they will be reset to their default values during a subsequent run of your\n   main function.\n\nThere are further limitations depending on which SheLLVM style you use.\n\nSheLLVM Style 1 (\"Megafunction Style\")\n======================================\n\nIn this style, SheLLVM functions as, essentially, a hyperaggressive inliner. It\nattempts to reduce your entire program down to a single function, with no\ncallstack and no constants or globals on the heap.\n\nPros\n----\n\n- Works in --X memory.\n- Compatible with most (all?) LLVM code generators.\n- Highest code density.\n- Does not use the stack for calls at all.\n\nCons\n----\n\n- All functions must be inlinable. In particular, this means no recursion is\n  allowed (you **may** be able to get away with tail-call self-recursion, as\n  LLVM's optimizer is pretty good at turning those into ordinary loops) and,\n  perhaps more importantly, you must never take the address of any of your\n  functions. **This means no callbacks or threads.**\n- Everything resides on the stack, even large const arrays. This can be a\n  problem for large programs, since read-only data is written to the stack\n  instruction-by-instruction instead of being loaded into memory as-is.\n\n~~SheLLVM Style 2 (\"Concatenated Style\")~~\n==========================================\n\nNote: This has not been implemented yet.\n\nIn this style, SheLLVM does not attempt to inline functions or restrict itself\nto a single frame on the call stack. Instead, all global variables are placed\nin a massive struct which lives in the stack frame of the main function. Every\nother function's parameters list is modified to accept a pointer to this\nstruct, in order to provide global variable support.\n\nFunctions used as external callbacks must be annotated so as not to receive this\nmodification. In this case, it's the programmer's responsibility to restore the\npointer to the globals object (via `__shellvm_save()` and `__shellvm_restore()`\nintrinsics).\n\nMultiple functions (and constant heap variables) are concatenated together in\nthe output binary, and a small assembler stub at the entry point deduces from\nthe instruction pointer where it has been loaded in memory and computes, from\nan offset table, the addresses of each function and constant in the program.\n\n\nPros\n----\n\n- Supports recursion and taking the addresses of functions.\n- Supports threading and callbacks, when done carefully.\n- Requires much less stack, due to constants being interspersed in the\n  program data instead.\n\nCons\n----\n\n- Only works when the code segment is readable (R-X).\n- Requires special handling at the assembly level.\n- Much more complex.\n- Callbacks from outside of LLVM code (e.g. due to spawning a thread) require\n  the use of special SheLLVM intrinsics to save/restore the globals object.\n\n\nPasses in SheLLVM\n=================\n\nThese are the passes implemented in SheLLVM:\n\n`-shellvm-prepare`\n------------------\n\nThis pass makes sure exactly one function is marked as the main function (via\nthe `__attribute__((annotate(\"shellvm-main\")))` annotation). It removes this\nannotation, replaces it with an LLVM _attribute_, and marks all other functions\nin the module as private.\n\n`-shellvm-precheck`\n-------------------\n\nThis just checks that all functions in the module are marked `norecurse`,\nand that all functions are `unnamed_addr` (except for the main function, which\nmust be `local_unnamed_addr`).\n\n`-mergecalls`\n-------------\n\nThis merges call instructions which target the same function into the same\nbasic block, using a `switch` statement on the other end to branch back to\nwhere the call left off.\n\nBecause this could be useful outside of SheLLVM, it does not have the\n`-shellvm-` prefix.\n\n`-shellvm-flatten`\n------------------\n\nThis uses `-mergecalls` on the main function repeatedly, inlining each merged\ncallsite each time. It's responsible for taking a full function call graph and\nflattening it down to a single function.\n\n`-shellvm-global2stack`\n-----------------------\n\nThis inlines all global variables (constant or not) which are used by only one\nfunction into the stack of said function. Note that this can/will heavily break\nnon-SheLLVM programs if not used with care.\n\n`-shellvm-inlinectors`\n----------------------\n\nThis inlines LLVM ctors/dtors into the SheLLVM main function.\n\n`-shellvm-postcheck`\n--------------------\n\nThis ensures that the resultant module contains no globals, only one function,\nno switch statements, etc.\n\nIt generally makes sure that the LLVM module is ready for code generation and\nwill behave as proper shellcode when lowered into machine instructions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshellvm%2Fshellvm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshellvm%2Fshellvm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshellvm%2Fshellvm/lists"}