{"id":13732385,"url":"https://github.com/GaloisInc/reopt","last_synced_at":"2025-05-08T06:32:14.903Z","repository":{"id":33449979,"uuid":"37095372","full_name":"GaloisInc/reopt","owner":"GaloisInc","description":"A tool for analyzing x86-64 binaries.","archived":false,"fork":false,"pushed_at":"2024-05-22T18:36:09.000Z","size":18089,"stargazers_count":295,"open_issues_count":22,"forks_count":26,"subscribers_count":25,"default_branch":"main","last_synced_at":"2024-06-11T01:05:39.109Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"LLVM","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GaloisInc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"support/.gitignore","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-06-08T22:01:42.000Z","updated_at":"2024-07-25T23:21:40.725Z","dependencies_parsed_at":"2023-01-15T01:01:13.264Z","dependency_job_id":"1058b439-2688-45d5-b784-67f9be002db9","html_url":"https://github.com/GaloisInc/reopt","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GaloisInc%2Freopt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GaloisInc%2Freopt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GaloisInc%2Freopt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GaloisInc%2Freopt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GaloisInc","download_url":"https://codeload.github.com/GaloisInc/re
opt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224708250,"owners_count":17356509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T02:01:55.129Z","updated_at":"2024-11-14T23:32:20.328Z","avatar_url":"https://github.com/GaloisInc.png","language":"LLVM","readme":"# reopt\n\nReopt is a general purpose decompilation and recompilation tool\nfor repurposing application logic.  It does this by analyzing machine\ncode to recover a more flexible program representation --\nspecifically the [LLVM assembly language](https://llvm.org/docs/LangRef.html).\nOnce in this format, one can then apply optimization tools to optimize the\nLLVM, recompile the application into optimized or security hardened\nobject code, and use Reopt to merge the recompiled code back into the\noriginal executable.\n\nReopt supports Linux x86_64 programs.  We are working towards a full\n1.0 release, but the current pre-release version supports the end-to-end\nrecompilation toolchain.\n\n## Getting Reopt\n\nAlthough Reopt can build on other POSIX systems such as OSX, we recommend\nbuilding Reopt to run on Linux.  Reopt currently only supports Elf binaries\nwhich are the default binary format for Linux.  
It does not support OSX\nMach-O binaries, and so it is easier to find applications\nto try Reopt on when running Linux.\n\n### Gitpod\n\nFor most people, the easiest way to try out Reopt is to\n[try it out on Gitpod](https://gitpod.io#https://github.com/GaloisInc/reopt/tree/try-reopt).\nThis requires an account on Gitpod, but gives you access to a VSCode IDE connected to a\nLinux container with Reopt pre-installed.\n\n### GitHub Releases\n\nIf you have Linux installed, you can download one of our recent releases from\nthe [Releases page](https://github.com/GaloisInc/reopt/releases).  We build\nreleases as static binaries on CentOS 7, so they should work on a variety\nof distributions.\n\n### Docker\n\nIf you have Docker installed, you can install and run the Reopt pre-release\nDocker image by running:\n\n```\ndocker pull galoisbinaryanalysis/reopt\ndocker run --rm -it galoisbinaryanalysis/reopt\n```\n\n### Building from source\n\nBuilding Reopt requires that one has installed the GHC Haskell\ncompiler and supporting tooling.  We currently build on GHC 8.10.4.\nAn easy way to get GHC is to install [ghcup](https://www.haskell.org/ghcup/),\nand run `ghcup install ghc 8.10.4`.  
We also maintain a\n[Docker image](https://hub.docker.com/r/galoisbinaryanalysis/reopt-dev)\nthat has GHC and other dependencies preinstalled for building Reopt.\n\nOnce GHC is installed, the following steps may be useful for building Reopt:\n\n```\ngit clone https://github.com/GaloisInc/reopt.git\n\ncd reopt\n# Fix submodule URLs (can skip if you have a Github account)\nsed -i 's/git@github.com:/https:\\/\\/github.com\\//' .gitmodules\ngit submodule update --init\n# Build Reopt\ncabal install exe:reopt\n# Build Reopt Explore\ncabal install exe:reopt-explore\n```\n\nReopt and Reopt Explore will be installed at `$HOME/.cabal/bin/reopt` and\n`$HOME/.cabal/bin/reopt-explore`.\n\nReopt's verification condition generator (`reopt-vcg`) is included in the\naforementioned GitHub release and Docker image; however, the source is currently\nmaintained in a [separate repository](https://github.com/galoisinc/reopt-vcg)\nwith its own build instructions and requirements.\n\n## Using Reopt\n\nOnce `reopt` is installed on a Linux system and included in your path,\nyou can try running it on system utilities such as `ls`.  To do an\nend-to-end recompilation, you can run `reopt` with the command:\n\n```\n$ reopt -o ls.exe $(which ls)\n```\n\nThis execution will use the version of `ls` in your system path and produce\nan executable `ls.exe` in the current directory.  While running, `reopt`\nwill print out messages as it discovers functions within the application\nand attempts to convert each discovered function into LLVM.\n\n## Inspecting intermediate state\n\nDuring recompilation, Reopt has to do a complex series of analysis steps\nto lift the machine code into LLVM.  Each of these analysis steps is\nincomplete and may fail either due to Reopt not recognizing features\nin the binary or an error in our prerelease version of Reopt.  
As such,\ndo not be alarmed when Reopt fails to translate functions.\n\nIf you'd like to inspect Reopt's intermediate state, there are several\ncommand line flags to export intermediate results.  We describe the\nmain flags for exporting intermediate state below.\nAdditional options can be viewed by running `reopt --help`.\n\n * **Disassembly.**  `reopt --disassemble \u003cbinary\u003e` provides a raw\n   disassembler output view of the code in the binary.  This is similar to\n   `objdump`'s disassembly output.\n\n * **Control flow graph construction.** `reopt --cfg \u003cbinary\u003e` displays the\n   low-level control flow graphs that Reopt has constructed for each discovered\n   function within the binary.  This is a low-level IR that maintains\n   machine code's explicit stack and register references, but lifts the\n   machine code instructions into a more architecture-neutral register\n   transfer language.\n\n * **Function Recovery.** `reopt --export-fns \u003cpath\u003e \u003cbinary\u003e` writes the\n   functions that Reopt has generated after performing stack and function\n   argument analysis. This is a higher-level IR in which explicit references to\n   the stack have been replaced with allocations, and functions take arguments.\n\n * **LLVM Generation.** `reopt --export-llvm \u003cpath\u003e \u003cbinary\u003e` generates\n   LLVM from the binary.  This is essentially a version of function\n   recovery rendered in LLVM's format.  Providing the\n   `--annotations \u003cann_file\u003e` flag during LLVM generation will\n   cause `reopt` to additionally emit JSON in `\u003cann_file\u003e` describing\n   verification conditions which (if valid) demonstrate functional equivalence\n   between the generated LLVM and machine code. 
Running `reopt-vcg\n   \u003cann_file\u003e` will simulate the execution of the LLVM and machine code,\n   block-by-block, leveraging an SMT solver (cvc4) to verify as many of\n   the conditions as possible.\n\n * **Object Files.** `reopt --export-object \u003cpath\u003e \u003cbinary\u003e` generates an object\n   file from the LLVM generated in the previous step.\n   This is essentially the same as generating the LLVM, and then running\n   the LLVM compiler toolchain with the selected options.\n\n## Function arguments\n\nOne common reason Reopt fails is that it cannot figure out the arguments\nthat a function can take.  We have four mechanisms for obtaining function\narguments: (1) user-provided hints; (2) a small builtin database; (3) debug\ninformation; and (4) a demand analysis that looks at what registers are used\nto infer arguments.  These mechanisms are listed in priority order, although\nwe note that the builtin database is currently the only mechanism for supporting\nfunctions that take a variable number of arguments like `printf`.\n\nIf you'd like to provide hints to Reopt, the recommended way is to write a\nC header file with the arguments, such as:\n\n```\n// decls.h\n\n\ntypedef long ssize_t;\ntypedef unsigned long size_t;\n\nssize_t read(int fd, void* buf, size_t count);\nssize_t write(int fd, const void* buf, size_t count);\n```\n\nYou can then use this file to tell Reopt about the expected types for\n`read` and `write` via the `--header` flag, e.g.,\n\n```\nreopt -o ls.exe --header decls.h $(which ls)\n```\n\n## Using `OCCAM` for additional optimizations\n\n`reopt` can leverage the [OCCAM](https://github.com/SRI-CSL/OCCAM) whole-program\npartial evaluator for LLVM bitcode to further optimize binaries (assuming a user\nhas already installed and made available both `OCCAM` and its accompanying\ninterface `slash`).\n\nThis feature can be enabled by passing the `--occam-config=FILE` option to\n`reopt`, where `FILE` is the `reopt`/`OCCAM` manifest. 
The manifest should\nessentially be a valid [OCCAM manifest\nfile](https://github.com/SRI-CSL/OCCAM/wiki/Manifest) (i.e., a file with JSON\nentries) with the following (optional) additional field:\n\n + `slash_options`: a list of command line option flags for OCCAM's `slash` tool,\n\nand *excluding* the following fields (`reopt` will populate these appropriately):\n\n  + `binary`\n  + `name`\n\nThe `main` field should specify the desired name of the bitcode file that will\nbe generated for `OCCAM` to process, and the OCCAM-optimized result will share\nthe name with an added `.occam` suffix.\n\nN.B., when passing flags to customize `OCCAM`/`slash` behavior, be aware that\n`reopt` passes the `-c` and `-emit-llvm` flags via the\n`ldflags` manifest entry so `OCCAM` skips recompiling and acts only as an LLVM\nto LLVM translator.\n\n## Using Reopt Explore\n\nWith `reopt-explore` installed, we can gather statistics regarding `reopt`'s ability\nto recover functions in an individual binary or a collection of binaries.\n\nTo examine a single binary, simply call `reopt-explore` with a path to the binary:\n\n```\n$ reopt-explore llvm $(which ls)\n...\n/usr/bin/ls\n  Initialization:\n    Code segment: 112,004 bytes\n    Initial entry points: 234\n    Warnings: 0\n  Discovery:\n    Bytes discovered: 59,502 (53%)\n    Succeeded: 216 (92%)\n    Failed: 18 (8%)\n      Unhandled instruction: 1 (0%)\n      Unidentified control flow: 17 (7%)\n  Argument Analysis:\n    Succeeded: 123 (57%)\n    Failed: 93 (43%)\n    Header Warnings: 0\n    DWARF Warnings: 0\n    Code Warnings: 112\n  Invariant Inference:\n    Succeeded: 92 (75%)\n    Failed: 31 (25%)\n      Indirect call target: 1 (1%)\n      Unresolved call target arguments: 30 (24%)\n  Recovery:\n    Succeeded: 81 (88%)\n    Failed: 11 (12%)\n      Unsupported function value: 8 (9%)\n      Unimplemented LLVM backend feature: 3 (3%)\n  LLVM generation status: Succeeded.\n```\n\nTo recursively search a directory for binaries and examine 
each,\ncall `reopt-explore` with the path to the directory to search:\n\n```\n$ reopt-explore llvm /usr/bin\n...\nreopt analyzed 394 binaries:\nGenerated LLVM bitcode for 394 out of 394 binaries.\nInitialization:\n  Code segment: 42,933,178 bytes\n  Initial entry points: 79776\n  Warnings: 0\nDiscovery:\n  Bytes discovered: 23,025,164 (54%)\n  Succeeded: 64,494 (81%)\n  Failed: 15,500 (19%)\n    Unhandled instruction: 425 (1%)\n    Unidentified control flow: 15,075 (19%)\nArgument Analysis:\n  Succeeded: 40,429 (63%)\n  Failed: 24,065 (37%)\n  Header Warnings: 0\n  DWARF Warnings: 0\n  Code Warnings: 38,681\nInvariant Inference:\n  Succeeded: 30,221 (75%)\n  Failed: 10,208 (25%)\n    Symbolic call stack height: 1 (0%)\n    Unresolved stack read: 13 (0%)\n    Indirect call target: 526 (1%)\n    Call target not function entry point: 41 (0%)\n    Unresolved call target arguments: 9,614 (24%)\n    Could not resolve varargs args: 13 (0%)\nRecovery:\n  Succeeded: 21,952 (73%)\n  Failed: 8,269 (27%)\n    Unsupported function value: 2,425 (8%)\n    Unimplemented feature: 6 (0%)\n    Unimplemented LLVM backend feature: 4,762 (16%)\n    Stack offset escape: 83 (0%)\n    Stack read overlapping offset: 1 (0%)\n    Unresolved return value: 8 (0%)\n    Missing variable value: 984 (3%)\n```\n\n## Improving recovery with debug information\n\n`reopt` and `reopt-explore` will try to determine if any debug information\nis available for dynamic dependencies by querying `gdb` (if it is installed).\n\nUsers can also manually specify dependency and debug directories to search\nfor both `reopt` and `reopt-explore` via the following flags:\n\n```\n--lib-dir=PATH              Additional location to search for dynamic\n                            dependencies.\n--debug-dir=PATH            Additional location to search for dynamic\n                            dependencies' debug info.\n```\n\n## Contributing\n\nThe project has been contributed to by many authors over many years 
without much\ncoordination on code style and library usage.  For our currently-maintained\nsub-projects, we favor the following, without enforcing them aggressively (i.e.\nyou may still find instances where those are not used):\n\n- We use the `fourmolu` code formatter, informed by the `fourmolu.yaml` present\nin the root directory.\n\n- We are trying to move to `optparse-applicative` for CLI argument parsing\n(though there remain some instances of `cmdargs` for executables we are not\nactively maintaining).\n\n- We use `prettyprinter` for pretty-printing.\n\n- We tend to use labeled generic lenses for accessing complex data types in\nread/write.  We still have a mixed use of named- and symbol-based lens\noperators.\n\n- We do not yet have a very nice logging story.  Currently, the code outputs to\nstdout/stderr as it deems necessary, but we ought to use a more principled\nlogging discipline.\n","funding_links":[],"categories":["LLVM","Tools"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGaloisInc%2Freopt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGaloisInc%2Freopt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGaloisInc%2Freopt/lists"}