{"id":20275618,"url":"https://github.com/softsec-kaist/reassessor","last_synced_at":"2025-04-11T05:24:23.539Z","repository":{"id":63938220,"uuid":"537323886","full_name":"SoftSec-KAIST/Reassessor","owner":"SoftSec-KAIST","description":"Reassembly is Hard: A Reflection on Challenges and Strategies (USENIX Security '23)","archived":false,"fork":false,"pushed_at":"2025-01-27T13:49:31.000Z","size":383,"stargazers_count":32,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-25T03:33:41.399Z","etag":null,"topics":["binary-analysis","reassembler","recompile","software-testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SoftSec-KAIST.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-16T05:55:02.000Z","updated_at":"2025-02-27T12:34:01.000Z","dependencies_parsed_at":"2024-05-19T14:27:39.725Z","dependency_job_id":"4d492b76-1cb0-4c3b-b418-6d8bcaafa20b","html_url":"https://github.com/SoftSec-KAIST/Reassessor","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftSec-KAIST%2FReassessor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftSec-KAIST%2FReassessor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftSec-KAIST%2FReassessor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SoftSec-KAIST%2FRea
ssessor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SoftSec-KAIST","download_url":"https://codeload.github.com/SoftSec-KAIST/Reassessor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248346646,"owners_count":21088493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-analysis","reassembler","recompile","software-testing"],"created_at":"2024-11-14T13:10:13.800Z","updated_at":"2025-04-11T05:24:23.518Z","avatar_url":"https://github.com/SoftSec-KAIST.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Reassessor\n========\n\n[Reassessor](https://github.com/SoftSec-KAIST/Reassessor) is an automated tool\nthat finds symbolization errors in reassembler-generated assembly files. At a\nhigh level, `Reassessor` finds errors by diffing the compiler-generated\nassembly file against the reassembly file. The algorithm is described in detail in\nour paper \"Reassembly is Hard: A Reflection on Challenges and Strategies\", which\nappears in USENIX Security 2023.\n\n# Install\n\n`Reassessor` currently works only on Linux; we tested it on Ubuntu\n18.04 and Ubuntu 20.04.\n\n### 1. Clone Reassessor\n\n```\n$ git clone https://github.com/SoftSec-KAIST/Reassessor\n$ cd Reassessor\n```\n\n### 2. 
Install Dependencies\n\n`Reassessor` is written in Python 3 (3.6), and it depends on\n[pyelftools](https://github.com/eliben/pyelftools.git) (\u003e= 0.29) and\n[capstone](https://pypi.org/project/capstone/) (\u003e=4.0.2).\n\nTo install the dependencies, please run:\n\n```\n$ pip3 install -r requirements.txt\n```\n\n### 3. Install Reassessor\n\n```\n$ python3 setup.py install --user\n```\n\n# Usage\n\n### Perform a Preprocessing Step\n\nBefore running `Reassessor`, a preprocessing step must produce three inputs:\na compiler-generated assembly file, a non-stripped\nbinary file, and a reassembler-generated assembly file.\n\n\nYou can download our benchmark binary files and compiler-generated assembly\nfiles at\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7178116.svg)](https://doi.org/10.5281/zenodo.7178116).\n\n\u003e **Note**\n\u003e If you want to make your own binary set, you should build binaries with\n\u003e the `--save-temps=obj` option to force the compilers to preserve all the\n\u003e intermediate files, including assembly files generated during the compilation\n\u003e process. Also, you should enable the `-g` option to produce binaries with\n\u003e debugging information. Lastly, the `-Wl,--emit-relocs` linker option is required,\n\u003e especially when you build non-PIE (Position-dependent Executable) binaries.\n\u003e The linker option preserves relocation information.\n\nNext, you can get reassembler-generated assembly files by running the\n`preprocessing` module.\n\n\u003e **Note**\n\u003e Docker needs to be installed on the same machine to run reassemblers within\n\u003e a Docker container. Our scripts assume that you can run Docker commands as a\n\u003e regular (unprivileged) user, so there is no need to run them with sudo.\n\n```\n$ python3 -m reassessor.preprocessing \u003cbinary_path\u003e \u003coutput_dir\u003e\n```\n\nDuring the preprocessing step, the `STRIP` module strips debug symbols from the\nbinary to get a stripped binary. 
`Ddisasm` and `Ramblr` take the stripped\nbinary as an input binary. However, the stripping process is omitted for\n`RetroWrite` since it requires debugging information to reassemble binaries.\n\nThe module produces the reassembly files under `\u003coutput_dir\u003e/reassem`.\n```\n$ ls \u003coutput_dir\u003e/reassem\nddisasm.s  retrowrite.s\n```\n\nNote that each reassembly tool supports different sets of binaries: `Ramblr` only\nworks with non-PIE binaries and `RetroWrite` only works with x86-64 PIE binaries.\nThus, the `preprocessing` module will generate a different set of reassembly files\ndepending on the binary file.\n\n\u003e **Note**\n\u003e The `preprocessing` module runs the state-of-the-art reassemblers, Ramblr (commit\n\u003e 64d1049, Apr. 2022), RetroWrite (commit 613562, Apr. 2022), and Ddisasm\n\u003e v1.5.3 (docker image digests: a803c9, Apr. 2022), in a dockerized\n\u003e environment, to produce reassembly files.  If you want to run `Reassessor`\n\u003e with a new reassembler, you should update the execution commands in\n\u003e the `reassemble()` method in the\n\u003e [preprocessing.py](https://github.com/SoftSec-KAIST/Reassessor/blob/main/reassessor/preprocessing.py)\n\u003e file.\n\n\n### Run Reassessor\n\n`Reassessor` takes in a compiler-generated assembly file and a\nreassembler-generated assembly file, and transforms assembly expressions into a\ncanonical form to ease the comparison. Then, `Reassessor` finds errors by\ncomparing the normalized assembly code.\n\nTo search for reassembly errors, run the `reassessor` module as follows:\n\n```\n$ python3 -m reassessor.reassessor \u003cbinary_path\u003e \u003cassembly_directory\u003e \u003coutput_directory\u003e \\\n  [--ramblr RAMBLR] [--retrowrite RETROWRITE] [--ddisasm DDISASM]\n```\n\nThe `reassessor` module requires `\u003cbinary_path\u003e` and `\u003cassembly_directory\u003e` to\nnormalize compiler-generated assembly files. 
Also, it requires a reassembly\nfile to normalize; you can specify the location\nof the reassembly file by using the `--ramblr`, `--retrowrite`, and `--ddisasm`\noptions. Then, the `reassessor` module compares the normalized code and produces\nreport files in `\u003coutput_directory\u003e`.\n\n```\n$ python3 -m reassessor.reassessor \u003cbinary_path\u003e \u003cassembly_directory\u003e \u003coutput_directory\u003e \\\n  --ddisasm \u003creassembly_file_path\u003e\n$ ls \u003coutput_directory\u003e/norm_db\ngt.db  ddisasm.db\n$ ls \u003coutput_directory\u003e/errors/ddisasm\ndisasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json\n```\n\nThe `reassessor` module generates normalized assembly files under the\n`\u003coutput_directory\u003e/norm_db` folder, and then it takes the two\nnormalized files to find the differences between them.\nConsequently, the `reassessor` module produces the following\nfiles as output: `disasm_diff.txt`, `sym_errors.dat`, `sym_diff.txt`, and\n`sym_errors.json`. First, `disasm_diff.txt` contains a list of disassembly\nerrors (one per line); each line contains the relevant address,\nreassembler-generated assembly line, and compiler-generated assembly line.\n`sym_errors.dat` is a raw output file containing a list of symbolization\nerrors. This file is used to generate the other two files: `sym_errors.json` and\n`sym_diff.txt`. `sym_diff.txt` is a human-readable representation of\n`sym_errors.dat`. Each line of the file contains the address, error type,\nreassembler-generated assembly code, and compiler-generated code for each\nerror found. 
Finally, `sym_errors.json` contains detailed information about\neach symbolization error found, including the relevant assembly file, line\nnumber, relocatable expression type, normalized code, repairability, and so on.\nThe file is written in JSON format.\n\n### Docker\n\nYou can use a `Docker` image to try out `Reassessor` quickly.\n\nThe following command will build a Docker image named `reassessor` using our\n[Dockerfile](https://github.com/SoftSec-KAIST/Reassessor/blob/main/Dockerfile).\n```\n$ docker build --tag reassessor .\n```\n\nNow, you can run `Reassessor` within a `Docker` container.\n```\n$ docker run --rm reassessor sh -c \"/Reassessor/reassessor.py \u003cbinary_path\u003e \u003cassembly_directory\u003e \\\n  \u003coutput_directory\u003e [--ramblr RAMBLR] [--retrowrite RETROWRITE] [--ddisasm DDISASM]\"\n```\n\n# Example\n\nYou can test `Reassessor` with our sample program.\n\n### 1. Build the source code\n```\n$ cd examples\n$ make\n$ cd ..\n```\n\n### 2. Reassemble the example program\n```\n$ mkdir output\n$ python3 -m reassessor.preprocessing ./example/bin/hello ./output\n$ ls ./output/reassem\nddisasm.s  retrowrite.s\n```\n\n### 3. Run Reassessor\n```\n$ python3 -m reassessor.reassessor ./example/bin/hello ./example/asm ./output  \\\n  --retrowrite ./output/reassem/retrowrite.s\n$ ls ./output/norm_db\ngt.db  retrowrite.db\n$ ls ./output/errors/retrowrite\ndisasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json\n```\n\nAlso, you can run `Reassessor` within a `Docker` container.\n```\n$ docker run --rm -v $(pwd):/input reassessor sh -c \"python3 -m reassessor.reassessor \\\n  /input/example/bin/hello /input/example/asm/ /input/output \\\n  --retrowrite /input/output/reassem/retrowrite.s\"\n```\n\n\n### 4. 
Check Error Report\n```\n$ ls ./output/errors/retrowrite/\ndisasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json\n$ cat ./output/errors/retrowrite/sym_diff.txt\n# Instrs to check: 48\n# Data to check: 14\nRelocatable Expression Type 4 [FP: 3(0) / FN: 0]\nE4FP [0] (Disp:3:0) 0x1196  : movl .LC2024(%rip), %eax                  | movl bar+4(%rip), %eax\nE4FP [0] (Disp:3:0) 0x11a7  : movl .LC2028(%rip), %eax                  | movl bar+8(%rip), %eax\nE4FP [0] (Disp:3:0) 0x11b8  : movl .LC202c(%rip), %eax                  | movl bar+12(%rip), %eax\n```\n\n# Dataset\nOur benchmark is publicly available at\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7178116.svg)](https://doi.org/10.5281/zenodo.7178116).\n(The dataset does not contain SPEC CPU 2006 binaries because of a licensing\nissue.)\n\n\n# Artifacts\n\nWe also provide the artifact to reproduce the experiments in our paper.\nPlease check the\n[Reassessor/artifacts/](https://github.com/SoftSec-KAIST/Reassessor/tree/v1.0.0/artifact) folder.\n\n# Contributions of our work\n\n`Reassessor` found numerous symbolization errors in state-of-the-art\nreassemblers. Also, we discovered previously unseen reassembly errors. 
We filed a pull request and\nissues to resolve the errors.\n\n- Ramblr\n    - [issue 3549](https://github.com/angr/angr/issues/3549) (1 Oct 2022)\n    - [issue 39](https://github.com/angr/patcherex/issues/39) (21 Jan 2022)\n\n- RetroWrite\n    - [PR](https://github.com/HexHive/retrowrite/pull/36) (26 May 2022)\n    - [issue 45](https://github.com/HexHive/retrowrite/issues/45) (1 Oct 2022)\n    - [issue 38](https://github.com/HexHive/retrowrite/issues/38) (6 Jun 2022)\n    - [issue 35](https://github.com/HexHive/retrowrite/issues/35) (9 May 2022)\n    - [issue 29](https://github.com/HexHive/retrowrite/issues/29) (13 Oct 2021)\n\n- Ddisasm\n    - [issue 54](https://github.com/GrammaTech/ddisasm/issues/54) (1 Oct 2022)\n    - [issue 41](https://github.com/GrammaTech/ddisasm/issues/41) (25 Jan 2022)\n\n\n### Authors\n\nThis research project has been conducted by\n[SoftSec Lab](https://softsec.kaist.ac.kr) at KAIST and UT Dallas.\n- Hyungseok Kim (KAIST)\n- [Soomin Kim (KAIST)](https://softsec.kaist.ac.kr/~soomink/)\n- Junoh Lee (KAIST)\n- [Kangkook Jee (UT Dallas)](https://kangkookjee.io)\n- [Sang Kil Cha (KAIST)](https://softsec.kaist.ac.kr/~sangkilc/)\n\n### Citation\n\n(TBD)\n\n# License\n\nSee the [LICENSE](LICENSE.md) file for license rights and limitations (MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoftsec-kaist%2Freassessor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoftsec-kaist%2Freassessor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoftsec-kaist%2Freassessor/lists"}