{"id":18189863,"url":"https://github.com/bmedicke/reed","last_synced_at":"2025-10-09T13:33:06.823Z","repository":{"id":141461086,"uuid":"300668178","full_name":"bmedicke/REED","owner":"bmedicke","description":"notes about 🔍 Reverse Engineering and 🔥 Exploit Development","archived":false,"fork":false,"pushed_at":"2022-03-13T13:13:44.000Z","size":16775,"stargazers_count":6,"open_issues_count":13,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-09T13:33:05.004Z","etag":null,"topics":["debugging","exploit-development","reverse-engineering","security"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bmedicke.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-02T16:05:05.000Z","updated_at":"2025-05-12T10:26:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa954183-ed0e-4464-ab80-5363e1a2f7ac","html_url":"https://github.com/bmedicke/REED","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bmedicke/REED","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bmedicke%2FREED","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bmedicke%2FREED/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bmedicke%2FREED/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bmedicke%2FREED/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bmedicke","download_url":"https://codeload.github.com/bmedicke/REED/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bmedicke%2FREED/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001480,"owners_count":26083102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["debugging","exploit-development","reverse-engineering","security"],"created_at":"2024-11-03T04:04:20.316Z","updated_at":"2025-10-09T13:33:06.817Z","avatar_url":"https://github.com/bmedicke.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RE🌾ED\n\n**notes about reverse engineering and exploit development**\n\n![radare2 visual mode](media/r2-visual-mode-hex.jpg)\n\n---\n\nREED | [PTEH](https://github.com/bmedicke/PTEH)\n\n---\n\nThis readme covers the absolute basics while more technical topics (linked throughout) are covered in separate files.\nIt's a good idea to read and understand this page entirely before delving into the specifics.\n\nA couple of conventions to make orientation a bit easier:\n\n* links in square brackets refer to one of the sources at the bottom, e.g. [[pracbin, p. 27ff]](#sources-and-further-reading)\n* links that end with an arrow are in-repo links, e.g. 
[exploit-development ↣](exploit-development)\n\n# toc\n\n\u003c!-- vim-markdown-toc GFM --\u003e\n\n* [computer architectures](#computer-architectures)\n  * [Von Neumann](#von-neumann)\n  * [Harvard](#harvard)\n  * [modern processors](#modern-processors)\n    * [consequences](#consequences)\n* [registers](#registers)\n  * [the instruction pointer](#the-instruction-pointer)\n* [memory](#memory)\n  * [Bit and Byte order](#bit-and-byte-order)\n  * [virtual address space](#virtual-address-space)\n  * [memory layout](#memory-layout)\n  * [the stack](#the-stack)\n  * [the stack at \u003cmain+0\u003e](#the-stack-at-main0)\n  * [stack frames](#stack-frames)\n* [programming paradigms](#programming-paradigms)\n  * [imperative](#imperative)\n    * [procedural](#procedural)\n    * [object oriented](#object-oriented)\n  * [declerative](#declerative)\n  * [stack-based](#stack-based)\n* [stages of compilation](#stages-of-compilation)\n  * [typing](#typing)\n  * [preprocessor](#preprocessor)\n  * [compiler](#compiler)\n  * [assembler](#assembler)\n  * [linker](#linker)\n* [instruction set architectures](#instruction-set-architectures)\n  * [types](#types)\n    * [CISC](#cisc)\n    * [RISC](#risc)\n  * [instruction sets](#instruction-sets)\n    * [x86](#x86)\n      * [AT\u0026T and Intel syntax](#att-and-intel-syntax)\n      * [x86_32](#x86_32)\n      * [x86_64](#x86_64)\n  * [ARM](#arm)\n* [executable file formats](#executable-file-formats)\n  * [ELF](#elf)\n  * [Mach-O](#mach-o)\n  * [PE](#pe)\n* [binary analysis](#binary-analysis)\n  * [static](#static)\n  * [dynamic](#dynamic)\n* [tools](#tools)\n  * [GDB](#gdb)\n  * [LLVM](#llvm)\n  * [GCC](#gcc)\n  * [NASM](#nasm)\n  * [Radare2](#radare2)\n  * [Ghidra](#ghidra)\n  * [binutils](#binutils)\n  * [binwalk](#binwalk)\n  * [BinDiff](#bindiff)\n  * [dd](#dd)\n  * [xxd](#xxd)\n  * [ldd](#ldd)\n  * [pwntools](#pwntools)\n  * [ltrace and strace](#ltrace-and-strace)\n  * [pmap](#pmap)\n  * [Python 3](#python-3)\n  * [Compiler 
Explorer](#compiler-explorer)\n  * [binvis.io](#binvisio)\n* [dotfiles](#dotfiles)\n* [exploit-development](#exploit-development)\n* [reverse-engineering](#reverse-engineering)\n* [sources and further reading](#sources-and-further-reading)\n\n\u003c!-- vim-markdown-toc --\u003e\n\n# computer architectures\n\nThe following two subsections show the minimal, archetypal representation of both architectures.\nCaches (and for that matter cache levels) are ignored.\n\nTODO source this section\n\n\n## Von Neumann\n\nSee [[edvac]](#sources-and-further-reading).\n\nSomewhat controversially named solely after John von Neumann, this architecture is still\nthe basis for most common CPU designs.\n\n\u003e architecture-von-neumann.jpg\n\n\u003cimg src=\"media/architecture-von-neumann.jpg\" width=500px\u003e\u003c/img\u003e\n\n* *note the following:*\n  * **data and instructions are stored in the same memory**\n  * data and instructions are transported over the same bus\n\n## Harvard\n\n\u003e architecture-harvard.jpg\n\n\u003cimg src=\"media/architecture-harvard.jpg\" width=500px\u003e\u003c/img\u003e\n\n* *note the following:*\n  * data memory and instruction memory are separated\n  * data and instructions travel over separate buses\n\n## modern processors\n\nMost modern CPUs use the same memory for data and instructions, like\nin the Von Neumann architecture.\nTo mitigate the Von Neumann bottleneck, multiple levels of caches are\nadded between CPU and memory (L1-Ln cache) and separate caches\nfor instructions and data are used in the lowest level (similar to\nthe Harvard architecture).\n\nThe resulting architecture is called Modified Harvard or,\nto be more specific, split-cache/**almost Von Neumann** architecture.\n\n*As such, modern CPUs are situated somewhere between a pure Von Neumann and a\npure Harvard architecture.\u003cbr\u003e\nThe important takeaway here is that data and instructions are stored\nin the same place!*\n\n### consequences\n\nStoring data\n(which the process can freely write)\nand instructions (which the CPU executes)\nin the same memory, a consolidation that goes back to the Von Neumann architecture, has led to countless memory corruption exploits and, in\nturn, a myriad of countermeasures.\n\n# registers\n\nAs seen in the drawings of the Von Neumann and Harvard architectures, registers are part of the CPU.\nThey store a small amount of data and instructions that the CPU can immediately operate on.\nAs such, they are much faster than any other form of storage (HDDs, SSDs, RAM, caches).\n\nSome registers might be read-only or have a specific hardware function.\n\n## the instruction pointer\n\nOne important special-purpose register is the instruction pointer, also called program counter.\nIt always points at the instruction that the CPU will execute next. As an internal register,\nit is usually not directly readable or writable. (Some architectures, such as 32-bit ARM, do expose it.)\n\n# memory\n\nMemory refers to fast, non-persistent storage. Nowadays, (main) memory is usually RAM.\n\n## Bit and Byte order\n\nSee [[inteldev, vol. 1, ch. 1, p. 5f]](#sources-and-further-reading).\n\nThere are several ways to store multi-Byte data in memory. Today, the two most common are:\n\n* Little Endian (used by x86_32, x86_64)\n* Big Endian (used by AVR32, z/Architecture)\n\nARM, IA-64 and PowerPC are bi-endian and support both.\n\nEndianness only comes into play when storing multi-Byte data.\nBig endian stores the most significant Byte first.\nLittle endian stores the least significant Byte first.\n\nThe big/little refers to the end of the number that is stored first (at the lowest address).\n\n\u003e datastructure-byte-order.jpg\n\n\u003cimg src=\"media/byte-order.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * higher addresses are towards the top\n  * higher addresses are towards the left\n  * each Byte has its own address\n  * the first Byte (Byte 0) is stored first (it gets the lowest address)\n\n---\n\nBits can be stored in two ways:\n\n* most significant bit first\n* least significant bit first\n\nSince the smallest addressable unit is one Byte, Bit order usually does not matter\n(except when working with bitfields or serializing data Bit by Bit, e.g. via SPI/I2C).\nThat said, Bit order usually follows the Byte order of the system.\n\n\u003e bit-order.jpg\n\n\u003cimg src=\"media/bit-order.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * both Bytes represent the same number (`13` in base 10)\n  * **the drawing shows the order the bits would be stored in (starting left)**\n    * the leftmost bit gets the lowest address\n\n## virtual address space\n\nSee [[pracbin, p. 27ff]](#sources-and-further-reading).\n\nModern operating systems create a virtual address space for each process.\nBehind the scenes, the OS maps the virtual address space to real memory (or disk space).\nThis is not visible to the process itself.\n\nAfter setting up the virtual address space (VAS), the OS maps the interpreter (the dynamic linker) into the newly created userspace.\u003cbr\u003e\nThe interpreter's job is to perform the required relocations and start our program\n(by jumping to its entry point).\n\n## memory layout\n\nSee [[compsec, p. 350]](#sources-and-further-reading).\n\nThis section assumes an x86_32 Linux with a 1GiB/3GiB Kernel/userspace split.\n\n\u003e process-memory-layout.jpg\n\n\u003cimg src=\"media/process-memory-layout.jpg\" width=500px\u003e\u003c/img\u003e\n\n* *note the following:*\n  * high memory addresses are at the top (Intel notational convention, [[inteldev, vol. 1, ch. 1, p. 5f]](#sources-and-further-reading))\n  * the stack and heap both grow into free memory\n  * the heap grows up\n  * **the stack grows down!**\n    * the more data is on the stack, the lower the address of the most recently pushed item\n  * **the sizes of `argc`, `argv`, and the environment influence the offset for addresses on the stack!**\n    * potential source of confusion when debugging or developing exploits\n    \u003cbr\u003e(a changed environment shifts the addresses on the stack)\n  * `0xFFFFFFFF` to `0xC0000000` is the kernelspace\n  * `0xBFFFFFFF` to `0x00000000` is the userspace\n  * **CPU instructions (`.text`, shared libs and the interpreter)\u003cbr\u003eand data (all the other sections) reside in the same memory**\n\n\n## the stack\n\nA stack is a dynamic LIFO (last in, first out) data structure.\nThere are two primary ways to interact with it:\n\n* **push**: add one item to the end\n* **pop**: remove the last item from the end (and put it somewhere)\n\n\"The\" stack often refers to a specific instance of that data structure, also known as the hardware stack.\u003cbr\u003e\nDepending on the language used, it provides temporary space\nfor local variables in the current scope and is used for passing parameters between calls,\nstoring return addresses and more.\n\n**Hardware stacks typically grow down, which is counterintuitive given the name.**\n\nYou can think of it as a stack of magnets on the ceiling:\n\n\u003e the-stack-push-pop.jpg\n\n\u003cimg src=\"media/the-stack-push-pop.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * at the start, the stack is empty (1.)\n  * after we have pushed `variable a`, the stack contains a single item (2.)\n  * after we have pushed `variable b`, the stack contains two items (3.)\n    * at this point we have lost direct access to `variable a` (at least to pop it)\n    * to pop `variable a`, we first have to pop `variable b`\n  * the first pop returns `variable b` to us and removes it from the stack (4.)\n  * the second pop returns `variable a`; the stack is now empty again (5.)\n  * **we don't get to choose which item to pop; it's always the last one added!**\n\n---\n\nLet's take a look at what is specifically stored on the stack.\n\n\n## the stack at \u003cmain+0\u003e\n\nThis section assumes an x86_64 Linux.\nThe following shows a partial dump of the stack at the start of `main()`:\n\n\u003e preparing for the stack dump:\n```sh\n# we will use the executable from the later 'stages of compilation' section.\n# at this point it does not particularly matter though:\ngdb a.out\n\n# (gdb)\nbreak *main\nrun \"passed_along_argument\"\n# at this point we take a look at the stack.\n```\n\u003e stack-dump-main.jpg\n\n\u003cimg src=\"media/stack-dump-main.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * as we go further down the stack, memory addresses go down as well (**the stack grows down!**)\n  * the lower the data on the screen, the later it was pushed to the stack\n  * (a.) shows part of the environment, such as which shell we've used or the default editor\n  * (b.) is the argument we started our program with (`argv[1]`)\n  * (c.) is the name of the executable (`argv[0]`)\n    * **arrays grow up: `argv[0]` is further down the stack than `argv[1]`!**\n  * just after (c.) we can see `argc` is 2 (the size of our `argv` array)\n  * (d.) contains the latest return addresses\n    * **`__libc_start_main + 243` is the instruction that will be executed once we return from `main()`**\n\n## stack frames\n\nThe stack frame of the current function is the part of the stack\nthat is used to store the local variables of the current function scope.\n\nThe current stack frame is delimited by two pointers:\n1. The *base pointer* (also called *frame pointer*, `rbp` on x86_64), which points to the start of the stack frame.\n2. The *stack pointer* (`rsp` on x86_64), which points to the end of the current frame.\n\n\nIf a function calls another function, a new stack frame is created and the pointers are updated.\u003cbr\u003e\nAs functions return to their callers, stack frames are removed\nand the stack shrinks again.\n\nIn this section, we will look at how this happens one assembly instruction at a time.\n\n\u003e preparing for stepping through the creation of a stack frame:\n\n```sh\n# we will use the 'stages of compilation' binary again.\n# compared to the previous example, we don't pass any arguments. this will\n# result in different addresses, even if everything else stays the same.\ngdb a.out\n\n# (gdb)\nstart # shortcut for `tbreak main` (a temporary breakpoint) plus `run`\ndisas main # disassemble the main function.\n```\n\n\u003e disassembled main via `(gdb) disas main`:\n```sh\n=\u003e 0x0000555555555160 \u003c+0\u003e:\tendbr64 \n   0x0000555555555164 \u003c+4\u003e:\tpush   rbp\n   0x0000555555555165 \u003c+5\u003e:\tmov    rbp,rsp\n   0x0000555555555168 \u003c+8\u003e:\tsub    rsp,0x20\n   0x000055555555516c \u003c+12\u003e:\tmov    DWORD PTR [rbp-0x4],edi\n   0x000055555555516f \u003c+15\u003e:\tmov    QWORD PTR [rbp-0x10],rsi\n   0x0000555555555173 \u003c+19\u003e:\tmov    QWORD PTR [rbp-0x18],rdx\n   0x0000555555555177 \u003c+23\u003e:\tmov    eax,0x0\n   0x000055555555517c \u003c+28\u003e:\tcall   0x555555555149 \u003cgreet\u003e\n   0x0000555555555181 \u003c+33\u003e:\tmov    eax,0x0\n   0x0000555555555186 \u003c+38\u003e:\tleave  \n   0x0000555555555187 \u003c+39\u003e:\tret    \n```\n\n* *note the following:*\n  * `=\u003e` marks the instruction that will be **executed next**\n  * `\u003c+n\u003e` shows the offset of an instruction from the start of the function\n    * in our case `main+28` is the call instruction\n  * `main+4`, `main+5` and `main+8` are what is called the **function prologue**\n\n---\n\nThe following image shows the stack before we have executed any instructions in `main()`:\n\n\u003e stack-frame-main-0.jpg\n\n\u003cimg src=\"media/stack-frame-main-0.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * the base pointer (`rbp`) points to `0x0`\n    * on Linux, the kernel starts a new process with all general-purpose registers (except `rsp` and `rip`) set to zero\n      * you can observe this behaviour with `(gdb) starti`\n    * glibc's startup code (`_start`) explicitly clears `rbp` to mark the outermost frame, and in this run it is still `0x0` when `main()` is entered\n  * the stack pointer (`rsp`) points to the end of the previous frame\n  * the most recently pushed item is a **return address**\n    * it points to an instruction that is part of the C standard library\n    * when `main()` returns, the program will continue from here\n\n---\n\nNow we repeatedly run `stepi` and watch how the stack\ngrows during the course of the program.\n\n\u003e stack-frame-main-function-prologue-a.jpg\n\n\u003cimg src=\"media/stack-frame-main-function-prologue-a.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * (1.) after executing `push rbp`, the stack grows by one word\n    * here a word means the register width: 8 Byte = 64 bit (we're running on x86_**64** after all)\n    * note that Intel's manuals call 64 bit a *quadword* and use *word* for 16 bit\n    * we do this so we can restore the caller's base pointer later\n  * (1.) at the same time, the stack pointer is updated to point to the new end of our frame\n  * (2.) after executing `mov rbp, rsp`, both pointers point at the same address (the saved base pointer)\n\n---\n\nIf we take one more `stepi`, we'll be the farthest away from the top we've ever been!\n\n\u003e stack-frame-main-function-prologue-b.jpg\n\n\u003cimg src=\"media/stack-frame-main-function-prologue-b.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * (3.) after executing `sub rsp,0x20`, the stack pointer is updated and the stack grows by 32 Byte\n    * we are subtracting from the stack pointer because the stack grows down: `0xC0 - 0x20 == 0xA0`\n    * the memory is not initialized in any way; whatever was there before is still there\n\n---\n\nInstructions `main+12`, `main+15` and `main+19` write data to our new space on the stack frame.\nInstruction `main+23` only touches a register.\n\nLet's skip ahead to `main+28`, the function call.\n\n\u003e stack-frame-main-greet.jpg\n\n\u003cimg src=\"media/stack-frame-main-greet.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * (4.) as part of executing `call 0x555555555149`, the address of `main+33` is pushed to the stack\n    * **this is the return address: the instruction that is next in line once we return from `greet()`**\n  * a `call` is basically an atomic `push \u003caddress_after_call\u003e` plus `jmp \u003ctarget\u003e`\n\nIf we continued the execution of `greet()`, the process would be the same as for `main()`:\n  * back up the base pointer on the stack\n  * move the base and stack pointers to grow the frame\n  * push a return address as we call another function\n\n# programming paradigms\n\n## imperative\n\nhow to change the state\n\n* Forth\n* assembly\n\n### procedural\n\ngrouping instructions into procedures\n\n* C\n\n### object oriented\n\nstate and instructions are grouped together\n\n* C++\n\n## declerative\n\ndeclare desired results\n\n## stack-based\n\n* Reverse Polish notation\n\n# stages of compilation\n\n## typing\n\n* Forth (typeless)\n* C (static)\n* Python (dynamic, duck)\n\nSee [[pracbin, p. 12ff]](#sources-and-further-reading).\n\nThis section assumes compilation for the x86_64 architecture on Linux.\n\nAll intermediary files can be found in the `stages-of-compilation` directory.\n\n---\n\nLet's go through the stages of compilation step by step.\nWe'll use the following example:\n\n\u003e stages.c\n\n```c\n#include \u003cstdio.h\u003e\n#include \u003cstdlib.h\u003e\n\n#define FORMAT_STRING \"%s\"\n#define MESSAGE \"hello, world!\\n\"\n\nvoid greet() {\n  printf(FORMAT_STRING, MESSAGE);\n}\n\nint main(int argc, char** argv, char** envp)\n{\n  greet();\n  return EXIT_SUCCESS;\n}\n```\n\n* *note the following:*\n  * using a `main()` signature with `envp` does not conform to POSIX but is widely supported by Unix-like systems and mentioned as a common alternative in the C standard ([[c11, Annex J.5.1, p. 
575]](#sources-and-further-reading))\n\n\n\u003e run through all stages and save intermediary files:\n\n```sh\nuname -a\n# Linux ubuntu 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux\n\ngcc -masm=intel --save-temps stages.c\n\ntree\n# .\n# ├── a.out\n# ├── stages.c\n# ├── stages.i\n# ├── stages.o\n# └── stages.s\n```\n\n---\n\n\u003e stages-of-compilation.jpg\n\n\u003cimg src=\"media/stages-of-compilation.jpg\"\u003e\u003c/img\u003e\n\n* *note the following:*\n  * produced filenames assume compilation with `--save-temps`\n  * on macOS gcc is llvm in disguise and will produce an additional `.bc` bitcode file\n\n## preprocessor\n\nIn this first stage the preprocessor does the following:\n\n* all included headers (`#include`) and our source file are (recursively) concatenated in place\n* preprocessor macros (`#define`) are expanded\n\n\u003e stages.i (partial)\n\n```c\n# 1 \"stages.c\"\n# 1 \"\u003cbuilt-in\u003e\"\n# 1 \"\u003ccommand-line\u003e\"\n# 31 \"\u003ccommand-line\u003e\"\n# 1 \"/usr/include/stdc-predef.h\" 1 3 4\n\n/* ... */\n\ntypedef signed char __int8_t;\ntypedef unsigned char __uint8_t;\n\n/* ... */\n\nextern FILE *stdin;\nextern FILE *stdout;\nextern FILE *stderr;\n\n/* ... 
*/\n\n# 7 \"stages.c\"\nvoid greet() {\n  printf(\"%s\", \"hello, world!\\n\");\n}\n\nint main(int argc, char** argv, char** envp)\n{\n  greet();\n  return \n# 14 \"stages.c\" 3 4\n        0\n# 14 \"stages.c\"\n                    ;\n}\n```\n\n* *note the following:*\n  * our 15-line program ballooned to 1837 lines (`wc -l stages.i`)\n    * all necessary (uncompiled) code for the following stages is now contained within\n  * if you grep the file for include or define directives, there won't be any (`egrep '#include|#define' stages.i`)\n  * the `greet()` function no longer contains the macros for our strings but the strings themselves\n\n## compiler\n\nIn the compilation stage, our concatenated source is translated into assembly language.\nDepending on flags, more or less optimization takes place (`-O0` to `-O3`).\n\n\u003e stages.s (Intel syntax)\n\n```asm\n\t.file\t\"stages.c\"\n\t.intel_syntax noprefix\n\t.text\n\t.section\t.rodata\n.LC0:\n\t.string\t\"hello, world!\"\n\t.text\n\t.globl\tgreet\n\t.type\tgreet, @function\ngreet:\n.LFB6:\n\t.cfi_startproc\n\tendbr64\n\tpush\trbp\n\t.cfi_def_cfa_offset 16\n\t.cfi_offset 6, -16\n\tmov\trbp, rsp\n\t.cfi_def_cfa_register 6\n\tlea\trdi, .LC0[rip]\n\tcall\tputs@PLT\n\tnop\n\tpop\trbp\n\t.cfi_def_cfa 7, 8\n\tret\n\t.cfi_endproc\n.LFE6:\n\t.size\tgreet, .-greet\n\t.globl\tmain\n\t.type\tmain, @function\nmain:\n.LFB7:\n\t.cfi_startproc\n\tendbr64\n\tpush\trbp\n\t.cfi_def_cfa_offset 16\n\t.cfi_offset 6, -16\n\tmov\trbp, rsp\n\t.cfi_def_cfa_register 6\n\tsub\trsp, 32\n\tmov\tDWORD PTR -4[rbp], edi\n\tmov\tQWORD PTR -16[rbp], rsi\n\tmov\tQWORD PTR -24[rbp], rdx\n\tmov\teax, 0\n\tcall\tgreet\n\tmov\teax, 0\n\tleave\n\t.cfi_def_cfa 7, 8\n\tret\n\t.cfi_endproc\n.LFE7:\n\t.size\tmain, .-main\n\t.ident\t\"GCC: (Ubuntu 9.3.0-10ubuntu2) 9.3.0\"\n\t.section\t.note.GNU-stack,\"\",@progbits\n\t.section\t.note.gnu.property,\"a\"\n\t.align 8\n\t.long\t 1f - 0f\n\t.long\t 4f - 1f\n\t.long\t 5\n0:\n\t.string\t \"GNU\"\n1:\n\t.align 8\n\t.long\t 0xc0000002\n\t.long\t 3f - 2f\n2:\n\t.long\t 0x3\n3:\n\t.align 8\n4:\n\n```\n\n* *note the following:*\n  * if not told otherwise (with `-masm=intel`), gcc creates assembly with AT\u0026T syntax\n  * constants and variables have symbolic names and not just addresses (`.LC0` for the hello world string)\n  * symbols and data have types (e.g. `.type greet, @function`, `.string`)\n  * functions are easily identified by their labels (`greet:`, `main:`)\n  * calls to functions happen via names and not addresses (`call puts@PLT`)\n  * our call to `printf` was optimized to `puts` (the trailing newline was dropped from the string because `puts` appends one)\n  * there's not much left of the 1.8k lines from the preprocessor step\n\n## assembler\n\nNow we create a binary file for the first time. But while this object file contains machine code, it is not quite executable yet.\n\n\u003e inspecting the object file:\n\n```sh\nfile stages.o\n# stages.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped\n\nxxd stages.o | head -n 1 # hex dump.\n# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............\n```\n\n* *note the following:*\n  * the file utility tells us a couple of interesting things:\n    * `ELF` stands for Executable and Linkable Format, a file format for executables and object files (among others)\n    * `64-bit` since our object file is targeted at x86_64\n    * `LSB` stands for Least Significant Byte first, i.e. little endian (when storing integers)\n    * `relocatable` since object files don't have a fixed address in memory\n    * `not stripped` means that the symbol table is still present (full debug information would additionally require compiling with `-g`)\n  * **each object file is compiled on its own and thus has no idea of the memory addresses of other object files**\n  * ELF files start with a bit of *magic*: the byte `0x7F` followed by the ASCII characters `E`, `L`, `F`\n\n---\n\n\u003e trying to execute an object file\n\n```sh\nchmod +x stages.o\n./stages.o\n# -bash: ./stages.o: cannot execute binary file: Exec format error\n```\n\nThat did not work. Let's see why:\n\n---\n\nC and other high-level languages refer to functions and variables with\nhuman-readable (symbolic) names. This makes it easier for the programmer.\nThe CPU, on the other hand, refers to them directly via memory addresses.\n\nA symbol table is a mapping between the two.\n\n\u003e symbol table via `readelf --syms stages.o`\n\n```sh\nSymbol table '.symtab' contains 14 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND\n     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS stages.c\n     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1\n     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3\n     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4\n     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5\n     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7\n     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    8\n     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    9\n     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    6\n    10: 0000000000000000    23 FUNC    GLOBAL DEFAULT    1 greet\n    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND puts\n    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_\n    13: 0000000000000017    40 FUNC    GLOBAL DEFAULT    1 main\n```\n\n* *note the following:*\n  * for relocatables, most symbol values are just zeroes\n    * this is because before the linking stage it's unclear where they'll land in memory/the file\n    * defined symbols only hold an offset into their section (`main` starts at `0x17`, right after the 23 Byte `greet`)\n    * we will have to resolve them before we can run our binary\n\n## linker\n\nIn this stage, all object files are linked together. The result is a single executable. The process is as follows:\n\n1. merge all object files into a single executable\n2. resolve (static) symbolic references to their now known, fixed locations\n\nStatic libraries (`.a`) are merged into the executable. 
Dynamic/shared libraries (`.so`) are left unresolved.\nThe dynamic linker (interpreter) will resolve these at runtime.\n\n\u003e inspecting the executable:\n\n```sh\nfile a.out\n# a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=41087481fb19eebf518ec7ff727fde7395cdc927, for GNU/Linux 3.2.0, not stripped\n\n# let's finally run it:\n./a.out\n# hello, world!\n```\n\n* *note the following:*\n  * `pie` Position Independent Executable, code does not rely on being located in a specific place in memory\n    * you can ignore this for now, we'll talk about it later\n  * `executable` instead of `relocatable`, which means we can actually run it\n  * `dynamically linked` at least some of the used libraries are shared ones\n  * `interpreter [...]` which dynamic linker will be used to resolve shared libraries\n\n---\n\n\u003e symbol table via `readelf --syms a.out`\n\n```sh\nSymbol table '.dynsym' contains 7 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND \n     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab\n     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)\n     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)\n     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__\n     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable\n     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (2)\n\nSymbol table '.symtab' contains 66 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND \n     1: 0000000000000318     0 SECTION LOCAL  DEFAULT    1 \n     2: 0000000000000338     0 SECTION LOCAL  DEFAULT    2 \n     3: 0000000000000358     0 SECTION LOCAL  DEFAULT    3 
\n     4: 000000000000037c     0 SECTION LOCAL  DEFAULT    4 \n     5: 00000000000003a0     0 SECTION LOCAL  DEFAULT    5 \n     6: 00000000000003c8     0 SECTION LOCAL  DEFAULT    6 \n     7: 0000000000000470     0 SECTION LOCAL  DEFAULT    7 \n     8: 00000000000004f2     0 SECTION LOCAL  DEFAULT    8 \n     9: 0000000000000500     0 SECTION LOCAL  DEFAULT    9 \n    10: 0000000000000520     0 SECTION LOCAL  DEFAULT   10 \n    11: 00000000000005e0     0 SECTION LOCAL  DEFAULT   11 \n    12: 0000000000001000     0 SECTION LOCAL  DEFAULT   12 \n    13: 0000000000001020     0 SECTION LOCAL  DEFAULT   13 \n    14: 0000000000001040     0 SECTION LOCAL  DEFAULT   14 \n    15: 0000000000001050     0 SECTION LOCAL  DEFAULT   15 \n    16: 0000000000001060     0 SECTION LOCAL  DEFAULT   16 \n    17: 0000000000001208     0 SECTION LOCAL  DEFAULT   17 \n    18: 0000000000002000     0 SECTION LOCAL  DEFAULT   18 \n    19: 0000000000002014     0 SECTION LOCAL  DEFAULT   19 \n    20: 0000000000002060     0 SECTION LOCAL  DEFAULT   20 \n    21: 0000000000003db8     0 SECTION LOCAL  DEFAULT   21 \n    22: 0000000000003dc0     0 SECTION LOCAL  DEFAULT   22 \n    23: 0000000000003dc8     0 SECTION LOCAL  DEFAULT   23 \n    24: 0000000000003fb8     0 SECTION LOCAL  DEFAULT   24 \n    25: 0000000000004000     0 SECTION LOCAL  DEFAULT   25 \n    26: 0000000000004010     0 SECTION LOCAL  DEFAULT   26 \n    27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27 \n    28: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c\n    29: 0000000000001090     0 FUNC    LOCAL  DEFAULT   16 deregister_tm_clones\n    30: 00000000000010c0     0 FUNC    LOCAL  DEFAULT   16 register_tm_clones\n    31: 0000000000001100     0 FUNC    LOCAL  DEFAULT   16 __do_global_dtors_aux\n    32: 0000000000004010     1 OBJECT  LOCAL  DEFAULT   26 completed.8059\n    33: 0000000000003dc0     0 OBJECT  LOCAL  DEFAULT   22 __do_global_dtors_aux_fin\n    34: 0000000000001140     0 FUNC    LOCAL  DEFAULT   16 
frame_dummy\n    35: 0000000000003db8     0 OBJECT  LOCAL  DEFAULT   21 __frame_dummy_init_array_\n    36: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS stages.c\n    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c\n    38: 0000000000002184     0 OBJECT  LOCAL  DEFAULT   20 __FRAME_END__\n    39: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS \n    40: 0000000000003dc0     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_end\n    41: 0000000000003dc8     0 OBJECT  LOCAL  DEFAULT   23 _DYNAMIC\n    42: 0000000000003db8     0 NOTYPE  LOCAL  DEFAULT   21 __init_array_start\n    43: 0000000000002014     0 NOTYPE  LOCAL  DEFAULT   19 __GNU_EH_FRAME_HDR\n    44: 0000000000003fb8     0 OBJECT  LOCAL  DEFAULT   24 _GLOBAL_OFFSET_TABLE_\n    45: 0000000000001000     0 FUNC    LOCAL  DEFAULT   12 _init\n    46: 0000000000001200     5 FUNC    GLOBAL DEFAULT   16 __libc_csu_fini\n    47: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab\n    48: 0000000000004000     0 NOTYPE  WEAK   DEFAULT   25 data_start\n    49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@@GLIBC_2.2.5\n    50: 0000000000004010     0 NOTYPE  GLOBAL DEFAULT   25 _edata\n    51: 0000000000001208     0 FUNC    GLOBAL HIDDEN    17 _fini\n    52: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_\n    53: 0000000000001149    23 FUNC    GLOBAL DEFAULT   16 greet\n    54: 0000000000004000     0 NOTYPE  GLOBAL DEFAULT   25 __data_start\n    55: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__\n    56: 0000000000004008     0 OBJECT  GLOBAL HIDDEN    25 __dso_handle\n    57: 0000000000002000     4 OBJECT  GLOBAL DEFAULT   18 _IO_stdin_used\n    58: 0000000000001190   101 FUNC    GLOBAL DEFAULT   16 __libc_csu_init\n    59: 0000000000004018     0 NOTYPE  GLOBAL DEFAULT   26 _end\n    60: 0000000000001060    47 FUNC    GLOBAL DEFAULT   16 _start\n    61: 0000000000004010     0 NOTYPE  GLOBAL DEFAULT   26 __bss_start\n  
  62: 0000000000001160    40 FUNC    GLOBAL DEFAULT   16 main\n    63: 0000000000004010     0 OBJECT  GLOBAL HIDDEN    25 __TMC_END__\n    64: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable\n    65: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@@GLIBC_2.2\n```\n\n* *note the following:*\n  * **we have two symbol tables now**\n    * `.dynsym` is used by the dynamic linker\n    * `.symtab` includes the same symbols as `.dynsym` (and more)\n    * **`.symtab` is optional at this point** (not necessary for process creation)\n  * the functions we've written ourselves (`main`, `greet`) now have locations/offsets\n  * the `puts` function from before still has no location (it is dynamically linked and resolved by the dynamic linker when the program runs)\n\n---\n\nI've mentioned in the notes above that the `.symtab` symbol table is optional. Let's strip it.\n\n\u003e stripped symbol table via `strip -s a.out \u0026\u0026 readelf --syms a.out`:\n\n```sh\nSymbol table '.dynsym' contains 7 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND \n     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab\n     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)\n     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)\n     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__\n     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable\n     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (2)\n```\n\n\u003e running a stripped binary works just as well:\n\n```sh\n./a.out\n# hello, world!\n```\n\n* *note the following:*\n  * while it still runs the same, it is much harder to debug now (the names of our own functions, `main` and `greet`, are gone)\n  * most binaries you encounter in the wild are stripped\n\n# instruction set architectures\n\naka. 
ISA\n\n## types\n\n### CISC\n\n### RISC\n\n## instruction sets\n\n### x86\n\nSee [[inteldev, vol. 1, ch. 2, p. 1ff]](#sources-and-further-reading), [[86hist]](#sources-and-further-reading).\n\n#### AT\u0026T and Intel syntax\n\n#### x86_32\n\naka. x86\n\n#### x86_64\n\naka. x64\n\n## ARM\n\n# executable file formats\n\n## ELF\n\n## Mach-O\n\n## PE\n\n# binary analysis\n\nSee [[pracbin, p. 2f]](#sources-and-further-reading).\n\n## static\n\nStatic analysis is the process of gaining information about a binary without running it.\n\n*Pros*\n\n* no need for a CPU that matches the binary's architecture\n* no need for additional software to run the binary (kernel, etc.)\n\n*Cons*\n\n* more difficult to reason about due to missing runtime state\n\n## dynamic\n\nDynamic analysis, on the other hand, executes the binary and observes its behavior.\n\n*Pros*\n\n* often easier, due to the additional information (runtime state)\n\n*Cons*\n\n* might miss code paths that are not taken during the observed runs\n\n# tools\n\n## GDB\n\n[GDB ↣](gdb)\n\n## LLVM\n\n## GCC\n\n## NASM\n\nNetwide Assembler.\n\n## Radare2\n\n[Radare2 ↣](radare2)\n\n## Ghidra\n\n## binutils\n\nSee [[binutils]](#sources-and-further-reading).\n\n## binwalk\n\n## BinDiff\n\n## dd\n\n## xxd\n\n## ldd\n\nprints shared object dependencies\n\n```sh\nldd rot13 # prints the shared objects rot13 depends on and the addresses they are mapped to\n```\n\n## pwntools\n\n## ltrace and strace\n\n## pmap\n\n## Python 3\n\n## Compiler Explorer\n\n* compiles source code with different compilers and settings and lets you compare the generated assembly side by side\n* [godbolt.org](https://godbolt.org/)\n\n## binvis.io\n\n* visual analysis of binary files\n* [binvis.io](https://binvis.io/)\n\n# dotfiles\n\n[dotfiles ↣](dotfiles)\n\n# exploit-development\n\n[exploit-development ↣](exploit-development)\n\n# reverse-engineering\n\n[reverse engineering ↣](reverse-engineering)\n\n# sources and further reading\n\n* [86hist] Morse, S. P., Ravenel, B. W., Mazor, S., \u0026 Pohlman, W. B. (1980). Intel Microprocessors — 8008 to 8086. Computer, 13(10), 42–60. 
https://doi.org/10.1109/MC.1980.1653375\n* [binutils] GNU Binary Utilities Documentation. (2002). Retrieved from http://www.gnu.org/software/binutils/manual/\n* [c11] ISO/IEC. (2011). ISO/IEC 9899:201x, International Standard Programming languages — C, Committee Draft N1570. ISO/IEC. Retrieved from http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf\n* [compsec] Stallings, W. (2018). Computer Security: Principles and Practice, Global Edition (4th ed.). Pearson.\n* [edvac] Von Neumann, J., \u0026 Godfrey, M. D. (1993). First Draft of a Report on the EDVAC. IEEE Annals of the History of Computing, 15(4), 27–75. https://doi.org/10.1109/85.238389\n* [inteldev] Intel. (2011). Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes. System, 3(253665). https://doi.org/10.1109/MAHC.2010.22\n* [pracbin] Andriesse, D. (2018). Practical Binary Analysis: Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly. San Francisco, CA: No Starch Press.\n* [x64beg] Van Hoey, J. (2019). Beginning x64 Assembly Programming: From Novice to AVX Professional (1st ed.). Apress.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbmedicke%2Freed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbmedicke%2Freed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbmedicke%2Freed/lists"}