{"id":15025816,"url":"https://github.com/jonathansalwan/vmprotect-devirtualization","last_synced_at":"2025-04-13T06:19:24.619Z","repository":{"id":38037526,"uuid":"459076558","full_name":"JonathanSalwan/VMProtect-devirtualization","owner":"JonathanSalwan","description":"Playing with the VMProtect software protection. Automatic deobfuscation of pure functions using symbolic execution and LLVM.","archived":false,"fork":false,"pushed_at":"2022-06-11T05:13:00.000Z","size":29466,"stargazers_count":1224,"open_issues_count":0,"forks_count":192,"subscribers_count":31,"default_branch":"main","last_synced_at":"2025-04-13T06:19:24.073Z","etag":null,"topics":["deobfuscation","llvm-ir","program-analysis","symbolic-execution","vmprotect"],"latest_commit_sha":null,"homepage":"","language":"Roff","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JonathanSalwan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-14T08:31:04.000Z","updated_at":"2025-04-12T23:14:31.000Z","dependencies_parsed_at":"2022-08-08T22:46:17.383Z","dependency_job_id":null,"html_url":"https://github.com/JonathanSalwan/VMProtect-devirtualization","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonathanSalwan%2FVMProtect-devirtualization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonathanSalwan%2FVMProtect-devirtualization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JonathanSalwan%2FVMProtect-devirtualization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/JonathanSalwan%2FVMProtect-devirtualization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JonathanSalwan","download_url":"https://codeload.github.com/JonathanSalwan/VMProtect-devirtualization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248670809,"owners_count":21142963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deobfuscation","llvm-ir","program-analysis","symbolic-execution","vmprotect"],"created_at":"2024-09-24T20:03:04.941Z","updated_at":"2025-04-13T06:19:24.598Z","avatar_url":"https://github.com/JonathanSalwan.png","language":"Roff","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eVMProtect Devirtualization\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  An \u003cb\u003eexperimental\u003c/b\u003e dynamic approach to devirtualize pure functions protected by \u003cb\u003eVMProtect 3.x\u003c/b\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\n* [TL;DR](#tldr)\n* [Introduction](#introduction)\n* [The approach](#the-approach)\n    * [Example 1: A simple bitwise operation protected](#example-1-a-simple-bitwise-operation-protected)\n    * [Example 2: A MBA operation protected](#example-2-a-mba-operation-protected)\n    * [Example 3: More than one basic block](#example-3-more-than-one-basic-block)\n* [Conclusion and limitations](#conclusion-and-limitations)\n* 
[References](#references)\n\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\n# TL;DR\n\nI am sharing some notes about a dynamic approach to devirtualize pure functions protected by VMProtect.\nThis approach has shown very good results if the virtualized function only contains one basic block\n(regardless of its size). This is a common scenario when binaries protect arithmetic operations. However,\nthis approach is a bit more experimental when the target function contains more than one basic block.\nNevertheless, we managed to devirtualize and reconstruct the binary code from samples that contain 2 basic\nblocks, which suggests that it is possible to fully devirtualize small functions dynamically.\n\n\n# Introduction\n\n[VMProtect](https://vmpsoft.com) is a software protection that protects code by running it through a virtual\nmachine with a non-standard architecture. This protection is a great playground for asm lovers\n[0, 1, 2, 3, 4, 5, 6, 11]. Also, there are already numerous tools that attack this protection [7, 8, 9, 12, 13].\nIn 2016 we took a look at the [Tigress](https://github.com/JonathanSalwan/Tigress_protection/) software\nprotection solution and managed to defeat its virtualization using symbolic execution and LLVM. This approach\nwas presented at DIMVA 2018 [10] and I wanted to test it on VMProtect. Note that there is no\nmagic solution that works on every binary; there are always tradeoffs depending on the target and your goals.\nThis modest contribution aims to provide an example of a dynamic attack against *pure functions* that are virtualized\nby VMProtect. The main advantage of a dynamic attack is that it defeats by design some of VMProtect's static protections\nlike self-modifying code, key and operand encryption, etc.\n\nWe define a pure function as a function that has a finite number of paths and no side effects.\nThere can be several inputs but only one output. 
Below is an example of a pure function:\n\n```cpp\nint secret(int x, int y) {\n  int r = x ^ y;\n  return r;\n}\n```\n\n# The approach\n\nWe rely on the key intuition that an obfuscated trace T' (from the obfuscated code P') combines original\ninstructions from the original code P (the trace T corresponding to T' in the original code) and\ninstructions of the virtual machine VM such that T' = T + VM(T). If we are able to distinguish between\nthese two subsequences of instructions T and VM(T), we can then reconstruct one path of the\noriginal program P from a trace T'. By repeating this operation to cover all paths of the virtualized\nprogram, we will be able to reconstruct the original program P. In our practical example, the original\ncode has a finite number of executable paths, which is the case in many situations involving intellectual\nproperty protection. To do so, we proceed with the following steps:\n\n1. Identify the virtualized function and its arguments\n2. Generate a VMProtect trace of the target\n3. Replay the VMP trace and construct symbolic expressions to obtain the relation between inputs and output\n4. Apply optimizations on symbolic expressions to remove as many VM instructions as possible\n5. Lift our symbolic representation to the LLVM-IR to build a new unprotected version of the target\n\n## Example 1: A simple bitwise operation protected\n\nLet's take the following function as a first example: it takes two inputs and returns `x ^ y`, and it is protected by VMProtect.\n\n```cpp\nint secret(int x, int y) {\n  VMProtectBegin(\"secret\");\n  int r = x ^ y;\n  VMProtectEnd();\n  return r;\n}\n```\n\nWe start by identifying where functions are using VMProtect and how many arguments they have. 
For our example we may have something like below:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screen1.png\"\u003e\n\u003c/p\u003e\n\nJust by reading the code we know that the function starts at the address `0x4011c0`, has two 32-bit arguments (`edi` and `esi`)\nand returns at `0x4011ef`. That's all the reverse-engineering we need. The next parts will be automatic. Now we\nhave to generate an execution trace of this virtualized function. To do so we use a [Pintool](pin/source/tools/VMP_Trace/VMP_Trace.cpp).\nIt only needs a `start` and an `end` address (for our example, `0x4011c0` and `0x4011ef`, passed in decimal as `4198848` and `4198895`) which represent the range\nof the instrumentation. Note that any kind of DBI or emulator could do this job.\n\n```\n$ ./pin/pin -t ./pin/source/tools/VMP_Trace/obj-intel64/VMP_Trace.so -start 4198848 -end 4198895 -- ./vmp_binaries/binaries/sample2.vmp.bin 1 2 \u0026\u003e ./vmp_traces/sample2.vmp.trace\n```\n\nYou can see the result [here](vmp_traces/sample2.vmp.trace). The trace format uses three kinds of records: `mr`, `r` and `i`.\n`mr` is a memory read access done by the instruction `i`, and `r` holds the CPU registers. 
For example:\n\n```\nmr:0x7ffda459d718:8:0x227db4f8\nr:0x40200a:0x0:0x7ffda459f571:0x2:0x40200a:0x0:0x0:0x7ffda459d688:0x0:0x0:0x7feee9b80ac0:0x7feee9b8000f:0xad1c3e:0x0:0x0:0x0\ni:0x89173e:8:488BB42490000000\n```\n\nWe have a memory read that loads an 8-byte value `0x227db4f8` from the address `0x7ffda459d718`.\nThe instruction is executed at the address `0x89173e` and its 8-byte opcode is `488BB42490000000`, which is a\n[`mov rsi, qword ptr [rsp + 0x90]`](http://shell-storm.org/online/Online-Assembler-and-Disassembler/?opcodes=488BB42490000000\u0026arch=x86-64\u0026endianness=little\u0026dis_with_addr=True\u0026dis_with_raw=True\u0026dis_with_ins=True#disassembly).\nThe register state before the execution is the following:\n\n```python\n(1) RAX = 0x40200a          (9)  R8  = 0\n(2) RBX = 0                 (10) R9  = 0\n(3) RCX = 0x7ffda459f571    (11) R10 = 0x7feee9b80ac0\n(4) RDX = 0x2               (12) R11 = 0x7feee9b8000f\n(5) RDI = 0x40200a          (13) R12 = 0xad1c3e\n(6) RSI = 0                 (14) R13 = 0\n(7) RBP = 0                 (15) R14 = 0\n(8) RSP = 0x7ffda459d688    (16) R15 = 0\n```\n\nOnce the VMP trace has been generated, we replay it using the [attack_vmp.py](attack_vmp.py) script. This script uses\n[Triton](https://github.com/jonathansalwan/Triton) to build the path predicate of the trace. Note that all expressions which\ninvolve symbolic variables (inputs of the function) are kept symbolic while all expressions unrelated to the inputs are concretized. In\nother words, our symbolic expressions do not contain any operation related to the virtual machine (the machinery\nitself does not depend on the user) but only operations related to the original program.\n\nBelow is an example of concretization. On the left we have an AST that contains subexpressions which do not involve\nany symbolic variable (`1 + 2` and `6 ^ 3`). 
So these branches are concretized and replaced by the constants `3` and `5`, which leads to\nthe AST on the right. **This is how we devirtualize code.**\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screen2.png\"\u003e\n\u003c/p\u003e\n\n**A note on formula-level backward slicing**: As is common in symbolic execution, the symbolic representation is first\ncomputed in a forward manner along the path, then all logical operations and definitions affecting neither the final result\nnor the followed path are removed from the symbolic expression (formula slicing, a.k.a. formula pruning). This turns out to\nperform on the formula the equivalent of a backward slicing code analysis from the program output. Thus, at the return of the\n`secret` function, we have an expression of the relation between the inputs and the output without the instructions of VMProtect.\n\nThe `./attack_vmp.py` script takes as parameters the trace file and the size of symbolic variables. Remember, the arguments were `edi` and\n`esi`, so they are 4 bytes long. 
The result of the script is the following:\n\n```\n$ ./attack_vmp.py --trace1 ./vmp_traces/sample2.vmp.trace --symsize 4\n[+] Replaying the VMP trace\n[+] Symbolize inputs\n[+] Instruction executed: 12462\n[+] Emulation done\n[+] Return value: 0x3\n[+] Devirt expr: (bvor (bvnot (bvor (bvnot (bvnot x)) (bvnot y))) (bvnot (bvor (bvnot x) (bvnot (bvand (bvnot y) (bvnot y))))))\n[+] Synth expr: (bvxor x y)\n\n[+] LLVM IR ==============================\n\n; ModuleID = 'tritonModule'\nsource_filename = \"tritonModule\"\n\ndefine i32 @__triton(i32 %SymVar_0, i32 %SymVar_1) {\nentry:\n  %0 = xor i32 %SymVar_0, %SymVar_1\n  ret i32 %0\n}\n\n[+] EOF LLVM IR ==============================\n```\n\nAs we can see, the devirtualized expression returned by the `secret` function is pretty concise and does not contain instructions from\nthe virtual machine.\n\n```smt\n(bvor\n    (bvnot (bvor\n             (bvnot (bvnot x))\n             (bvnot y)\n           )\n    )\n    (bvnot (bvor\n             (bvnot x)\n             (bvnot (bvand\n                      (bvnot y)\n                      (bvnot y)\n                    )\n             )\n           )\n    )\n)\n```\n\nHowever, we did not manage to recover the original expression which was a simple `XOR` operation. It looks like the `XOR` has been\ntranslated to bitwise operations. Luckily, we recently released new features in the Triton project which are\na [synthesizer](https://github.com/JonathanSalwan/Triton/issues/1074) and a lifter to\n[LLVM-IR](https://github.com/JonathanSalwan/Triton/issues/1078). Thus, we can synthesize the expression which gives us the expression\n`(bvxor x y)`. It's a good win and now we can go further by lifting this expression to LLVM-IR and then compile a new devirtualized\nbinary code.\n\n\n## Example 2: A MBA operation protected\n\nOk, now let's take a look at another example which tries to hide an MBA operation. 
The original source code is the following:\n\n```cpp\n// This function is an MBA that computes: (x ^ 92) + y\n// We will protect this MBA with VMProtect and see if we can recover \"(x ^ 92) + y\"\nchar secret(char x, char y) {\n  VMProtectBegin(\"secret\");\n  int a = 229 * x + 247;\n  int b = 237 * a + 214 + ((38 * a + 85) \u0026 254);\n  int c = (b + ((-(2 * b) + 255) \u0026 254)) * 3 + 77;\n  int d = ((86 * c + 36) \u0026 70) * 75 + 231 * c + 118;\n  int e = ((58 * d + 175) \u0026 244) + 99 * d + 46;\n  int f = (e \u0026 148);\n  int g = (f - (e \u0026 255) + f) * 103 + 13;\n  int r = (237 * (45 * g + (174 * g | 34) * 229 + 194 - 247) \u0026 255) + y;\n  VMProtectEnd();\n  return r;\n}\n```\n\nLike with the first example, we have to identify where this function starts and ends and generate a VMP trace.\n\n```\n$ ./pin/pin -t ./pin/source/tools/VMP_Trace/obj-intel64/VMP_Trace.so -start 4198857 -end 4199140 -- ./vmp_binaries/binaries/sample3.vmp.bin 1 2 \u0026\u003e ./vmp_traces/sample3.vmp.trace\n```\n\nOnce the [VMP trace](vmp_traces/sample3.vmp.trace) is generated, let's run the `./attack_vmp.py` script.\n\n```\n$ ./attack_vmp.py --trace1 ./vmp_traces/sample3.vmp.trace --symsize 1\n[+] Replaying the VMP trace\n[+] Symbolize inputs\n[+] A potential symbolic jump found on CF flag: 0x821dac: popfq - Model: {0: x:32 = 0xa3, 1: y:32 = 0xff}\n[+] A potential symbolic jump found on CF flag: 0x87f437: popfq - Model: {0: x:32 = 0xa3, 1: y:32 = 0xff}\n[+] Instruction executed: 25085\n[+] Emulation done\n[+] Return value: 0x5f\n[+] Devirt expr: In: (bvadd (bvadd (bvshl (bvadd (_ bv1 32) (bvnot (bvlshr (concat (_ bv0 8) (_ bv0 8) ((_ extract 15 8)  ...\n[+] Synth expr: In: (bvadd (bvadd (bvshl (bvadd (_ bv1 32) (bvnot (bvlshr (concat (_ bv0 8) (_ bv0 8) ((_ extract 15 8)  ...\n\n[+] LLVM IR ==============================\n\n; ModuleID = 'tritonModule'\nsource_filename = \"tritonModule\"\n\ndefine i32 @__triton(i8 %SymVar_0, i8 %SymVar_1) {\nentry:\n  %0 = xor i8 %SymVar_0, 
92\n  %1 = and i8 %SymVar_0, 0\n  %2 = zext i8 %1 to i32\n  %3 = or i32 0, %2\n  %4 = shl i32 %3, 8\n  %5 = zext i8 %0 to i32\n  %6 = or i32 %4, %5\n  %7 = and i8 %SymVar_1, 0\n  %8 = zext i8 %7 to i32\n  %9 = or i32 0, %8\n  %10 = shl i32 %9, 8\n  %11 = zext i8 %SymVar_1 to i32\n  %12 = or i32 %10, %11\n  %13 = zext i8 %7 to i32\n  %14 = or i32 0, %13\n  %15 = shl i32 %14, 8\n  %16 = zext i8 %SymVar_1 to i32\n  %17 = or i32 %15, %16\n  %18 = lshr i32 %17, 7\n  %19 = xor i32 %18, -1\n  %20 = add i32 1, %19\n  %21 = shl i32 %20, 8\n  %22 = add i32 %21, %12\n  %23 = add i32 %22, %6\n  ret i32 %23\n}\n\n[+] EOF LLVM IR ==============================\n```\n\nThe result is pretty interesting for several reasons. First, we managed to strip out most of the\ninstructions from the virtual machine: we went from 25085 instructions executed to 25 LLVM instructions. However,\nwe did not manage to get a good synthesized version of the output (yes, I know, we are going further than just doing\ndevirtualization). The advantage of lifting our symbolic expressions to LLVM-IR is that we can fully benefit from LLVM's\noptimization pipeline. Let's do this:\n\n```llvm\n$ opt -S -O3 ./devirt/sample3.ll\n; ModuleID = 'devirt/sample3.ll'\nsource_filename = \"tritonModule\"\n\n; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn\ndefine i32 @__triton(i8 %SymVar_0, i8 %SymVar_1) local_unnamed_addr #0 {\nentry:\n  %0 = xor i8 %SymVar_0, 92\n  %1 = zext i8 %0 to i32\n  %2 = zext i8 %SymVar_1 to i32\n  %3 = shl nuw nsw i32 %2, 1\n  %4 = and i32 %3, 256\n  %5 = add nuw nsw i32 %1, %2\n  %6 = sub nsw i32 %5, %4\n  ret i32 %6\n}\n```\n\nUsing LLVM optimizations we managed to remove noise from our devirtualized output and thus break the MBA.\nWe can see the `XOR` operation with its constant (`%0 = xor i8 %SymVar_0, 92`) and the `+ y` (`%5 = add nuw nsw i32 %1, %2`).\nThe remaining instructions (`%3`, `%4` and `%6`) are just dealing with the sign of `y`. 
To summarize this example, we fully devirtualized the `secret`\nfunction using the `attack_vmp.py` script and then we fully broke the MBA using LLVM optimizations.\n\n## Example 3: More than one basic block\n\nWe got very good results if the `secret` function only contains one basic block regardless of its size. So at this point\nwe are able to devirtualize one path. To reconstruct the whole function behavior, we have to successively devirtualize\nreachable paths. To do so, we have to perform a path coverage on user-dependent branches. At the end, we get as a result\na path tree which represents the different paths of the original function. The path tree is obtained by introducing\nan if-then-else construct from two traces T1 and T2 that share the same prefix, followed by a condition C in T1 and a not(C)\nin T2. Once a path tree is built, we can let LLVM generate a CFG.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screen3.png\"\u003e\n\u003c/p\u003e\n\nWith the Tigress software protection, virtual jumps were implemented with real `jcc` instructions which allowed\nus to quickly identify jump conditions. However, things get more complex with VMProtect's virtual jumps,\nas it does not use `jcc` instructions for jumping to another virtual block. We had to define\nmarkers on a dynamic trace to spot the condition involved in a user-dependent branch. 
This is the experimental part\nof this attack, as the markers are not really accurate but worked for our samples.\n\nOk, let's consider the following sample:\n\n```cpp\nint secret(int x, int y) {\n  VMProtectBegin(\"secret\");\n  int r = 0;\n  if (x + y == 1001)\n    r = x + 1;\n  else\n    r = y - 1;\n  VMProtectEnd();\n  return r;\n}\n```\n\nLike with the first examples we have to generate and analyze the trace.\n\n```\n$ ./pin/pin -t ./pin/source/tools/VMP_Trace/obj-intel64/VMP_Trace.so -start 4198848 -end 4198928 -- ./vmp_binaries/binaries/sample5.vmp.bin 1 2 \u0026\u003e ./vmp_traces/sample5.vmp.trace.1\n\n$ ./attack_vmp.py --trace1 ./vmp_traces/sample5.vmp.trace.1 --symsize 4\n[+] Replaying the VMP trace\n[+] Symbolize inputs\n[+] A potential symbolic jump found of AF flag: 0x80d905: cmp r11b, dl - Model: {0: x:32 = 0x0, 1: y:32 = 0x3e9}\n[+] Instruction executed: 16164\n[+] Emulation done\n[+] Return value: 0x4\n[+] Devirt expr: (bvnot (bvadd (bvand (bvnot y) (bvnot y)) (_ bv1 32)))\n[+] Synth expr: (bvadd y (_ bv4294967295 32))\n\n[+] LLVM IR ==============================\n\n; ModuleID = 'tritonModule'\nsource_filename = \"tritonModule\"\n\ndefine i32 @__triton(i32 %SymVar_1) {\nentry:\n  %0 = add i32 %SymVar_1, -1\n  ret i32 %0\n}\n\n[+] EOF LLVM IR ==============================\n```\n\nThe script tells us that there may be a symbolic jump on the `AF` flag at address `0x80d905`.\nIt also provides a new model (using symbolic execution) which should take the other path. 
So let's generate a\nsecond trace using this model (if you take a look at the model, it is correct with respect to our source code).\n\n```\n$ ./pin/pin -t ./pin/source/tools/VMP_Trace/obj-intel64/VMP_Trace.so -start 4198848 -end 4198928 -- ./vmp_binaries/binaries/sample5.vmp.bin 0 1001 \u0026\u003e ./vmp_traces/sample5.vmp.trace.2\n```\n\nOnce the second trace is generated, we have to provide those two traces to the `attack_vmp.py` script so that it\ncan merge them and create a path tree. We have extra options to define where the condition is located\nand on what flag (AF flag at `0x80d905`).\n\n```\n$ ./attack_vmp.py --trace1 ./vmp_traces/sample5.vmp.trace.1 --symsize 4 --trace2 ./vmp_traces/sample5.vmp.trace.2 --vbraddr 0x80d905 --vbrflag af\n[+] Replaying the VMP trace\n[+] Symbolize inputs\n[+] A potential symbolic jump found of AF flag: 0x80d905: cmp r11b, dl - Model: {0: x:32 = 0x0, 1: y:32 = 0x3e9}\n[+] Instruction executed: 16164\n[+] Emulation done\n[+] A second trace has been provided\n[+] Replaying the VMP trace\n[+] Symbolize inputs\n[+] Instruction executed: 15758\n[+] Emulation done\n[+] Merging expressions from trace1 and trace2\n[+] Return value: 0x3e9\n[+] Devirt expr: In: (ite (= (ite (= (_ bv16 8) (bvand (_ bv16 8) (bvxor (bvsub (_ bv80 8) ((_ extract 7 0) (bvadd (bvlsh ...\n[+] Synth expr: In: (ite (= (ite (= (_ bv16 8) (bvand (_ bv16 8) (bvxor (bvsub (_ bv80 8) ((_ extract 7 0) (bvadd (bvlsh ...\n\n[+] LLVM IR ==============================\n\n; ModuleID = 'tritonModule'\nsource_filename = \"tritonModule\"\n\ndefine i32 @__triton(i32 %SymVar_0, i32 %SymVar_1) {\nentry:\n  %0 = add i32 %SymVar_1, -1\n  %1 = add i32 %SymVar_0, 1\n  %2 = add i32 %SymVar_1, %SymVar_0\n  %3 = xor i32 %2, -1\n  %4 = xor i32 %2, -1\n  %5 = and i32 %4, %3\n  %6 = xor i32 %5, 1001\n  %7 = add i32 %5, 1001\n  %8 = xor i32 %5, 1001\n  %9 = xor i32 %8, %7\n  %10 = and i32 %9, %6\n  [... 
skip ...]\n  %469 = add i64 %468, 140737488347280\n  %470 = trunc i64 %469 to i8\n  %471 = xor i8 80, %470\n  %472 = sub i8 80, %470\n  %473 = xor i8 %472, %471\n  %474 = and i8 16, %473\n  %475 = icmp eq i8 16, %474\n  %476 = select i1 %475, i1 true, i1 false\n  %477 = icmp eq i1 %476, false\n  %478 = select i1 %477, i32 %1, i32 %0\n  ret i32 %478\n}\n\n[+] EOF LLVM IR ==============================\n```\n\nAt this step we devirtualized the two traces and merged them into `if-then-else` expressions.\nAfter lifting the expression to LLVM-IR we get a CFG with only 480 LLVM instructions, which is\nalready a good win compared to the thousands of instructions executed by the virtual machine.\nBut we can do better if we use LLVM optimizations:\n\n```llvm\n$ opt -S -O3 ./devirt/sample5.ll\n; ModuleID = './devirt/sample5.ll'\nsource_filename = \"tritonModule\"\n\n; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone willreturn\ndefine i32 @__triton(i32 %SymVar_0, i32 %SymVar_1) local_unnamed_addr #0 {\nentry:\n  %0 = add i32 %SymVar_0, 1\n  %1 = add i32 %SymVar_1, -1\n  %2 = add i32 %SymVar_1, %SymVar_0\n  %.not = icmp eq i32 %2, 1001\n  %3 = select i1 %.not, i32 %0, i32 %1\n  ret i32 %3\n}\n\nattributes #0 = { mustprogress nofree norecurse nosync nounwind readnone willreturn }\n```\n\nWoot, we recovered the original behavior of the `secret` function!\n\n# Conclusion and limitations\n\nWhile the approach showed very good results for functions that contain one path, the main limitation\nof the method is that it is mostly geared towards programs with a small number of paths due to the way\nVMProtect does virtual jumps. If the number of paths is too high, parts of the original code may be\nlost, yielding an incomplete recovery. Note that we are considering executable paths rather than syntactic\npaths in the CFG. 
Hash and other cryptographic functions often have only very few paths - only one path\nin the case of timing-attack resistant implementations.\n\nAlso our current implementation is limited to programs without any user-dependent\nmemory access. This limitation can be partly removed by using a more symbolic handling of memory accesses in DSE.\n\nNote also that while bounded loops and non-recursive function calls are handled, they are currently recovered\nas inlined or unrolled code, causing a potential blowup in size of the devirtualized code. It would be interesting\nto have a post processing step trying to rebuild these high-level abstractions.\n\nTo conclude, please note that I'm not aiming to provide any kind of magic method, those are just some notes about\na dynamic attack against very specific cases protected by VMProtect =).\n\nIf you want to take a deeper look, check out those resources:\n\n* [The Pintool to generate trace](pin/source/tools/VMP_Trace/VMP_Trace.cpp)\n* [Script to analyze a VMP trace](attack_vmp.py)\n* [Samples source code](vmp_binaries/samples-source)\n* [Original and protected binaries](vmp_binaries/binaries)\n* [VMP traces](vmp_traces)\n* [Devirtualized results](devirt)\n\nLast but not least, special thanks to my mate [@0vercl0k](https://twitter.com/0vercl0k) for proofreading and edits :rocket:\n\n# References\n\n```\n[00] https://www.usenix.org/legacy/event/woot09/tech/full_papers/rolles.pdf\n[01] https://secret.club/2021/09/08/vmprotect-llvm-lifting-1.html\n[02] https://secret.club/2021/09/08/vmprotect-llvm-lifting-2.html\n[03] https://secret.club/2021/09/08/vmprotect-llvm-lifting-3.html\n[04] https://back.engineering/17/05/2021/\n[05] https://back.engineering/21/06/2021/\n[06] https://www.mitchellzakocs.com/blog/vmprotect3\n[07] https://github.com/can1357/NoVmp\n[08] https://github.com/archercreat/vmpfix\n[09] https://github.com/void-stack/VMUnprotect\n[10] 
https://github.com/JonathanSalwan/Triton/blob/master/publications/DIMVA2018-slide-deobfuscation-salwan-bardin-potet.pdf\n[11] https://whereisr0da.github.io/blog/posts/2021-02-16-vmp-3/\n[12] https://github.com/pgarba/UniTaint\n[13] https://github.com/mrexodia/VMProtectTest\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathansalwan%2Fvmprotect-devirtualization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonathansalwan%2Fvmprotect-devirtualization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonathansalwan%2Fvmprotect-devirtualization/lists"}