{"id":22639992,"url":"https://github.com/nac-l/mergen","last_synced_at":"2026-04-02T17:51:58.044Z","repository":{"id":227901729,"uuid":"718127083","full_name":"NaC-L/Mergen","owner":"NaC-L","description":"Deobfuscation via optimization with usage of LLVM IR and parsing assembly.","archived":false,"fork":false,"pushed_at":"2025-05-08T14:50:34.000Z","size":2614,"stargazers_count":570,"open_issues_count":14,"forks_count":56,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-05-08T23:41:38.649Z","etag":null,"topics":["deobfuscation","devirtualization","llvm","optimization"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NaC-L.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"NaC-L","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"custom":null}},"created_at":"2023-11-13T12:46:00.000Z","updated_at":"2025-05-08T17:45:44.000Z","dependencies_parsed_at":"2024-03-30T02:22:18.085Z","dependency_job_id":"f3061dfb-f866-4632-9f7f-f66e86b8ec11","html_url":"https://github.com/NaC-L/Mergen","commit_stats":null,"previous_names":["nac-l/mergen"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NaC-L%2FMergen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NaC-L%2FMergen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NaC-L%2FMergen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NaC-L%2FMergen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NaC-L","download_url":"https://codeload.github.com/NaC-L/Mergen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254355334,"owners_count":22057354,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deobfuscation","devirtualization","llvm","optimization"],"created_at":"2024-12-09T04:07:43.832Z","updated_at":"2026-04-02T17:51:58.038Z","avatar_url":"https://github.com/NaC-L.png","language":"C++","funding_links":["https://github.com/sponsors/NaC-L"],"categories":[],"sub_categories":[],"readme":"# Project Overview:\nMergen is a tool engineered to convert Assembly code into LLVM Intermediate Representation (IR).\nThis tool is designed for:\n- The deobfuscation or devirtualization of obfuscated binary code\n- The enhancement of the reverse engineering process, making it more efficient and effective, especially for complex software systems.\n\n## Guide to build \u0026 run\n\nTo build and run the project, take a look at [**docs/BUILDING.md**](https://github.com/NaC-L/Mergen/blob/main/docs/BUILDING.md).\n\n## Rewrite baseline gate\n\nRewrite work should keep the baseline regression gate green. 
The gate builds focused PE samples, runs `lifter`, and verifies lifted IR outputs.\n\n- Workflow doc: [**docs/REWRITE_BASELINE.md**](docs/REWRITE_BASELINE.md)\n- One-command gate: `scripts\\rewrite\\run.cmd`\n\n## Core Objectives:\n\n- ### Deobfuscation\n\n- ### Devirtualization\n\n- ### Optimization\n\n## How does it work?\n\nWe symbolically execute (or symbolically lift) the target. The idea is not to lift individual instructions, but to lift a whole function. We don't expect an instruction or a basic block to behave the same way each time; instead, we treat each occurrence as potentially serving a different purpose. We try to keep the generated IR as simple and optimizable as possible. We also have different needs than a typical compiler. We use our own analysis to evaluate control flow. We can't depend on LLVM for all of our analysis, because its analyses were designed for different goals and can be suboptimal for our use case. \n\n![image](images/graph.png)\n\n## Examples\n\nThese practical examples illustrate how Mergen handles virtualized programs.\n\n1. [VMProtect](#example-1-vmprotect)\n2. [Branches/Jumptables](#example-2-branchesjumptables)\n3. [Themida 3.1.6.0 LION64 (Red)](#example-3-themida-3160-lion64-red)\n\n### Example #1 (VMProtect)\n\nThis is our target program:\n\n```cpp\nstruct test {\n    int a;\n    int b;\n    int c;\n};\n\nint maths(test a, int b, int c) {\n    return a.a + b - c;\n}\n```\n![image](images/org_disass.png)\n\n![image](images/org_decomp.png)\n\nVMProtect settings: everything is turned off, and we virtualize the function on the Ultra setting (tested versions: 3.4.0-3.6.0 and 3.8.1).\n\n![image](images/vmp_settings1.png)\n\n![image](images/vmp_settings2.png)\n\nHere, we run Mergen. The first argument is the name of the file, and the second argument is the address of the function. Look how simple it is to run. 
We can then compile the output and explore it in our favorite decompiler.\n\n![image](images/run_mergen.PNG)\n\n```llvm\n; ModuleID = 'my_lifting_module'\nsource_filename = \"my_lifting_module\"\n\n; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read)\ndefine i64 @main(i64 %rax, i64 %rcx, i64 %rdx, i64 %rbx, i64 %0, i64 %rbp, i64 %rsi, i64 %rdi, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, ptr nocapture readonly %memory) local_unnamed_addr #0 {\nentry:\n  %stackmemory = alloca i128, i128 13758960, align 8\n  %1 = trunc i64 %r8 to i32\n  %2 = trunc i64 %rdx to i32\n  %GEPLoadxd-5369456437- = getelementptr i8, ptr %memory, i64 %rcx\n  %3 = load i32, ptr %GEPLoadxd-5369456437-, align 4\n  %adc-temp-5370242400- = sub i32 %2, %1\n  %realnot-5369532059- = add i32 %adc-temp-5370242400-, %3\n  %stackmemory10243.sroa.55.1375304.insert.ext10255 = zext i32 %realnot-5369532059- to i64\n  ret i64 %stackmemory10243.sroa.55.1375304.insert.ext10255\n}\n\nattributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read) }\n```\n\nAfter compiling:\n\n![image](images/mergen_disass.png)\n\n![image](images/mergen_dec.png)\n\nYou might notice that the registers are a little off. This is because we don't follow the calling convention; if we did, the function signature would look like this:\n```llvm\ndefine i64 @main(i64 %rcx, i64 %rdx, i64 %r8, i64 %r9, ...)\n```\nSo, we simply adjust the function signature to look normal. 
If you have more questions about this part, I suggest researching [calling conventions](https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing) and the [ABI](https://learn.microsoft.com/en-us/cpp/build/x64-software-conventions?view=msvc-170\u0026source=recommendations#register-volatility-and-preservation).\n\n### Example #2 (Branches/Jumptables)\nLet's say we have this code. A VM will take the code below and turn its branch into an indirect jump, which is slightly more inconvenient for the reverser.\n```cpp\nint maths(int a, int b, int c) {\n    if (a \u003e b)\n        return a + b + c;\n    else\n        return a - b - c;\n}\n```\n\nConceptually, the virtualized version looks something like this:\n\n```\nnext_handler = xxx;\nif ( a-b \u003e 0 )\n  next_handler = yyy;\njump next_handler;\n```\n\nWe always try to analyze values and keep track of them, which allows us to understand control flow.\nFor [jumptable-like branches](https://github.com/NaC-L/Mergen/blob/experimental-pattern-matching/testcases/test_branches.asm), the optimized output is simply:\n```llvm\ndefine i64 @main(i64 %rax, i64 %rcx, i64 %rdx, i64 %rbx, i64 %rsp, i64 %rbp, i64 %rsi, i64 %rdi, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, ptr nocapture readnone %TEB, ptr nocapture readnone %memory) local_unnamed_addr #0 {\nfake_ret:\n  %0 = lshr i64 %rcx, 62\n  %common.ret.op = and i64 %0, 2\n  ret i64 %common.ret.op\n}\n```\nHere is the unoptimized output 
(DCE'd for readability):\n```llvm\nsource_filename = \"my_lifting_module\"\n\ndefine i64 @main(i64 %rax, i64 %rcx, i64 %rdx, i64 %rbx, i64 %rsp, i64 %rbp, i64 %rsi, i64 %rdi, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, ptr %TEB, ptr %memory) {\n  %lsb = and i64 %rcx, 255\n  %pf1 = mul i64 %lsb, 72340172838076673\n  %pf2 = and i64 %pf1, -9205322385119247871\n  %pf3 = urem i64 %pf2, 511\n  %pf4 = and i64 %pf3, 1\n  %pf5 = icmp eq i64 0, %pf4\n  %0 = zext i1 %pf5 to i64\n  %createrflag2 = shl i64 %0, 2\n  %creatingrflag = or i64 2, %createrflag2\n  %zeroflag = icmp eq i64 %rcx, 0\n  %1 = zext i1 %zeroflag to i64\n  %createrflag21 = shl i64 %1, 6\n  %creatingrflag2 = or i64 %creatingrflag, %createrflag21\n  %signflag = icmp slt i64 %rcx, 0\n  %2 = zext i1 %signflag to i64\n  %createrflag23 = shl i64 %2, 7\n  %creatingrflag4 = or i64 %creatingrflag2, %createrflag23\n  %GEPSTORE-5368713221- = getelementptr i8, ptr %memory, i64 1376032\n  store i64 %creatingrflag4, ptr %GEPSTORE-5368713221-, align 4\n  %realand-5368713229- = and i64 %creatingrflag4, 128\n  %shr-lshr-5368713233- = lshr i64 %realand-5368713229-, 7\n  %3 = mul i64 %shr-lshr-5368713233-, 4\n  %bvalue_indexvalue = add i64 5368713249, %3\n  %4 = icmp eq i64 %bvalue_indexvalue, 5368713253\n  %lolb- = select i1 %4, i64 5368713264, i64 5368713257\n  %GEPSTORE-5368713248- = getelementptr i8, ptr %memory, i64 1376032\n  store i64 %lolb-, ptr %GEPSTORE-5368713248-, align 4\n  br i1 %4, label %real_ret, label %real_ret41\n\nreal_ret:                                         ; preds = %fake_ret\n  %inc-5368713273- = add i64 %shr-lshr-5368713233-, 1\n  ret i64 %inc-5368713273-\n\nreal_ret41:                                       ; preds = %fake_ret\n  ret i64 %shr-lshr-5368713233-\n}\n```\nNotice this part:\n```llvm\n  %realand-5368713229- = and i64 %creatingrflag4, 128\n  %shr-lshr-5368713233- = lshr i64 %realand-5368713229-, 7\n```\nWe get the flags, then extract bit 7, which is the Sign 
Flag, and use it to calculate an address. Through analysis, we determine that the address can be one of two values, `5368713257` or `5368713264`, and we turn that into a comparison: if the address is `5368713257`, take one branch; otherwise, take the other. When doing this, it is also important to assign the condition its appropriate value on each path, because later we might need to calculate another jump based on the exact same value.\n\nAlthough we can solve these indirect jumps, jumps with more than two possible targets are not supported, because the analysis for them is not implemented yet. This lets us solve VM-style branches, but real-life jump tables remain a problem.\n\n### Example #3 (Themida 3.1.6.0 LION64 (Red))\nOur target program:\n\n![image](images/themida_disas_b.png)\n\nThemida settings (we only care about VMs for now):\n\n![image](images/themida_vm_v.png)\n\n![image](images/themidavm.png)\n\n![image](images/themidavm_settings.png)\n\nAfter virtualization:\n\n![image](images/themida_disas_v.png)\n\nRunning Mergen:\n\n![image](images/running_on_themida.png)\n\nOutput code: [click here](docs/themida_output.ll)\n\nSo, why is our result not as successful as it was for the binary protected by VMProtect?\n\nThemida actively writes to the .themida section. Unlike stack writes, we can't disregard these writes, because the values might be read by other code later.\n\nHowever, we have a temporary workaround: remove all stores into the .themida section. 
Since our program doesn't write to memory, [I just commented out all the stores](docs/themida_output_lazy_fix.ll). Now we are left with this:\n\n```llvm\nsource_filename = \"my_lifting_module\"\n\ndefine i64 @main(i64 %rax, i64 %rcx, i64 %rdx, i64 %rbx, i64 %rsp, i64 %rbp, i64 %rsi, i64 %rdi, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, ptr writeonly %memory) local_unnamed_addr #0 {\n  %trunc = trunc i64 %r8 to i32\n  %trunc1 = trunc i64 %rdx to i32\n  %trunc2 = trunc i64 %rcx to i32\n  %realadd-5369771371- = add i32 %trunc1, %trunc2\n  %realadd-5369582686- = add i32 %realadd-5369771371-, %trunc\n  %trunc457139 = zext i32 %realadd-5369582686- to i64\n  ret i64 %trunc457139\n}\n\nattributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) }\n```\n# Technical challenges\n- Loops\n- Self-modifying code (especially with conditional modification)\n- Being in a universe where \"outlining\" and \"unrolling\" passes don't exist\n\n\n# Getting in touch\nJoin our [Mergen Discord Server](https://discord.gg/e3eftYguqB) to trade ideas or just chat in general.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnac-l%2Fmergen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnac-l%2Fmergen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnac-l%2Fmergen/lists"}