{"id":13440305,"url":"https://github.com/jjyg/metasm","last_synced_at":"2025-12-29T22:51:41.832Z","repository":{"id":41497707,"uuid":"1017361","full_name":"jjyg/metasm","owner":"jjyg","description":"This is the main repository for metasm, a free assembler / disassembler / compiler written in ruby","archived":false,"fork":false,"pushed_at":"2023-12-29T03:43:44.000Z","size":18666,"stargazers_count":463,"open_issues_count":4,"forks_count":83,"subscribers_count":35,"default_branch":"master","last_synced_at":"2024-10-28T02:20:12.279Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://metasm.cr0.org/","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jjyg.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2010-10-23T10:42:26.000Z","updated_at":"2024-09-03T17:07:35.000Z","dependencies_parsed_at":"2024-10-27T23:46:29.460Z","dependency_job_id":"80e5db32-0476-42c6-a35e-05bfddbe65d8","html_url":"https://github.com/jjyg/metasm","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jjyg%2Fmetasm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jjyg%2Fmetasm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jjyg%2Fmetasm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jjyg%2Fmetasm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jjyg","download_url":"https://codeload.github.com/jj
yg/metasm/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244586009,"owners_count":20476856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T03:01:21.556Z","updated_at":"2025-12-29T22:51:41.805Z","avatar_url":"https://github.com/jjyg.png","language":"Ruby","funding_links":[],"categories":["Ruby","others","\u003ca id=\"2df6d3d07e56381e1101097d013746a0\"\u003e\u003c/a\u003eDisassemble\u0026\u0026反汇编"],"sub_categories":["\u003ca id=\"59f472c7575951c57d298aef21e7d73c\"\u003e\u003c/a\u003e工具"],"readme":"Metasm, the Ruby assembly manipulation suite\n============================================\n\n* sample scripts in samples/ -- read comments at the beginning of the files\n* all files are licensed under the terms of the LGPL\n\nAuthor: Yoann Guillot \u003cjohn at ofjj.net\u003e\n\n\nBasic overview:\n\nMetasm allows you to interact with executable formats (ExeFormat):\nPE, ELF, Mach-O, Shellcode, etc.\nThere are three approaches to an ExeFormat:\n - compiling one up, from scratch\n - decompiling an existing format\n - manipulating the file structure\n\n\nReady-to-use scripts can be found in the samples/ subdirectory; check the\ncomments in the script headers. You can also try the --help argument if\nyou're feeling lucky.\n\nFor more information, check the doc/ subdirectory. 
The text files can be\ncompiled to html using the misc/txt2html.rb script.\n\n\n\nHere is a short overview of the Metasm internals.\n\n\nAssembly:\n\nWhen compiling, you start from a source text (ruby String, consisting\nmostly of a sequence of instruction/data/padding directives), which is parsed.\n\nThe string is handed to a Preprocessor instance (which handles #if, #ifdef,\n#include, #define, /* */ etc, should be 100% compatible with gcc -E), which is\nencapsulated in an AsmPreprocessor for assembler sources (to handle asm macro\ndefinitions, 'equ' and asm ';' comments).\nThe interface to do that is ExeFormat#parse(text[, filename, lineno]) or\nExeFormat.assemble (which calls .new, #parse and #assemble).\n\nThe (Asm)Preprocessor returns tokens to the ExeFormat, which parses them as Data,\nPadding, Labels or parser directives. Parser directives always start with a dot.\nThey can be generic (.pad, .offset...) or ExeFormat-specific (.section,\n.import, .entrypoint...). They are handled by #parse_parser_instruction().\nIf the ExeFormat does not recognize a word, it is handed to its CPU instance,\nwhich is responsible for parsing Instructions (or raising an exception).\nAll those tokens are stored in one or more arrays in the @source attribute of\nthe ExeFormat (Shellcode's @source is an Array, for PE/ELF it is a hash\n[section name] =\u003e [Array of parsed data])\nEvery immediate value can be an arbitrary Expression (see later).\n\nYou can then assemble the source to binary sections using ExeFormat#assemble.\n\nOnce the section binaries are available, the whole binary executable can be\nwritten to disk using ExeFormat#encode_file(filename[, format]).\n\nPE and ELF include an autoimport feature that allows automatic creation of\nimport-related data for known OS-specific functions (e.g. 
unresolved calls to\n'strcpy' will generate data so that the binary is linked against the libc\nlibrary at runtime).\n\nThe samples/{exe,pe,elf}encode.rb scripts can take an asm source file as argument\nand compile it to a working executable.\n\nThe CPU classes are responsible for parsing and encoding individual\ninstructions. The current Ia32 parser uses the Intel syntax (e.g. mov eax, 42).\nThe generic parser recognizes labels as a string at the beginning of a line\nfollowed by a colon (e.g. 'some_label:'). GCC-style local labels may be used\n(e.g. '1:', referred to using '1b' (backward) or '1f' (forward) ; may be\nredefined as many times as needed.)\nData are specified using 'db'-style notation (e.g. 'dd 42h', 'db \"blabla\", 0')\nSee samples/asmsyntax.rb\n\n\nEncodedData:\n\nIn Metasm all binary data is stored as an EncodedData.\nEncodedData has 3 main attributes:\n - #data which holds the raw binary data (generally a ruby String, but see\nVirtualString)\n - #export which is a hash associating an export name (label name) to an offset\nwithin #data\n - #reloc which is a hash whose keys are offsets within #data, and whose values\nare Relocation objects.\nA Relocation object has an endianness (:little/:big), a type (:u32 for unsigned\n32bits) and a target (the intended value stored here).\nThe target is an arbitrary arithmetic/logic Expression.\n\nEncodedData also has a #virtsize (e.g. for .bss sections), and a #ptr (internal\noffset used when decoding things)\n\nYou can fixup an EncodedData, with a Hash variable name =\u003e value (value should\nbe an Expression or a numeric value). When you do that, each relocation's target\nis bound using the binding, and if the result is calculable (no external variable\nname used in the Expression), the result is encoded using the relocation's\nsize/sign/endianness information. If it overflows (e.g. trying to store 128 in an 8bit\nsigned relocation), an EncodeError exception is raised. 
Use the :a32 type to\nallow silent overflow truncating.\nIf the relocation's target is not numeric, the target is unchanged if you use \nEncodedData#fixup, or it is replaced with the bound target with #fixup! .\n\n\nDisassembly:\n\nThis code is found in the metasm/decode.rb source file, which defines the\nDisassembler class.\n\nThe disassembler needs a decoded ExeFormat (to be able to say what data is at\nwhich virtual address) and an entrypoint (a virtual address or export name).\nIt can then start to disassemble instructions. When it encounters an\nOpcode marked as :setip, it asks the CPU for the jump destination (an\nExpression that may involve register values, for e.g. jmp eax), and backtraces\ninstructions until it finds the numeric value.\n\nOn decoding, the Disassembler maintains a #decoded hash associating addresses\n(expressions/integer #normalize()d) to DecodedInstructions.\n\nThe disassembly generates an InstructionBlock graph. Each block holds a list of\nDecodedInstruction, and pointers to the next/previous block (by address).\n\nThe disassembler also traces data accesses by instructions, and stores Xrefs\nfor them.\nThe backtrace parameters can be tweaked, and the maximum depth to consider\ncan be specifically changed for :r/:w backtraces (instruction memory xrefs)\nusing #backtrace_maxblocks_data.\nWhen an Expression is backtracked, each walked block is marked so that loops\nare detected, and so that if a new code path is found to an existing block,\nbacktraces can be resumed using this new path.\n\nThe disassembler makes very few assumptions, and in particular does not\nsuppose that functions will return ; they will only if the backtrace of the\n'ret' instructions is conclusive. 
This is quite powerful, but also implies\nthat any error in the backtracking process can lead to a full stop ; and also\nmeans that the disassembler is quite slow.\n\nThe special method #disassemble_fast can be used to work around this when the\ncode is known to be well-formed (ie it assumes that all calls return)\n\nWhen a subfunction is found, a special DecodedFunction is created, which holds\na summary of the function's effects (like a DecodedInstruction on steroids).\nThis allows the backtracker to 'step over' subfunctions, which greatly improves\nspeed. The DecodedFunctions may be callback-based, to allow a very dynamic\nbehaviour.\nExternal function calls create dedicated DecodedFunctions, which hold some\nAPI information (e.g. stack fixup information, basic parameter accesses...)\nThis information may be derived from a C header parsed beforehand.\nIf no C function prototype is available, a special 'default' entry is used,\nwhich assumes that the function has a standard ABI.\n\nIa32 implements a specific :default entry, which handles automatic stack fixup\nresolution, by assuming that the last 'call' instruction returns. 
This may lead\nto unexpected results ; for maximum accuracy a C header holding information for\nall external functions is recommended (see samples/factorize-headers-peimports\nfor a script to generate such a header from a full Visual Studio installation\nand the target binary).\n\nIa32 also implements a specific GetProcAddress/dlsym callback, that will\nyield the correct return value if the parameters can be backtraced.\n\nThe scripts implementing a full disassembler are samples/disassemble{-gui}.rb\nSee the comments for the GUI key bindings.\n\n\nExeFormat manipulation:\n\nYou can encode/decode an ExeFormat (ie decode sections, imports, headers etc)\n\nConstructor: ExeFormat.decode_file(str), ExeFormat.decode_file_header(str)\nMethods: ExeFormat#encode_file(filename), ExeFormat#encode_string\n\nPE and ELF files have a LoadedPE/LoadedELF counterpart, that are able to work\nwith memory-mapped versions of those formats (e.g. to debug running\nprocesses)\n\n\nVirtualString:\n\nA VirtualString is a String-like object: you can read and may rewrite slices of\nit. It can be used as EncodedData#data, and thus allows virtualization\nof most Metasm algorithms.\nYou cannot change a VirtualString length.\nTaking a slice of a VirtualString will return either a String (for small sizes)\nor another VirtualString (a 'window' into the other). 
You can force getting a\nsmall VirtualString using the #dup(offset, length) method.\nAny unimplemented method called on it is forwarded to a frozen String which is\na full copy of the VirtualString (should be avoided if possible, the underlying\nstring may be very big \u0026 slow to access).\n\nThere are currently 3 VirtualStrings implemented:\n- VirtualFile, which loads a file by page-sized chunks on demand,\n- WindowsRemoteString, which maps another process' virtual memory (uses the\nwindows debug api through WinDbgAPI)\n- LinuxRemoteString, which maps another process' virtual memory (needs ptrace\nrights, memory reading is done using /proc/pid/mem)\n\nThe Win/Lin versions are quite powerful, and allow things like live process\ndisassembly/patching easily (using LoadedPE/LoadedELF as ExeFormat)\n\n\nDebugging:\n\nMetasm includes a few interfaces to handle debugging.\nThe WinOS and LinOS classes offer access to the underlying OS processes (e.g.\nOS.current.find_process('foobar') will retrieve a running process with foobar\nin its filename ; then process.mem can be used to access its memory.)\n\nThe Windows and Linux low-level debugging APIs have a basic ruby interface\n(PTrace and WinAPI) ; which are used by the unified high-end Debugger class.\nRemote debugging is supported through the GDB server wire protocol.\n\nHigh-level debuggers can be created with the following ruby line:\nMetasm::OS.current.create_debugger('foo')\n\nOnly one kind of host debugger class can exist at a time ; to debug multiple\nprocesses, attach to other processes using the existing class. 
This is due\nto the way the OS debugging API works on Windows and Linux.\n\nThe low-level backends are defined in the os/ subdirectory, the front-end is\ndefined in debug.rb.\n\nA linux console debugging interface is available in samples/lindebug.rb ; it\nuses a (simplified) SoftICE-like look and feel.\nIt can talk to a gdb-server socket ; use a [udp:]\u003chost:port\u003e target.\n\nThe disassembler-gui sample allows live process interaction when using as\ntarget 'live:\u003cpid or part of program name\u003e'.\n\n\nC Parser:\n\nMetasm includes a hand-written C Parser.\nIt handles all the constructs I am aware of, except hex floats:\n - static const L\"bla\"\n - variable arguments\n - incomplete types\n - __attribute__(()), __declspec()\n - #pragma once\n - #pragma pack()\n - C99 declarators - type bla = { [ 2 ... 14 ].toto = 28 };\n - Nested functions\n - __int8 etc native types\n - Label addresses (\u0026\u0026label)\nAlso note that all those things are parsed, but most of them will fail to\ncompile on the Ia32/X64 backend (the only one implemented so far.)\n\nParsing C files should be done using an existing ExeFormat, with the\nparse_c_file method. This ensures that format-specific macros/ABI are correctly\ndefined (ex: size of the 'long' type, ABI to pass parameters to functions, etc)\n\nWhen you parse a C String using C::Parser.parse(text), you receive a Parser\nobject. It holds a #toplevel field, which is a C::Block, which holds #structs,\n#symbols and #statements. The top-level functions are found in the #symbol hash\nwhose keys are the symbol names, associated to a C::Variable object holding\nthe functions. 
The function parameter/attributes are accessible through\nfunc.type, and the code is in func.initializer, which is itself a C::Block.\nUnder it you'll find a tree-like structure of C::Statements (If, While, Asm,\nCExpressions...)\n\nA C::Parser may be #precompiled to transform it into a simplified version that\nis easier to compile: typedefs are removed, control sequences are transformed\ninto 'if (XX) goto YY;' etc.\n\nTo compile a C program, use PE/ELF.compile_c, that will create a C::Parser with\nexe-specific macros defined (eg __PE__ or __ELF__).\n\nVendor-specific headers may need to use either #pragma prepare_visualstudio\n(to parse the Microsoft Visual Studio headers) or prepare_gcc (for gcc), the\nlatter may be auto-detected (or may not).\nVendor headers tested are VS2003 (incl. DDK) and gcc4 ; ymmv.\n\nCurrently the CPU#compilation of a C code will generate an asm source (text),\nwhich may then be parsed \u0026 assembled to binary code.\n\nSee ExeFormat#compile_c, and samples/exeencode.rb\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjjyg%2Fmetasm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjjyg%2Fmetasm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjjyg%2Fmetasm/lists"}