{"id":19383039,"url":"https://github.com/hectorta1989/8051reverseengineering","last_synced_at":"2026-06-13T04:31:43.847Z","repository":{"id":147992987,"uuid":"419716753","full_name":"HectorTa1989/8051ReverseEngineering","owner":"HectorTa1989","description":"Applications for reverse engineering architecture 8051 firmware","archived":false,"fork":false,"pushed_at":"2021-10-27T01:45:54.000Z","size":30,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-24T17:15:17.576Z","etag":null,"topics":["8051","hacking","hacktoberfest","reverse-engineering","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HectorTa1989.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-21T12:37:13.000Z","updated_at":"2021-10-27T01:45:57.000Z","dependencies_parsed_at":"2023-05-28T04:45:19.398Z","dependency_job_id":null,"html_url":"https://github.com/HectorTa1989/8051ReverseEngineering","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HectorTa1989/8051ReverseEngineering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HectorTa1989%2F8051ReverseEngineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HectorTa1989%2F8051ReverseEngineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HectorTa1989%2F8051ReverseEngineering/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/HectorTa1989%2F8051ReverseEngineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HectorTa1989","download_url":"https://codeload.github.com/HectorTa1989/8051ReverseEngineering/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HectorTa1989%2F8051ReverseEngineering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34272603,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["8051","hacking","hacktoberfest","reverse-engineering","rust"],"created_at":"2024-11-10T09:24:17.468Z","updated_at":"2026-06-13T04:31:43.831Z","avatar_url":"https://github.com/HectorTa1989.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 8051ReverseEngineering\n\nA bunch of applications for the purpose of reverse engineering 8051 firmware.\nCurrently, there are four applications:\n* `stat`, which gives blockwise statistical information about how similar a given file's opcode distribution is to normal 8051 code\n* `base`, which determines the load address of a 8051 firmware file\n* `libfind`, which reads library files and scans the firmware file for routines from those files\n* `kinit`, which reads a 
specific init data structure generated by the C51 compiler\n\nThe output of each subcommand can also be used in other programs via JSON.\n\n## Installation\nDownloadable releases should be on the release page of the GitHub repository.\n\nIn order to compile manually, only cargo is needed, which can be installed with [rustup](https://rustup.rs/).\nWith cargo one can install it with `cargo install 8051ReverseEngineering`.\n\nAlternatively, to install from the repository source, do\n```\ngit clone 'https://github.com/HectorTa1989/8051ReverseEngineering.git'\ncargo install --path 8051ReverseEngineering\n```\n\n## stat\nThis subprogram is useful for determining which regions of a file are probably 8051.\nIf you want to determine the architecture of a file in general, a useful tool might be [cpu_rec](https://github.com/airbus-seclab/cpu_rec).\n\nThis subcommand does some statistics on the firmware.\nIt steps through the file as if it was a continuous instruction stream and does some tests on those instructions.\nThe image is divided into equal-sized blocks and the value of the test for each block (which by default has a size of 512) is given back.\nThat means it is normally more suited for bigger images (in this context, something like \u003e4kB) where you want to know which regions are probably 8051 code and which are data.\n\nBy default, it calculates the aligned jump test, which gives the fraction of relative jump instructions whose jump target is not at the start of an instruction.\nThis has a value of 0 to 1, where 0 is better and it generally works well, but yields a lot of NaN values on streams of 0s and similar repeated instructions, as there are no jumps in those blocks.\nIf the location is entirely 8051 code, it should have a value of 0 (although someone might do some hacks with unaligned jumps), but it can contain small jump tables and therefore is sometimes not exactly zero, but still should be fairly low (\u003c0.1).\nOne can additionally show the number of 
jumps used with the `-n` flag to know how certain the value is.\nFurthermore, two other flags `-A` and `-O` exist, where the first one also includes absolute jumps in the calculation (useful if the file is already aligned and there are not enough jumps) and the second one includes jumps to outside the firmware image as misses (useful with `-A` if one knows there is no code outside the firmware and the firmware file does not cover the whole address space).\n\nIt can also do a blockwise Kullback-Leibler divergence on the distribution of the opcodes, which means each block has a value from 0 to 1, 0 being the most 8051-like.\nA default distribution derived from a corpus I compiled is included (which I can probably not publish due to copyright issues), but you can set your own corpus with the `-c` option.\nWith that metric, \u003c0.06 usually means it is 8051 code, 0.06-0.12 means it is probably either 8051 with some data in it (like a jump table) or it is unusual (maybe a small set of instructions repeated many times).\nNote that random data is only at roughly 0.25, so the Kullback-Leibler divergence might not be very reliable.\n\nAn alternative is a chi squared test on the distribution of opcodes, which can have a value bigger than 1, as it is not constrained in its values.\nBut as a downside, it is harder to say what ranges usually are 8051 code, as that changes for example with blocksize.\nIt is useful for comparing the 8051-ness of different blocks and is normally more reliable than Kullback-Leibler divergence in that case.\nAlso note that I have no experience in statistics so I may be doing things wrong.\n\nOne can also set the standard metric that gets used when no option is given in the [config](#config) under the name `stat_mode` with either `AlignedJump`, `SquareChi` or `KullbackLeibler`.\n\nI normally do not need the second or third option (Kullback-Leibler or chi squared) and they exist mostly because I didn't implement the first test until later.\n\nOne can use the 
output as the input for gnuplot, for example with\n```bash\n8051ReverseEngineering stat path/to/firmware | gnuplot -p -e \"plot '-' with lines\"\n```\n## base\nThis application tries to determine the load address of a firmware image (which in the best case only includes the actual firmware that will be on the device).\nIt loads the first 64k of a given file and for each offset from `0` to `0x10000` determines how many `ljmp`s/`lcall`s jump right behind `ret` instructions, as that is the place where new functions normally start.\nThe offset can also be interpreted cyclically inside the 16-bit space (with `-c`), which means that at offset 0xffe0, the first 0x20 bytes are loaded at 0xffe0-0xffff and the rest is then loaded at the start of the address space.\nThe likeliness of the output is the number of jumps and calls that target instructions right behind `ret`s, as in this example:\n```\nIndex by likeliness:\n\t1: 0x3fe0 with 218\n\t2: 0xc352 with 89\n\t3: 0xd096 with 87\n```\nHere the most likely load location is 0x3fe0, as it has 218 fitting `ljmp`/`lcall` instructions, in contrast to the only 89 and 87 instructions of the second and third case.\nIn the example given, the load location 0x3fe0 is caused by a 0x20 byte header, so the code itself starts at 0x4000.\n\nNormally, `acall`/`ajmp` are ignored, since they introduce a lot of noise from non-code data (1/16th of the 8051's instruction set is `acall`/`ajmp`). They can be enabled with the `-a` flag, but make sure that noisy/non-8051 parts of the files (as detectable with entropy and the `stat` application) are zeroed-out.\n\nOne can also use multiple firmware images where one knows that they are loaded at the same location (useful for smaller images where also different revisions exist), in which case the arithmetic mean of the fitting instructions on each offset is calculated.\n## libfind\nThis application loads some libraries given by the user and tries to find the standard 
library functions inside the firmware.\nRight now, OMF-51 libraries from C51 (which is the compiler of most firmwares in my experience) and sdld libraries from sdcc are supported.\n\nIn general, library files contain some bytes of the library functions and then some \"fixup\" locations which are changed at linking time and are often targets of jumps.\nThey are normally divided into different segments and each segment can have public symbols defined for itself and each fixup location can reference other segments by id or public symbol.\n\nFor each segment, its occurrences are found by comparing the bytes of the non-fixup locations against each possible location in the firmware.\nIt then tries to verify that it is actually the segment by following the fixups (which can be done by reading the values in the firmware that are at the fixup location) and determining if the referenced segments are at the targets referenced by the firmware.\n\nThe public symbols of each matching segment are then output, along with its location and sometimes a description.\nIf some referenced segment is not there, it is output in square brackets to signify that.\nOn the other hand, if a segment is referenced but not actually there, that is output in parentheses (this is mostly useful for finding main, as it cannot be included in the libraries, but is referenced).\nIf there are multiple segments matching, but one matches better (nothing \u003e square brackets \u003e parentheses), only the ones that match best are output.\n\nTo illustrate this, consider these three segments:\n```\nsegment 0: 01 23 45 XX XX 67\n           public symbol: \"sym1\"\n           fixup XX XX: 16-bit absolute code reference to segment 1\nsegment 1: 89 AB CD EF\nsegment 2: 01 23 45 00 08\n           public symbol: \"sym2\"\n```\nAnd then the code\n```\n      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F\n0000: 02 25 54 01 23 45 00 12 67 52 36 14 46 39 45 23\n0010: 00 00 89 AB CD EF 33 01 23 45 00 08 67 25 34 
12\n```\nThe program would search for the segments and would find segment 0 at locations 0x03 and 0x17, segment 1 at location 0x12 and segment 2 at location 0x17.\nIt would then verify the fixups for all segment occurrences:\n * The segment 0 at location 0x03 has `00 12` at the fixup location, which interpreted as an absolute 16-bit address points to 0x0012, where segment 1 is.\n   Thus it is valid.\n * The segment 0 at location 0x17 points to 0x0008, however there is no occurrence of segment 1 there, so it is put in brackets.\n * The segment 1 is valid, but has no public symbol and thus is not output.\n   This is mostly the case with auxiliary segments inside a module and outputting them would not really give any insight.\n * The segment 2 is valid and has sym2 as public symbol.\n   It overshadows the occurrence of segment 0 at the same location, as that occurrence does not have valid references.\n\nThe output would then be\n```\nAddress | Name                 | Description\n0x0003    sym1\n0x0017    sym2\n```\n\nFor C51, the relevant libraries are of the form C51\\*.LIB (not C[XHD]51\\*.LIB) and can currently be found on the internet just by searching for them (one name that might pop up is C51L.LIB), but you can of course also try to download the trial version of C51 to get the libraries from there.\n\nWhen searching for functions in a C51-compiled firmware, one thing that will often pop up is a `[?C_START]` and a `(MAIN)`.\nThis is because the compiler inserts a function called `?C_START` before main which loads variable values from a data structure, which can be read by `8051ReverseEngineering kinit`.\n`?C_START` is in square brackets because it references `MAIN`, which of course is not a library function, which is the same reason `(MAIN)` is in parentheses.\n\nFor sdcc, the relevant libraries are normally found at `/usr/share/sdcc/lib/{small,small-stack-auto,medium,large,huge}/` if you have a Linux sdcc installation.\nNote that noise with sdcc libraries might be higher, as the fixup locations in 
the library files do not specify whether the target is in the code, imem etc. address space.\n\nIt is recommended to align the file to its load address before using this, since absolute locations may fail to verify otherwise.\nSegments shorter than 4 bytes are not output, since they provide much noise and don't really add any info.\n\nA list of libraries to use if no others are given as argument can be specified in the [config](#config) using the field `\"libraries\"` containing a list of library paths.\n\n### Example (on some random wifi firmware)\n\nWith `8051ReverseEngineering libfind some_random_firmware /path/to/lib/dir/`:\n```\nAddress | Name                 | Description\n0x4220    ?C?CLDOPTR             char (8-bit) load from general pointer with offset\n0x424d    ?C?CSTPTR              char (8-bit) store to general pointer\n0x425f    ?C?CSTOPTR             char (8-bit) store to general pointer with offset\n0x4281    ?C?IILDX              \n0x4297    ?C?ILDPTR              int (16-bit) load from general pointer\n0x42c2    ?C?ILDOPTR             int (16-bit) load from general pointer with offset\n0x42fa    ?C?ISTPTR              int (16-bit) store to general pointer\n0x4319    ?C?ISTOPTR             int (16-bit) store to general pointer with offset\n0x4346    ?C?LOR                 long (32-bit) bitwise or\n0x4353    ?C?LLDXDATA            long (32-bit) load from xdata\n0x435f    ?C?OFFXADD            \n0x436b    ?C?PLDXDATA            general pointer load from xdata\n0x4374    ?C?PLDIXDATA           general pointer post-increment load from xdata\n0x438b    ?C?PSTXDATA            general pointer store to xdata\n0x4394    ?C?CCASE              \n0x43ba    ?C?ICASE              \n0x46f5    [?C_START]            \n0x50e1    (MAIN)                \n```\n\nFor some symbol names, which are in a general form, there are descriptions available.\n\n## kinit\nThis application is very specific to C51 generated code in that it decodes a specific data structure used to 
initialize memory values on startup.\nThe structure is read by the `?C_START` procedure and the location of the structure can therefore usually be found by running libfind and looking at the two bytes after the start of `?C_START` (since it starts with a `mov dptr, #structure_address`).\nWhen `(?C_START)` is in parentheses, this is probably not the case, as `?C_START` is referenced by the `ljmp` at location 0 in the keil libraries, which happens to be the instruction at the start of most 8051 firmwares even if there is no `?C_START` function.\n\n### Example\n\nWith `8051ReverseEngineering kinit -o offset some_random_firmware`:\n```\nbit 29.6 = 0\nidata[0x5a] = 0x00\nxdata[0x681] = 0x00\nxdata[0x67c] = 0x00\nxdata[0x692] = 0x00\nxdata[0x6aa] = 0x01\nxdata[0x46f] = 0x00\nbit 27.2 = 0\nbit 27.0 = 0\nbit 26.3 = 0\nbit 26.1 = 0\nxdata[0x47d] = 0x00\nxdata[0x40c] = 0x00\nbit 25.3 = 0\nxdata[0x46d] = 0x00\nidata[0x5c] = 0x00\nxdata[0x403..0x40a] = [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]\nxdata[0x467] = 0x00\n```\n## Config\nA (rudimentary) config file in json format can be created at `$CONFIG_PATH/8051ReverseEngineering/config.json`, where `$CONFIG_PATH` depends on the OS.\nFollowing paths are normally used:\n* `~/.config` for Linux\n* `~/Library/Preferences` for macOS\n* `~/AppData/Roaming` for Windows\n\nExample config:\n```\n{\n\t\"libraries\": [\n    \"/usr/share/sdcc/lib/small\",\n    \"/usr/share/sdcc/lib/medium\",\n    \"/usr/share/sdcc/lib/large\",\n    \"/usr/share/sdcc/lib/huge\",\n    \"/opt/C51/LIB\"\n  ],\n\t\"stat_mode\": \"AlignedJump\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhectorta1989%2F8051reverseengineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhectorta1989%2F8051reverseengineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhectorta1989%2F8051reverseengineering/lists"}