{"id":13725034,"url":"https://github.com/binref/refinery","last_synced_at":"2025-05-07T19:32:30.803Z","repository":{"id":39916367,"uuid":"228019736","full_name":"binref/refinery","owner":"binref","description":"High Octane Triage Analysis","archived":false,"fork":false,"pushed_at":"2025-05-03T12:17:16.000Z","size":18604,"stargazers_count":725,"open_issues_count":2,"forks_count":69,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-05-03T13:26:21.193Z","etag":null,"topics":["commandline","compression","cryptography","malware-analysis","triage"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/binref.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-14T12:32:06.000Z","updated_at":"2025-05-03T12:17:20.000Z","dependencies_parsed_at":"2023-10-30T15:36:13.216Z","dependency_job_id":"4f6ce576-8673-4ce5-82af-736d59bb6434","html_url":"https://github.com/binref/refinery","commit_stats":{"total_commits":2305,"total_committers":12,"mean_commits":"192.08333333333334","dds":0.008676789587852451,"last_synced_commit":"ced20d4711f193f6afe5a41e4d701f8073d5d692"},"previous_names":[],"tags_count":193,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binref%2Frefinery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binref%2Frefinery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/binref%2Frefinery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/binref%2Frefinery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/binref","download_url":"https://codeload.github.com/binref/refinery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252943863,"owners_count":21829327,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["commandline","compression","cryptography","malware-analysis","triage"],"created_at":"2024-08-03T01:02:10.429Z","updated_at":"2025-05-07T19:32:30.776Z","avatar_url":"https://github.com/binref.png","language":"Python","readme":"# Binary Refinery\n[![Documentation](.docbadge.svg)][docs]\n[![Test Status](https://github.com/binref/refinery/actions/workflows/test.yml/badge.svg)][tests]\n[![Code Coverage](https://codecov.io/gh/binref/refinery/branch/master/graph/badge.svg)][codecov]\n[![PyPI Version](https://badge.fury.io/py/binary-refinery.svg)][pypi]\n```\n  __     __  High Octane Triage Analysis          __\n  ||    _||______ __       __________     _____   ||\n  ||    \\||___   \\__| ____/   ______/___ / ____\\  ||\n==||=====||  | __/  |/    \\  /==|  / __ \\   __\\===]|\n  '======||  |   \\  |   |  \\_  _| \\  ___/|  |     ||\n         ||____  /__|___|__/  / |  \\____]|  |     ||\n=========''====\\/=========/  /==|__|=====|__|======'\n                         \\  /\n                          \\/\n```\nThe Binary Refinery\u0026trade; is a collection of Python scripts that implement transformations of binary data such as 
compression and encryption.\nWe will often refer to it simply as _refinery_, which is also the name of the corresponding package.\nThe scripts are designed to exclusively read input from stdin and write output to stdout.\nThe main philosophy is that every script should be a unit in the sense that it does _one_ job,\nand individual units can be combined into _pipelines_ with the piping operator `|` on the commandline to perform more complex tasks.\nThe project's main focus is malware triage,\nand it is an attempt to implement something like [CyberChef](https://github.com/gchq/CyberChef) on the commandline.\n\n## Short Version\n\nMake a Python virtual environment. You need Python 3.8 or later. Install refinery like this:\n```\npip install binary-refinery[extended]\n```\nRun units with `-h` to learn how they work, grep through the [docs][] or use the command `binref` to find them.\nWatch [the latest video][VOD3] if you want to see it in action.\nBut also, read the rest of this readme.\n\n## Release Schedule\n\nThere is no release schedule, but releases happen very frequently and it is recommended to update periodically.\nBugfixes are not documented outside of Git, but all other changes (i.e. new features) are documented in the [changelog](CHANGELOG.md).\nFollow me on [Mastodon][] for updates about particularly impactful releases.\n\n\n## Documentation\n\nThe help text that is displayed when executing a unit with the `-h` or `--help` switch is its main documentation.\nThe [automatically generated documentation][docs] contains a compilation of that output for each unit at the top level,\nbut also contains the specification for the three fundamental concepts of the toolkit:\n[framing][frame], [multibin arguments][argformats], and [meta variables][meta].\nFull-text search of the description and help text for every unit is also available on the command line,\nvia the provided `binref` command. 
In recognition of the fact that reference documentation can be somewhat dry,\nthere is an ongoing effort to produce a series of [tutorials](tutorials); I very much recommend checking them out.\nOn top of that, I collect additional resources (including some produced by third parties) below.\n\n\u003e [!NOTE]  \n\u003e Refinery is still in alpha and the interface can sometimes change,\n\u003e i.e. units and parameters can be removed or renamed.\n\u003e Hence, it can happen that specific command lines from older videos and blog posts don't work any more.\n\n- [`2021/08`] [OALabs][OA] was kind enough to let me [demo the toolkit in a feature video][VOD1].\n  In the video, I essentially work through the contents of \n  [the first tutorial](tutorials/tbr-files.v0x01.netwalker.dropper.ipynb).\n- [`2021/11`] [Johannes Bader][JB] wrote an amazing [blog post][BLOG] about analyzing malspam with binary refinery.\n- [`2024/03`] [Malware Analysis For Hedgehogs][MH] made [a video about unpacking an XWorm sample][VOD2] using refinery.\n- [`2024/11`] [the CyberYeti][CY] had me [on stream presenting refinery][VOD3].\n  Showcases again include samples from the example section below and the [tutorials](tutorials).\n\n## License\n\nThe Binary Refinery is (c) 2019 Jesko Hüttenhain, and published under a [3-Clause BSD License][license].\nThis repository also contains [a copy of the full license text](LICENSE.md). 
\nIf you want to do something with it that's not covered by this license, please feel free to contact the author.\n\n## Warnings \u0026 Advice\n\nThe refinery requires at least **Python 3.8**.\nIt is recommended to install it into its own [virtual environment][venv]:\nThe package can pull in a **lot** of dependencies,\nand installing it into your global Python is somewhat prone to version conflicts.\nAlso, since the toolkit introduces a large number of new commands,\nthere is a good chance that some of these will clash on some systems,\nand keeping them in their own separate virtual environment is one way to prevent that.\n\nIf you want to have all refinery commands available in your shell at all times (i.e. without having to switch to a custom virtual environment),\nyou also have the option to choose a _prefix_ for the installation,\nwhich will be put in front of every command shim that is installed.\nFor example, if you choose `r.` as your prefix, then the [emit][] unit will be installed as the command `r.emit`.\nAn added benefit is that you can type `r.` and hammer \u003ckbd\u003eTab\u003c/kbd\u003e twice to get a list of all available refinery commands.\nNote however that no prefix is assumed in documentation and it is a development goal of refinery to _not_ clash on most systems.\nThe author does not use a prefix and provides this option as a safety blanket. \n\n## Installation\n\nThe most straightforward way to install and update refinery is via pip:\n```\npip install -U binary-refinery\n```\nIf you want to choose a prefix for all units, you can specify it via the environment variable `REFINERY_PREFIX`.\nFor example, the following command will install refinery into the current Python environment with prefix `r.` on Linux:\n```bash\nREFINERY_PREFIX=r. 
pip install -U binary-refinery\n```\nOn Windows, you would have to run the following commands:\n```batch\nset REFINERY_PREFIX=r.\npip install -U binary-refinery\n```\nSpecifying the special prefix `!` will have the effect that no shell commands are created at all,\nand binary refinery will be installed only as a library.\nIf you want to install the current refinery `HEAD`, you can repeat all of the above steps, specifying this repository instead of the pip package.\nFor example, the following will install the very latest refinery commit:\n```\npip install -U git+https://github.com/binref/refinery.git\n```\nFinally, if you are using [REMnux][remnux-main], you can use their [refinery docker container][remnux].\n\n## Shell Support\n\nThe following is a summary of how well various shell environments are currently supported:\n\n| Shell      | Platform | State           | Comment                                                          |\n|:-----------|:---------|:----------------|:-----------------------------------------------------------------|\n| Bash       | Posix    | 🔵 Good         | Used occasionally by the author.                                 |\n| CMD        | Windows  | 🔵 Good         | Used extensively by the author.                                  |\n| PowerShell | Windows  | 🟡 Reasonable   | It [just works if the PowerShell version is at least 7.4.][psh1] |\n| Zsh        | Posix    | 🟠 Minor Issues | Following a [discussion][zsh1], there is a [fix][zsh2].          |\n| Fish       | Posix    | 🟠 Minor Issues | See issue [#55][fsh1] and discussion [#22][fsh2].                
|\n\nIf you are using a different shell and have some feedback to share, please [let me know](https://github.com/binref/refinery/discussions)!\n\n## Heavyweight Dependencies\n\nThere are some units that have rather heavy-weight dependencies.\nFor example, [pcap][] is the only unit that requires a packet capture file parsing library.\nThese libraries are not installed by default to keep the installation time for refinery at a reasonable level for first-time users.\nThe corresponding units will tell you what to do when their dependency is missing:\n```\n$ emit archive.7z | xt7z -l\n(13:37:00) failure in xt7z: dependency py7zr is missing; run pip install py7zr\n```\nYou can then install these missing dependencies manually.\nIf you do not want to be bothered by missing dependencies and don't mind a long refinery installation, you can install the package as follows:\n```\npip install -U binary-refinery[all]\n```\nwhich will install _all_ dependencies on top of the required ones.\nMore precisely, there are the following extra categories available:\n\n|       Name | Included Dependencies                                             |\n|-----------:|:------------------------------------------------------------------|\n|      `all` | all dependencies for all refinery units                           |\n|      `arc` | all archiving-related dependencies (i.e. 
7zip support)            |\n|  `default` | recommended selection of reasonable dependencies, author's choice |\n|  `display` | the packages `colorama` and `jsbeautifier`                        |\n| `extended` | an extended selection, excluding only the most heavyweight ones   |\n|  `formats` | all dependencies related to parsing of various file formats       |\n|   `office` | subset of `formats`; all office-related parsing dependencies      |\n|   `python` | packages related to Python decompilation                          |\n\nYou can specify any combination of these to the installation to have some control over trading dependencies for capabilities.\n\n## Bleeding Edge\n\nAlternatively, you can clone this repository and use the scripts [update.sh](update.sh) (on Linux) or [update.ps1](update.ps1) (on Windows) to install the refinery package into a local virtual environment.\nThe installation and update process for this method is to simply run the script:\n- it pulls the repository,\n- activates the virtual environment,\n- uninstalls `binary-refinery`,\n- and then installs `binary-refinery[all]`.\n\n## Generating Documentation\n\nYou can also generate all documentation locally.\nTo do so, execute the [run-pdoc3.py](run-pdoc3.py) script.\nThis will **fail** unless you run it from an environment where binary refinery has been installed as a Python package.\nTo run it, you have to specify the path of a virtual environment as the first command line argument to [run-pdoc3.py](run-pdoc3.py),\nwhich will cause the script to run itself again using the interpreter of that environment.\nIf you are certain that you want to run [run-pdoc3.py](run-pdoc3.py),\nthere is a command line switch to force the script to run with the current default Python interpreter.\nThe script installs the [pdoc3 package][pdoc3] and uses it to generate an HTML documentation for the `refinery` package.\nThe documentation can then be found in the subdirectory `html` directly next to this readme 
file.\n\nThe [tutorials](tutorials) are Jupyter notebooks which you can simply run and execute if your virtual environment has [Jupyter installed][jupyter].\nIt's worth pointing out that [Visual Studio Code has very comfortable support for Jupyter][jupyter-vscode].\n\n## Examples\n\n### Basic Examples\n\nThe units [emit][] and [dump][] play a special role:\nThe former is for outputting data while the latter is for dumping data to the clipboard or to disk.\nAs an example, consider the following pipeline:\n```\nemit M7EwMzVzBkI3IwNTczM3cyMg2wQA | b64 | zl | hex \n```\nHere, we emit the string `M7EwMzVzBkI3IwNTczM3cyMg2wQA`,\nbase64-decode it using [b64][],\nzlib-decompress the result using [zl][],\nand finally [hex][]-decode the decompressed data.\nEach unit performs the _\"decoding\"_ operation of a certain transformation by default, but some of them also implement the reverse operation.\nIf they do, this is always done by providing the command line switch `-R`, or `--reverse`.\nYou can produce the above base64 string using the following command because [hex][], [zl][], and [b64][] all provide the reverse operation:\n```\nemit \"Hello World\" | hex -R | zl -R | b64 -R\n```\nGiven a file `packed.bin` containing a base64 encoded payload buffer, the following pipeline extracts said payload to `payload.bin`:\n```\nemit packed.bin | carve -l -t1 b64 | b64 | dump payload.bin\n```\nThe [carve][] unit can be used to carve blocks of data out of the input buffer,\nin this case it looks for base64 encoded data, sorts them by length (`-l`) and returns the first of these (`-t1`),\nwhich carves the largest base64-looking chunk of data from `packed.bin`.\nThe data is then base64-decoded and dumped to the file `payload.bin`. 
\n\nThe unit [pack][] will pick all numeric expressions from a text buffer and turn them into their binary representation.\nA simple example is the pipeline\n```\nemit \"0xBA 0xAD 0xC0 0xFF 0xEE\" | pack | hex -R \n```\nwhich will output the string `BAADC0FFEE`.\n\n### Short \u0026 Sweet\n\nExtract the largest piece of base64 encoded data from a BLOB and decode it:\n```\nemit file.exe | carve -ds b64\n```\nCarve a ZIP file from a buffer, pick a DLL from it, and display information about it:\n```\nemit file.bin | carve-zip | xtzip file.dll | pemeta\n```\nList PE file sections with their corresponding SHA-256 hash:\n```\nemit file.exe | vsect [| sha256 -t | cfmt {} {path} ]]\n```\nRecursively list all files in the current directory with their respective SHA-256 hash:\n```\nef \"**\" [| sha256 -t | cfmt {} {path} ]]\n```\nExtract indicators from all files recursively enumerated inside the current directory:\n```\nef \"**\" [| xtp -n6 ipv4 socket url email | dedup ]]\n```\nConvert the hard-coded IP address `0xC0A80C2A` in network byte order to a readable format:\n```\nemit 0xC0A80C2A | pack -EB4 | pack -R [| sep . 
]\n```\nPerform a single byte XOR brute force and attempt to extract a PE file payload in every iteration:\n```\nemit file.bin | rep 0x100 [| xor v:index | carve-pe -R | peek | dump {name} ]\n```\n\n### Malware Config Examples\n\nExtract a RemCos C2 server:\n```\nemit c0019718c4d4538452affb97c70d16b7af3e4816d059010c277c4e579075c944 \\\n  | perc SETTINGS [| put keylen cut::1 | rc4 cut::keylen | xtp socket ]\n```\nExtract an AgentTesla configuration:\n```\nemit fb47a566911905d37bdb464a08ca66b9078f18f10411ce019e9d5ab747571b40 \\\n  | dnfields [| aes x::32 --iv x::16 -Q ]] \\\n  | rex -M \"((??email))\\n(.*)\\n(.*)\\n:Zone\" addr={1} pass={2} host={3}\n```\nExtract the PowerShell payload from a malicious XLS macro dropper:\n```\nemit 81a1fca7a1fb97fe021a1f2cf0bf9011dd2e72a5864aad674f8fea4ef009417b [ \\\n  | xlxtr 9.5:11.5 15.15 12.5:14.5 [ \\\n  | scope -n 3 | chop -t 5 [| sorted -a | snip 2: | sep ] \\\n  | pack 10 | alu --dec -sN B-S ]] \\\n  | dump payload.ps1\n```\nAnd get the domains for the next stage:\n```\nemit payload.ps1 \n  | carve -sd b64 | zl | deob-ps1 \n  | carve -sd b64 | zl | deob-ps1\n  | xtp -f domain\n```\nExtract the configuration of unpacked HawkEye samples:\n```\nemit ee790d6f09c2292d457cbe92729937e06b3e21eb6b212bf2e32386ba7c2ff22c \\\n  | put cfg perc[RCDATA]:c:: [\\\n  | xtp guid | pbkdf2 48 rep[8]:h:00 | cca eat:cfg | aes -Q x::32 --iv x::16 ] \\\n  | dnds\n```\nWarzone RAT:\n```\nemit 4537fab9de768a668ab4e72ae2cce3169b7af2dd36a1723ddab09c04d31d61a5 \\\n  | vsect .bss | struct I{key:{}}{} [\\\n  | rc4 eat:key | struct I{host:{}}{port:H} {host:u16}:{port} ]\n```\nExtract payload from a shellcode loader and carve its c2:\n```\nemit 58ba30052d249805caae0107a0e2a5a3cb85f3000ba5479fafb7767e2a5a78f3 \\\n  | rex yara:50607080.* [| struct LL{s:L}{} | xor -B2 accu[s]:@msvc | xtp url ]\n```\nGet the malicious VBA macros from a forgotten time when this was how it was done:\n```\nemit ee103f8d64cd8fa884ff6a041db2f7aa403c502f54e26337c606044c2f205394 \\\n  
| xtvba\n```\nAnd then extract the malicious downloader payload:\n```\nemit ee103f8d64cd8fa884ff6a041db2f7aa403c502f54e26337c606044c2f205394 \\\n  | doctxt | repl drp:c: | carve -s b64 | rev | b64 | rev | ppjscript\n```\nExtract payload URLs from a malicious PDF document:\n```\nemit 066aec7b106f669e587b10b3e3c6745f11f1c116f7728002f30c072bd42d6253 \\\n  | xt JS | csd string | csd string | url | xtp url [| urlfix ]]\n```\nExtract the payload URL from an equation editor exploit document:\n```\nemit e850f3849ea82980cf23844ad3caadf73856b2d5b0c4179847d82ce4016e80ee \\\n  | officecrypt | xt oleObject | xt native | rex Y:E9[] | vstack -a=x32 -w=200 | xtp\n```\n\n### AES Encryption\n\nAssume that `data` is a file which was encrypted with 256-bit AES in CBC mode.\nThe key was derived from the secret passphrase `swordfish` using the PBKDF2 key derivation routine using the salt `s4lty`.\nThe IV is prefixed to the buffer as the first 16 bytes.\nIt can be decrypted with the following pipeline:\n```\nemit data | aes --mode cbc --iv cut::16 pbkdf2[32,s4lty]:swordfish\n```\nHere, both `cut:0:16` and `pbkdf2[32,s4lty]:swordfish` are multibin arguments that use a special handler.\nIn this case, `cut:0:16` extracts the slice `0:16` (i.e. 
the first 16 bytes) from the input data - after application of this multibin handler,\nthe input data has the first 16 bytes removed and the argument `iv` is set to these exact 16 bytes.\nThe final argument specifies the 32 byte encryption key:\nThe handler `pbkdf2[32,s4lty]` on the other hand instructs refinery to create an instance of the pbkdf2 unit as if it had been given the command line parameters `32` and `s4lty` in this order and process the byte string `swordfish` with this unit.\nAs a simple test, the following pipeline will encrypt and decrypt a sample piece of text:\n```\nemit \"Once upon a time, at the foot of a great mountain ...\" ^\n    | aes pbkdf2[32,s4lty]:swordfish --iv md5:X -R | ccp md5:X ^\n    | aes pbkdf2[32,s4lty]:swordfish --iv cut:0:16 \n```\n\n[OA]: https://www.youtube.com/c/OALabs\n[JB]: https://bin.re/\n[MH]: https://www.youtube.com/@MalwareAnalysisForHedgehogs\n[CY]: https://www.youtube.com/@jstrosch\n[Mastodon]: https://infosec.exchange/@rattle\n\n[BLOG]: https://bin.re/blog/analysing-ta551-malspam-with-binary-refinery/\n[VOD1]: https://www.youtube.com/watch?v=4gTaGfFyMK4\n[VOD2]: https://www.youtube.com/watch?v=5ZtmYNmVMKo\n[VOD3]: https://www.youtube.com/live/-B072w0qjNk\n\n[remnux]: https://hub.docker.com/r/remnux/binary-refinery\n[remnux-main]: https://remnux.org/\n[pdoc3]: https://pdoc3.github.io/pdoc/\n[docs]: https://binref.github.io/\n[argformats]: https://binref.github.io/lib/argformats.html\n[frame]: https://binref.github.io/lib/frame.html\n[meta]: https://binref.github.io/lib/meta.html\n[license]: https://opensource.org/licenses/BSD-3-Clause\n[tests]: https://github.com/binref/refinery/actions\n[codecov]: https://codecov.io/gh/binref/refinery/?branch=master\n[pypi]: https://pypi.org/project/binary-refinery/\n[venv]: https://docs.python.org/3/library/venv.html\n\n[zsh1]: https://github.com/binref/refinery/discussions/18\n[zsh2]: shells/zsh\n[psh1]: https://github.com/binref/refinery/issues/5\n[fsh1]: 
https://github.com/binref/refinery/discussions/55\n[fsh2]: https://github.com/binref/refinery/issues/22\n\n[dump]: https://binref.github.io/#refinery.dump\n[emit]: https://binref.github.io/#refinery.emit\n[stego]: https://binref.github.io/#refinery.stego\n[pcap]: https://binref.github.io/#refinery.pcap\n[hex]: https://binref.github.io/#refinery.hex\n[zl]: https://binref.github.io/#refinery.zl\n[b64]: https://binref.github.io/#refinery.b64\n[carve]: https://binref.github.io/#refinery.carve\n[pack]: https://binref.github.io/#refinery.pack\n\n[jupyter]: https://jupyter.org/install\n[jupyter-vscode]: https://code.visualstudio.com/docs/datascience/jupyter-notebooks\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinref%2Frefinery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbinref%2Frefinery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbinref%2Frefinery/lists"}