# Introducing halfempty 🥛
-------------------------------------------------------------------------------

## Fast, Parallel Testcase Minimization

Halfempty is a new testcase minimization tool, designed with parallelization in
mind. Halfempty was built to use strategies and techniques that dramatically
speed up the minimization process.

### Background

Fuzzers find inputs that trigger bugs, but understanding those bugs is easier
when you remove as much extraneous data as possible. This is called *testcase
minimization* or *delta debugging*.

Minimization tools use various techniques to simplify the testcase, but the
core algorithm is simply bisection. Bisection is an inherently serial process:
you can't advance through the algorithm without knowing the result of each
step.
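
The following is a minimal sketch of bisection-style minimization in plain `sh` that shows the serial dependency concretely. It is not halfempty's implementation. The function names `minimize` and `filesize` are hypothetical. It assumes a test script that follows the convention described under Usage: exit 0 if the input on stdin is still interesting. Each candidate depends on the previous test's result. A success shrinks the file and retries the same offset, and a failure moves on to the next chunk.

```bash
#!/bin/sh
# Hypothetical sketch of serial, chunk-halving minimization (not halfempty's code).
# minimize TEST_SCRIPT INPUT: print a smaller input that TEST_SCRIPT still accepts.

# Print a file's size in bytes (tr strips the padding some wc versions add).
filesize() {
    wc -c < "$1" | tr -d ' '
}

minimize() {
    test_script=$1
    best=$(mktemp) && cand=$(mktemp) || return 1
    cp "$2" "$best"

    size=$(filesize "$best")
    chunk=$(( (size + 1) / 2 ))
    while [ "$chunk" -ge 1 ]; do
        offset=0
        while [ "$offset" -lt "$(filesize "$best")" ]; do
            # Candidate = current best with bytes [offset, offset + chunk) removed.
            {
                dd if="$best" bs=1 count="$offset" 2>/dev/null
                tail -c +$((offset + chunk + 1)) "$best"
            } > "$cand"

            # Serial step: we must wait for this result before choosing the next
            # candidate. Child output is discarded so it can't mix with ours.
            if "$test_script" < "$cand" > /dev/null 2>&1; then
                cp "$cand" "$best"          # Success: keep the smaller input, retry this offset.
            else
                offset=$((offset + chunk))  # Failure: keep these bytes, try the next chunk.
            fi
        done
        chunk=$((chunk / 2))                # Halve the chunk size and sweep again.
    done

    cat "$best"
    rm -f "$best" "$cand"
}
```

Called as `minimize ./testgzip.sh crashinput.gz > smaller.gz`, this runs every test one after another. Each test waits for the previous result, which is exactly the bottleneck described here.
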
This data dependency problem can make minimization very slow, sometimes
taking hours to complete while CPU cores sit idle.

> ![Bisection](doc/images/bisect.png)
>
> **In this diagram you can see we progressively remove parts of the file to determine which section is interesting.**

Halfempty solves this problem using *pessimistic speculative execution*. We
build a binary tree of all the possible bisection steps, and then idle cores can
speculatively test future steps ahead of our position in the algorithm. In many
cases, the test results are already known by the time we need them.

We call it *pessimistic* because real workloads are characterized by long
series of consecutive failures. We simply assume that tests are going to fail,
and speculatively follow the failure path until proven wrong.

> ![Tree](doc/images/tree.png)
>
> **In this diagram, you can see we generated a binary tree of all possible outcomes, and now idle cores can speculatively work ahead of the main thread.**

If you're fuzzing a target that takes more than a few seconds to run, then
parallelizing the minimization can dramatically speed up your workflow. Real
fuzzing inputs that take several seconds to reproduce can take many hours to
minimize using serial bisection, but halfempty can produce the same output in
minutes.

In real tests, the author often finds that halfempty saves hours.

> ![Path](doc/images/path.png) ![Path Full](doc/images/treefull.png)
>
> **This is a real minimization path from a fuzzer-generated crash.**
>
> Halfempty generates a binary tree, and this graph shows the path through the
> tree from the root to the final leaf (discarded paths are hidden on the left
> to simplify the diagram).
>
> The green nodes were successful and the red nodes were failures. The grey
> nodes on the right were explored but discarded.
> Because all consecutive red nodes are executed in parallel, the actual
> wall-clock time required to minimize the input was small.
>
> Each crash took ~11 seconds to reproduce, requiring about 34 minutes of
> total compute time, but halfempty completed in just **6 minutes**!
>
> The original input was 240K; the final output was just 75 bytes.

### Building

The only dependency is `libglib2.0-dev`, used for some useful data structures, like [N-ary trees](https://developer.gnome.org/glib/stable/glib-N-ary-Trees.html).

On Red Hat systems, try `glib2-devel`.

Just type `make` to build the main binary.

The `--monitor` feature requires the `graphviz` package and a web browser.

The author has tested the following distributions:

* CentOS 6 amd64
* Ubuntu 14 amd64

#### Mac OS X

Halfempty has preliminary macOS support using [Homebrew](https://brew.sh/).

Please use `brew install pkg-config glib` to install the necessary dependencies, then `make` to build
the main binary.

### Usage

First, create a shell script that reads your input on stdin and returns zero if
the input is still interesting (for example, it still triggers your crash), or
non-zero otherwise.

A simple example might look like this if you wanted to test a `gzip` crash:

```bash
#!/bin/sh

gzip -dc

# Check if we were killed with SIGSEGV.
# A shell reports a child killed by signal N as status 128+N; SIGSEGV is 11, so 139.
if test $? -eq 139; then
    exit 0 # We want this input
else
    exit 1 # We don't want this input
fi
```

Make the file executable and verify it works:

```
$ chmod +x testgzip.sh
$ ./testgzip.sh < crashinput.gz && echo success || echo failure
success
```

Now run halfempty with your script and input, and it will find the smallest version of the input it can for which the script still returns zero.

> **Note:** If you need to create temporary files, see the advanced examples below.

```
$ halfempty testgzip.sh crashinput.gz
```

If everything worked, there should be a minimal output file in `halfempty.out`.

![Screenshot](doc/images/screenshot.png)

If you want to monitor what halfempty is doing, you can use `--monitor` mode,
which will generate graphs you can watch in real time. Halfempty will generate a
URL you can open to view the data in your web browser.

> **Note:** `--monitor` mode requires the graphviz package to be installed.

[![Screenshot](doc/images/smallmonitor.png)](doc/images/monitor.png)

### Options

Halfempty includes many options to fine-tune the execution environment for the
child processes and to tweak performance. The full documentation can be
shown with `--help-all`, but here are the most commonly useful parameters.

| Parameter                                  | Description                                     |
|:-------------------------------------------|:------------------------------------------------|
| `--num-threads=threads`                    | Halfempty defaults to using all available cores, but you can tweak this if you prefer. |
| `--stable`                                 | Sometimes different strategies can shake out new potential for minimizing.<br>If you enable this, halfempty will repeat all strategies until the output doesn't change.<br>(Slower, but recommended.) |
| `--timeout=seconds`                        | If tested programs might run too long, halfempty can send them a SIGALRM after this many seconds.<br>You can catch this in your test script (see `help trap`) and clean up if you like, or accept the default action and terminate. |
| `--limit RLIMIT_???=N`                     | You can fine-tune the resource limits available to child processes.<br>Perhaps you want to limit how much memory they can allocate, or enable core dumps.<br>An example might be `--limit RLIMIT_CPU=600`. |
| `--inherit-stdout`<br>`--inherit-stderr`   | By default, we discard all output from children.<br>Use these if you want to see the output instead, such as child error messages. |
| `--zero-char=byte`                         | Halfempty tries to simplify files by overwriting data with NUL bytes. This makes sense for binary file formats.<br>If you're minimizing text formats (`html`, `xml`, `c`, etc.), then you might want whitespace instead.<br>Set this to `0x20` for space, or `0x0a` for a newline. |
| `--monitor`                                | If you have the `graphviz` package installed, halfempty can generate graphs so you can watch the progress. |
| `--no-terminate`                           | If halfempty guesses wrong, it might already be running your test on an input we now know we don't need.<br>By default, we will try to kill it so we can get back to using that thread sooner.<br>You can disable this if you prefer. |
| `--output=filename`                        | By default, your output is saved to `halfempty.out`, but you can save it anywhere you like. |
| `--noverify`                               | If tests are very slow, you can skip the initial verification and go straight to parallelization.<br>(Faster, but not recommended.) |
| `--generate-dot`                           | Halfempty can generate a dot file of the final tree state that you can inspect with xdot. |
| `--gen-intermediate`                       | Save the best result as it's found, so you don't lose your progress if halfempty is interrupted. |

### Examples

There are more examples available in the wiki.

#### Creating temporary files

> Note: Are you sure you need temporary files? Many programs will accept `/dev/stdin`.

If you need to create temporary files to give to your target program, you can simply do something like this:

```bash
#!/bin/sh
tempfile=$(mktemp) && cat > "${tempfile}"

yourprogram "${tempfile}"
```

Remember to clean it up when you're done. For example:

```bash
#!/bin/sh
tempfile=$(mktemp) && cat > "${tempfile}"
result=1

# Remove the file and report the result on exit, or if the script receives
# SIGTERM or SIGALRM (e.g. from --timeout).
trap 'rm -f "${tempfile}"; exit ${result}' EXIT TERM ALRM

yourprogram "${tempfile}"

# 139 = 128 + SIGSEGV (11): the program crashed the way we want.
if test $? -eq 139; then
    result=0
fi
```

#### Verifying crashes

Sometimes minimization accidentally finds a different crash in your target
program. One solution is to use gdb to verify the crash site.

```bash
#!/bin/sh
exec gdb -q                                                                 \
         -ex 'r'                                                            \
         -ex 'q !($_siginfo.si_signo == 11 && $pc == 0x00007ffff763f2e7)'   \
         -ex 'q 1'                                                          \
         --args yourprogram --yourparams
```

This will exit 0 if the signal number and crash address match, or 1 otherwise.

You can test various things such as registers (`$rip`, `$eax`, etc.), the fault
address (`$_siginfo._sifields._sigfault.si_addr`), and many more. To list the
convenience variables you can test, try the command `show conv` in gdb.

### FAQ

**Q**. **What does finalized mean in halfempty output?**

**A**.
Halfempty works by guessing what the results of tests will be before the
real result is known. If the path through the bisection tree from the root node
to the final leaf passes only through nodes whose real results are known, then the
path is *finalized* (as opposed to *pending*).

**Q**. **Where does the name come from?**

**A**. We use *pessimistic* speculative execution, so the glass is always half
empty? ...? Sorry. 🥛

**Q**. **How can I kill processes that take too long?**

**A**. Use `--timeout 10` to send a signal that can be caught after 10 seconds,
or `--limit RLIMIT_CPU=10` to enforce a hard limit.

**Q**. **Halfempty wastes a lot of CPU time exploring paths, so is it really faster?**

**A**. It's significantly faster in real time (i.e., wall-clock time), and that's what counts!

**Q**. **I have a very large input. What do I need to know?**

**A**. By default, halfempty is less thorough on very large inputs that don't
seem to minimize well. Removing each byte from multi-gigabyte inputs just takes
too long, even when run in parallel.

If you really *want* halfempty to be thorough, you can do this:

`$ halfempty --bisect-skip-multiplier=0 --zero-skip-multiplier=0 --stable --gen-intermediate harness.sh input.bin`

* `--bisect-skip-multiplier=0` and `--zero-skip-multiplier=0` mean to try removing every single byte.
* `--stable` means to keep retrying minimization until no further removals work.
* `--gen-intermediate` means to save the best result as it's found, so you
won't lose your work if you stop halfempty early.

On the other hand, if you just want halfempty to be faster and don't care if
it's not very thorough, you can do the opposite.
Something like this:

`$ halfempty --bisect-skip-multiplier=0.01 --zero-skip-multiplier=0.01 harness.sh input.bin`

Reasonable values for the multipliers range from `0` to `0.1`.

### BUGS

* If your program intercepts signals or creates process groups, it might be difficult to clean up.
* For very long trees, we keep a file descriptor open for each successful node, so it's possible to exhaust the available file descriptors.

Please report any other bugs or unexpected results to <taviso@google.com>. The
author intends to maintain this tool and make it a stable and reliable
component of your fuzzing workflow.

Better-quality bug reports require simpler reproducers, and that requires
good-quality tools.

### FUTURE

* The next version will allow the level of pessimism to be controlled at runtime.

### AUTHORS

Tavis Ormandy <taviso@google.com>

### LICENSE

Apache 2.0. See the LICENSE file for details.

### NOTICE

This is not an officially supported Google product.