{"id":19927230,"url":"https://github.com/hexhive/igor","last_synced_at":"2025-05-03T09:32:10.607Z","repository":{"id":37828200,"uuid":"405895189","full_name":"HexHive/Igor","owner":"HexHive","description":null,"archived":false,"fork":false,"pushed_at":"2022-08-02T04:09:39.000Z","size":3805,"stargazers_count":71,"open_issues_count":1,"forks_count":15,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-07T14:12:47.444Z","etag":null,"topics":["cluster","crash","deduplication","fuzzing","grouping","security","similarity","trace"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HexHive.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-13T08:37:50.000Z","updated_at":"2024-12-05T08:44:54.000Z","dependencies_parsed_at":"2022-07-11T23:16:13.954Z","dependency_job_id":null,"html_url":"https://github.com/HexHive/Igor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HexHive%2FIgor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HexHive%2FIgor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HexHive%2FIgor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HexHive%2FIgor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HexHive","download_url":"https://codeload.github.com/HexHive/Igor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252169026,"owners_count":2170
5366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","crash","deduplication","fuzzing","grouping","security","similarity","trace"],"created_at":"2024-11-12T22:32:49.689Z","updated_at":"2025-05-03T09:32:06.080Z","avatar_url":"https://github.com/HexHive.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Igor: Crash Deduplication Through Root-Cause Clustering\r\n## Overview\r\n\r\nFuzzing has emerged as the most effective bug-finding technique. The output of a\r\nfuzzer is a set of proof-of-concept (PoC) test cases for all observed “unique”\r\ncrashes. Analyzing each crashing test case costs developers substantial effort.\r\nThis mostly manual process has led to the number of reported crashes\r\noutpacing the number of bug fixes. Automatic crash deduplication techniques,\r\nwhich mostly rely on coverage profiles and stack hashes, are supposed to\r\nalleviate these pressures. However, these techniques both inflate actual bug\r\ncounts and falsely conflate unrelated bugs. This hinders, rather than helps,\r\ndevelopers, and calls for more accurate techniques.\r\n\r\nIgor is a tool for automated crash grouping/deduplication. By minimizing each\r\nPoC’s execution trace, it can obtain pruned test cases that exercise the\r\ncritical behavior necessary for triggering a bug. 
Then, Igor uses a graph\r\nsimilarity comparison to cluster crashes based on the control-flow graph of the\r\nminimized execution traces, with each cluster mapping back to a single, unique\r\nroot cause.\r\n\r\nIgor helps a lot when you have many PoCs and would like to classify them into\r\nseveral groups according to their root cause, so that you don't need to analyze\r\nthe PoCs one by one.\r\n\r\n[Here](https://github.com/HexHive/Igor/tree/main/images/Igor_overview.pdf)\r\nis a flow chart giving an overview of Igor's workflow.\r\n\r\nMore details about the project can be found in the [paper](https://hexhive.epfl.ch/publications/files/21CCS.pdf).\r\n\r\nOur presentation about Igor is available as a [video](https://www.youtube.com/watch?v=V06x1Ad5dRo).\r\n\r\n\r\n## Components\r\nThis repository is structured as follows:\r\n\r\n1. IgorFuzz (AFLplusplus): Our coverage-decreasing fuzzer for test case reduction.\r\n2. Smart_tracer (Pin): Our tracer for recording control flow.\r\n3. Analyzer: Prunes recorded execution traces and constructs control flow graphs.\r\n4. TraceClusterMaker: Our clustering tool based on graph similarity matrices.\r\n5. Evaluation: The evaluation scripts used in the Igor paper.\r\n\r\n\r\n\r\n## IgorFuzz\r\nWe developed IgorFuzz based on the AFLplusplus crash exploration mode. It can very\r\nquickly prune the paths that are unnecessary for triggering the bug. Before using IgorFuzz,\r\nwe suggest using afl-tmin to shrink the crashing input first, so that IgorFuzz\r\nperforms better.\r\n\r\n### Installation and Usage\r\nThe installation and usage of IgorFuzz are identical to those of the AFLplusplus\r\ncrash mode. Every time you launch IgorFuzz, you must make sure that you\r\nhave put a PoC in the input directory and set up the output directory properly.\r\n\r\n\r\n### Reduction in parallel\r\n\r\nIgorFuzz reduces one PoC at a time. To apply IgorFuzz to many PoCs in parallel,\r\nwe provide users with `mass_fuzz.sh`. 
It automatically runs over and over\r\nagain until all PoCs in the input directory are fuzzed.\r\n\r\n\r\nCollect all the PoCs you want to reduce in an input directory (e.g., `/home/my_pocs`), and set up an output directory (e.g., `/home/trimmed/my_pocs`). \r\n\r\nThe third argument is the number of PoCs you want to fuzz in parallel each time. The\r\nlast argument is how long the fuzzing lasts (e.g., 1h2m3s).\r\n\r\nExample:\r\n```console\r\n$ ./mass_fuzz.sh /home/my_pocs /home/trimmed/my_pocs 30 10h\r\n```\r\n\r\n\r\nThe results take the form: `/home/trimmed/my_pocs/$the-name-of-a-PoC(like: id:000000,xxxxxxxx)/`\r\n\r\n\r\n`mass_fuzz.sh` renames fuzzed PoCs like: `fzd_id:000000,xxxxx`. So if something goes wrong with IgorFuzz or you want to shrink all PoCs again, you can use `./clear_fzd.sh $INPUT_DIR` to remove the \"fzd\" prefix. \r\n\r\n\r\n\r\n## Tracing and Analyzing\r\n\r\nTo obtain precise execution traces (at basic block level by default) of a specific\r\nbinary, we need the following tools:\r\n\r\n- `smart_tracer/calltrace_wrapper.py`\r\n- `analyzer/breakpoint_hit_counter.py`\r\n- `analyzer/find_crashing_addr.py`\r\n- `analyzer/trace_shrinker.py`\r\n- `analyzer/trace_pruner.py`\r\n\r\nFor usage of the above tools, please check `analyzer/README` and `smart_tracer/README`.\r\n\r\n### Workflow\r\nExecution traces need to be filtered before constructing the control flow graph\r\nused to calculate the graph similarity. The following steps show how to do that.\r\n\r\n#### STEP 1 - In the ASAN disabled environment\r\n\r\n- Use `calltrace_wrapper.py` to collect execution traces of the binary under\r\n  test. 
Users can configure which granularity they want to use; for now, we\r\n  support instruction level, basic block level, and function call level.\r\n- Use `trace_shrinker.py` to filter out execution traces related to shared\r\n  libraries.\r\n\r\n#### STEP 2 - In the ASAN enabled environment\r\n\r\n- Use `find_crashing_addr.py` to find out the number of crashing addresses (the line number observed when the binary crashes). For each crashing address, repeat the following three steps:\r\n  - Debug the binary under test, find the last function the binary calls before\r\n    crashing, and note down its caller's address (usually, the `call` instruction's\r\n    address).\r\n  - Use `breakpoint_hit_counter.py` to find out how many times the address\r\n    mentioned above is hit before the binary crashes.\r\n  - Copy the breakpoint hit count folder to our ASAN disabled environment.\r\n\r\n#### STEP 3 - In the ASAN disabled environment \r\n\r\n- Debug the binary under test and find the same caller as the one in the ASAN enabled environment.\r\n- Use `trace_pruner.py` to prune redundant trace entries that are recorded after the point at which the binary should have crashed. This step gives you a directory of pruned traces for clustering.\r\n\r\n\r\n\r\n## Clustering\r\n\r\nThe `TraceClusterMaker` folder contains the utilities for clustering.\r\n\r\n`TraceClusterMaker/ClusterMaker.py` will do everything for you, including constructing\r\ncontrol flow graphs based on pruned traces, calculating graph similarities, and\r\nclustering.\r\n\r\n## Ground-truth Benchmark\r\nThere are few public benchmarks designed for verifying crash grouping,\r\nespecially for real world programs. 
In order to promote research on crash\r\ngrouping, we provide a ground-truth benchmark for evaluating crash grouping\r\ntechniques, containing 52 CVEs and more than 250,000 crashing test cases from 14\r\nreal world programs (generated over 58.7 CPU-years of fuzzing) for subsequent\r\nresearchers. Igor's own evaluation also used this dataset.\r\n\r\nWe are grateful to [Magma](https://hexhive.epfl.ch/magma/) and\r\n[Moonlight](https://hexhive.epfl.ch/publications/files/21ISSTA2.pdf) for the\r\noriginal data and the methodology of establishing the ground truth data set. We\r\nused all their crashes and labels in the process of building our benchmark,\r\nand further expanded the scale of the data set on that basis (more fuzzing time\r\nto generate more crashes).\r\n\r\nHere is the link to our ground-truth benchmark:  \r\n[benchmark](https://drive.google.com/drive/folders/1LgkVh1GpFMyIQ7kJSA-oYj-mr7S99edi?usp=sharing)  \r\nEvery PoC is labeled with its root cause; users can get the label by parsing the name of the PoC.\r\n\r\nThe approach for building the Magma targets can be found [here](https://hexhive.epfl.ch/magma/docs/getting-started.html)  \r\nThe approach for building the MoonLight targets can be found [here](https://datacommons.anu.edu.au/DataCommons/rest/records/anudc:5927/data/Binaries/)\r\n\r\n## Contact\r\nQuestions? Concerns? 
Feel free to ping me via [E-mail](mailto:supermolejzy@gmail.com)  \r\nFor recent updates and new feature implementations, please ping Sonic, who is pushing this project forward, via [E-mail](mailto:observer000@qq.com)\r\n\r\n## TODO\r\n- ~~Provide evaluation scripts we used in Igor paper~~\r\n- ~~Provide link to Igor's dataset~~\r\n- Provide a detailed tutorial for the Igor system\r\n- Provide a README for the evaluation scripts\r\n- Provide scripts to automate trace analysis\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexhive%2Figor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhexhive%2Figor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhexhive%2Figor/lists"}