{"id":13746361,"url":"https://github.com/joxeankoret/pigaios","last_synced_at":"2025-04-04T17:06:01.186Z","repository":{"id":44877658,"uuid":"132592232","full_name":"joxeankoret/pigaios","owner":"joxeankoret","description":"A tool for matching and diffing source codes directly against binaries.","archived":false,"fork":false,"pushed_at":"2023-01-09T06:52:55.000Z","size":25370,"stargazers_count":643,"open_issues_count":14,"forks_count":68,"subscribers_count":36,"default_branch":"master","last_synced_at":"2025-04-02T21:44:05.251Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/joxeankoret.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-08T10:21:05.000Z","updated_at":"2025-03-28T11:35:11.000Z","dependencies_parsed_at":"2023-02-08T09:16:42.214Z","dependency_job_id":null,"html_url":"https://github.com/joxeankoret/pigaios","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joxeankoret%2Fpigaios","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joxeankoret%2Fpigaios/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joxeankoret%2Fpigaios/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joxeankoret%2Fpigaios/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/joxeankoret","download_url":"https://codeload.github.com/joxeankoret/pigaios/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://gith
ub.com","kind":"github","repositories_count":247217175,"owners_count":20903008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T06:00:52.472Z","updated_at":"2025-04-04T17:06:01.164Z","avatar_url":"https://github.com/joxeankoret.png","language":"Python","funding_links":["https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=LKGZZNUCZFYG8\u0026source=url"],"categories":["\u003ca id=\"02088f4884be6c9effb0f1e9a3795e58\"\u003e\u003c/a\u003e签名(FLIRT等)\u0026\u0026比较(Diff)\u0026\u0026匹配","使用","Secure Programming"],"sub_categories":["\u003ca id=\"161e5a3437461dc8959cc923e6a18ef7\"\u003e\u003c/a\u003eDiff\u0026\u0026Match工具","\u003ca id=\"02088f4884be6c9effb0f1e9a3795e58\"\u003e\u003c/a\u003e签名(FLIRT等)\u0026\u0026比较(Diff)\u0026\u0026匹配","Tokens"],"readme":"# Pigaios\n\nPigaios ('πηγαίος', Greek for 'source' as in 'source code') is a tool for diffing/matching source codes directly against binaries. The idea is to point a tool to a code base, regardless of it being compilable or not (for example, partial source code or source code for platforms not at your hand), extract information from that code base and, then, import in an IDA database function names (symbols), structures and enumerations. 
It uses the Python CLang bindings (which are very limited, but still better than using pycparser).\n\nBasically, the tool does the following:\n\n * Parse C source code and extract features from the Abstract Syntax Tree (AST) of each function.\n * Export from IDA databases the same data that is extracted from C source code.\n * Find matches between the features found in C source code and IDA databases.\n * After an initial set of matches with no false positives is found, find more matches from the callgraph.\n * Rate the matches using both an \"expert system\" and a \"machine learning\" based system.\n * Also, import into the IDA database all the required structures and enumerations of a given code base (something not trivial in IDA).\n \nThe tool was released in October 2018, during the [Hacktivity](https://www.hacktivity.com/) conference.\n\nNOTE: If you're looking for a tool for diffing or matching between binaries or if you can properly build binaries, you might want to take a look at [Diaphora](https://github.com/joxeankoret/diaphora).\n\n## Donations\n\nYou can help (or thank) the author of Pigaios by making a donation, if you feel like doing so: [![Donate](https://img.shields.io/badge/Donate-PayPal-green.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick\u0026hosted_button_id=LKGZZNUCZFYG8\u0026source=url)\n\n## Requirements\n\nThis project requires the CLang Python bindings. Colorama is needed for displaying colours (but is optional) and SciKit Learn is needed for the Machine Learning part (which is also optional). 
On Debian-based Linux distros, you can install the dependencies with the following command:\n \n```\n$ sudo apt-get install clang python-clang-5.0 libclang-5.0-dev python-colorama python-sklearn\n```\n\nOn other operating systems, such as Windows, you can install them by issuing the following commands:\n\n```\n$ pip install clang-5\n$ pip install colorama\n$ pip install scikit-learn\n```\n\nOn Windows, you also need to install LLVM *and* add LLVM to the system PATH for all users or at least the current user. You can use the pre-built binaries: http://releases.llvm.org/download.html\n\nNOTE: There is no strong requirement on the specific 5.0 version of the Python CLang bindings; it should work with any CLang version greater than or equal to 3.9. However, most of the testing has been done with version 5.0.\n\n## Using srctobindiff\n\nWe will use as an example the source code tarball of [Zlib 1.2.11](https://zlib.net/zlib-1.2.11.tar.gz). Download it and untar the archive in a directory. Then enter that directory and run the command \"srcbindiff.py -create\":\n\n```\n$ wget https://zlib.net/zlib-1.2.11.tar.gz\n$ tar -xzf zlib-1.2.11.tar.gz \n$ cd zlib-1.2.11\n$ srcbindiff.py -create\nProject file 'sbd.project' created.\n```\n\nBy default, a project file 'sbd.project' will be created. 
Open this newly generated file in your favourite text editor and you will see something like the following:\n\n```\n$ cat sbd.project \n####################################################\n# Default Source-Binary-Differ project configuration\n####################################################\n[GENERAL]\nincludes = /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include\n\n[PROJECT]\ncflags = -Izlib_dir -Izlib_dir/include\ncxxflags = -Izlib_dir -Izlib_dir/include\nexport-file = zlib-1.2.11.sqlite\n\n[FILES]\nexamples/gzjoin.c = 1\nexamples/fitblk.c = 1\nexamples/enough.c = 1\nexamples/gzappend.c = 1\nexamples/zran.c = 1\nexamples/zpipe.c = 1\nexamples/gzlog.c = 1\nexamples/gun.c = 1\ncontrib/testzlib/testzlib.c = 1\ncontrib/iostream/test.cpp = 1\n(...many other files stripped...)\n```\n\nIn this file we can see various directives:\n\n * The include headers required by the compiler frontend.\n * The CFLAGS and CXXFLAGS that we want to use for parsing the source code files.\n * A list of source files, each with a number indicating whether the file is enabled for compilation (1) or not (0).\n \nWe will just remove all the lines for the files in \"examples/\" or \"test/\". After that, we will run the \"srcbindiff.py\" program again, passing the \"-export\" command line option:\n\n```\n$ srcbindiff.py -export\nUsing a total of 8 thread(s)\n[+] CC examples/gzjoin.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/fitblk.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/enough.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/gzappend.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/zran.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/zpipe.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CC examples/gzlog.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. 
-I./include\n[+] CC examples/gun.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\nexamples/zran.c:402,68: warning: format specifies type 'unsigned long long' but the argument has type 'off_t' (aka 'long')\n[+] CC contrib/testzlib/testzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CXX contrib/iostream/test.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\ncontrib/testzlib/testzlib.c:3,10: fatal: 'windows.h' file not found\n[+] CXX contrib/iostream/zfstream.cpp -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\ncontrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found\ncontrib/iostream/zfstream.h:5,10: fatal: 'fstream.h' file not found\n[+] CXX contrib/iostream3/test.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n[+] CXX contrib/iostream3/zfstream.cc -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include\n(...)\n[+] Building definitions...\n[i] Creating headers definition file zlib-1.2.11-exported.h...\n[+] Building the callgraphs...\n[+] Building the constants table...\n[+] Creating indexes...\n```\n\nAs we can see, the tool parsed the source code and generated everything, producing various warnings and errors along the way. The errors appear because, for example, I'm processing the ZLib source code on Linux and I don't have the windows.h header. We can remove the files that are failing or simply ignore them, since one feature of this project is that it can parse both partial and non-compilable source code. Whatever we decide to do, we will have a SQLite database called \"zlib-1.2.11.sqlite\" in the same directory where we ran the command. 
If we want, we can open that database with any tool that supports SQLite databases, such as the sqlite3 command line tool:\n\n```\n$ sqlite3 zlib-1.2.11.sqlite \nSQLite version 3.11.0 2016-02-15 17:29:24\nEnter \".help\" for usage hints.\nsqlite\u003e select name from functions limit 5;\nBeginCountPerfCounter\nBeginCountRdtsc\nDisplay64BitsSize\nExprMatch\nExprMatch\n```\n\n## Importing symbols in IDA\n\nOnce we have a binary open in IDA that we know uses ZLib, we can match functions directly from the source code by running the IDAPython script ```sourceimp_ida.py``` and selecting in the dialog the zlib-1.2.11.sqlite file we just exported. After a few seconds, it will discover various functions, first by issuing some simple SQL queries and then by traversing the call graph of the initial matches (which should have near zero false positives) to find many more symbols. At the same time, you should have all the structures and enumerations that were found while parsing the ZLib source code.\n\nAnd that's it! 
Hopefully, it will make the life of reverse engineers easier and we will have to spend less time doing boring tasks like importing symbols or waste time reverse engineering open source libraries statically compiled in our targets.\n\n## Screenshots\n\nList of matches between a Busybox 1.26.2 PowerPC binary and the 1.28 source code from the GIT repository:\n\n![List of matches between a Busybox 1.26.2 PPC binary and the 1.28 source code from the GIT repository](https://user-images.githubusercontent.com/2945834/49733950-2961f100-fc83-11e8-8a1d-254791382314.png)\n\nVisually diffing the pseudo-code of a function in some ```xmllint``` binary and the source code of libxml2:\n\n![Visually diffing the pseudo-code of a function in some xmllint binary and the source code of libxml2](https://user-images.githubusercontent.com/2945834/49734123-8eb5e200-fc83-11e8-956c-f9b029f331f8.png)\n\nLocal types IDA view **before** importing symbols from the matches found between a Busybox 1.26.2 PowerPC binary and the 1.28 source code from the GIT repository:\n\n![image](https://user-images.githubusercontent.com/2945834/49734194-d3da1400-fc83-11e8-8380-91837bb7ca16.png)\n\nAnd the same view **after** importing symbols:\n![image](https://user-images.githubusercontent.com/2945834/49734286-1d2a6380-fc84-11e8-9560-d2fb054a4c70.png)\n\n## License\n\nPigaios is released under the GPL v3 license but commercial licenses for proprietary developments can be purchased. Contact admin AT joxeankoret DOT com for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoxeankoret%2Fpigaios","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjoxeankoret%2Fpigaios","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjoxeankoret%2Fpigaios/lists"}