{"id":20198327,"url":"https://github.com/syssec-kaist/firmkit","last_synced_at":"2025-04-10T10:46:05.228Z","repository":{"id":86730253,"uuid":"441478881","full_name":"SysSec-KAIST/FirmKit","owner":"SysSec-KAIST","description":"IoT firmware vulnerability analysis tool based on binary code similarity analysis (BCSA)","archived":false,"fork":false,"pushed_at":"2022-07-06T02:27:09.000Z","size":935,"stargazers_count":19,"open_issues_count":0,"forks_count":9,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-24T09:38:40.960Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SysSec-KAIST.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-24T13:56:27.000Z","updated_at":"2025-02-13T03:08:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"dee4fec6-d412-4c73-beab-923de72ed9d2","html_url":"https://github.com/SysSec-KAIST/FirmKit","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SysSec-KAIST%2FFirmKit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SysSec-KAIST%2FFirmKit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SysSec-KAIST%2FFirmKit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SysSec-KAIST%2FFirmKit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SysSec-KAIST","down
load_url":"https://codeload.github.com/SysSec-KAIST/FirmKit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248201333,"owners_count":21064098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T04:30:30.755Z","updated_at":"2025-04-10T10:46:05.222Z","avatar_url":"https://github.com/SysSec-KAIST.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Description\nFirmKit is an IoT vulnerability analysis tool based on binary code similarity\nanalysis (BCSA). FirmKit includes **ground truth vulnerabilities** in custom\nbinaries, such as CGI binaries, for the top eight wireless router and IP camera\nvendors.\n\nCurrently, FirmKit utilizes\n[TikNib](https://github.com/SoftSec-KAIST/TikNib), which is a simple\ninterpretable BCSA tool. In addition to TikNib's numeric presemantic features,\nFirmKit implements two additional features based on heuristic knowledge. Through\nempirical analysis of diverse binaries in our previous studies, we discovered\nthat IoT binaries frequently contain function names. Thus, rather than comparing\nabstracted numeric features, we can directly compare caller and callee names.\nFor example, we can use the names of internal and library functions instead of\nthe numbers of callers and callees.\nAdditionally, we discovered that data strings contained in IoT binaries often\ncontain useful information. CGI binaries include hard-coded strings for parsing\nURLs, such as HTTP, POST, answer, or password. 
Therefore, we can use these\nstrings to compute the similarity score. Specifically, we chose to use 1) the\nstrings to which the function refers and 2) the names of the callee functions.\n\nFirmKit computes the similarity score for each of these two heuristic features\nusing the Jaccard index. Then, the similarity scores of these two heuristic\nfeatures are averaged with the score obtained by numeric presemantic features.\n\nFor more details, please check [my thesis\npaper](https://0xdkay.me/pub/2022/kim-phdthesis2022.pdf) (Chapter 6).\n\n\n# Ground Truth Vulnerability Dataset\nTo build the ground truth vulnerability dataset, we manually marked the\naddresses of vulnerable functions identified in previous studies,\n[FirmAE](https://github.com/pr0v3rbs/FirmAE) and\n[BaseSpec](https://github.com/SysSec-KAIST/BaseSpec).\n\nFor each vulnerability, we manually analyzed the binaries in the firmware images\nusing IDA Pro and obtained the addresses of the vulnerable functions. We\nexcluded functions that IDA Pro was unable to analyze. As a result, the final\nnumber of vulnerabilities differs from the number discovered in the previous\nstudies.\n\nFor more information, please check the [ground_truth](/ground_truth) directory\nor [Ground Truth Results.xlsx](/ground_truth/Ground_Truth_Results.xlsx).\nBaseband binary names are anonymized upon the vendor's request.\n\n**Please keep the format of these files intact!**\n\nFor the OpenSSL vulnerabilities, we utilized the ASE dataset of\n[BinKit](https://github.com/SoftSec-KAIST/BinKit).\n\n\n# How to use\n\n## Extract firmware images using FirmAE (Optional)\n\nFirst, target firmware images should be unpacked using\n[FirmAE](https://github.com/pr0v3rbs/FirmAE).\n\n\n## Prepare presemantic features using BinKit and TikNib (Optional)\n\nNext, we train [TikNib](https://github.com/SoftSec-KAIST/TikNib) to select\nnumeric presemantic features for target compiler options and architectures. 
In\nour experiments, we used three architectures (arm, mips, x86 at 32 bits), two\noptimization levels (O2, O3), and four compilers (gcc-4.9.4, gcc-8.2.0,\nclang-4.0, clang-7.0). Please check [/config/firmae_gcc](/config/firmae_gcc).\n\nFor building the cross-compiling environment and dataset, we used\n[BinKit](https://github.com/SoftSec-KAIST/BinKit).\n\nTo see the scripts used for this step, please check\n[run_firmae_gcc_roc.sh](/helper/run_firmae_gcc_roc.sh). The shell scripts are\nused when running TikNib. Please set up the correct paths in them.\n\n\n## Run FirmKit\nWe assume that the binaries in a target dataset are already unpacked.\n\nPlease set the values in the config file to match your environment. For example,\nplease check\n[/config/config_openssl_heartbeat.yml](/config/config_openssl_heartbeat.yml).\nFor the target vulnerable functions in this configuration, we used the OpenSSL\nbinaries in the ASE dataset of\n[BinKit](https://github.com/SoftSec-KAIST/BinKit).\n\nFirst, FirmKit extracts .tar.gz files for processing the FirmAE dataset.\nThis step is skipped for the BaseSpec dataset.\n\nThen, FirmKit processes the binaries in a target dataset using IDA Pro.\nFor this, we slightly modified `fetch_funcdata_v7.5.py` of\n[TikNib](https://github.com/SoftSec-KAIST/TikNib). Please check\n[fetch_funcdata.py](/helper/fetch_funcdata.py). We used IDA Pro v7.6.\n\nNext, FirmKit extracts the features selected in the previous step from the\nbinaries in a target dataset. If you skipped the previous step, you need to\nselect your own features in the config file. 
Please check\n[config_firmae_gcc.yml](/config/config_firmae_gcc.yml) for an example.\n\nFinally, FirmKit calculates the similarity score.\n\nAll of the above steps are performed by running the commands below.\n\n```bash\n# For testing firmae vulnerabilities\n$ python firmkit_firmae.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_firmae_gcc.yml\n\n# For testing openssl CVE-2014-0160 (Heartbleed) in the firmae dataset\n$ python firmkit_firmae.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_openssl_heartbeat.yml\n\n# For testing openssl CVE-2015-1791 in the firmae dataset\n$ python firmkit_firmae.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_openssl_vulseeker.yml\n\n# For testing basespec vulnerabilities\n$ python firmkit_basespec.py \\\n    --image_list helper/images_basespec.txt \\\n    --outdir output_basepsec \\\n    --config config/config_basespec.yml\n```\n\nTo view the similarity scores in a readable format, run the commands\nbelow. 
Notably, `check_score_openssl.py` averages all similarity scores and\ncomputes the results.\n\n```bash\n# For checking firmae vulnerabilities\n$ python check_score.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_firmae_gcc.yml \\\n    --ground ground_truth/ground_truth_firmae.csv\n\n# For checking openssl CVE-2014-0160 (Heartbleed) in the firmae dataset\n$ python check_score_openssl.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_openssl_heartbeat.yml\n\n# For checking openssl CVE-2015-1791 in the firmae dataset\n$ python check_score_openssl.py \\\n    --image_list helper/images_firmae.txt \\\n    --outdir output \\\n    --config config/config_openssl_vulseeker.yml\n\n# For checking basespec vulnerabilities\n$ python check_score.py \\\n    --image_list helper/images_basespec.txt \\\n    --outdir output_basepsec \\\n    --config config/config_basespec.yml \\\n    --ground ground_truth/ground_truth_basespec.csv\n```\n\n## Results\nThe results are stored in a comma-separated (CSV) file in the output directory.\nFor an example result file, see [example_104.csv](/helper/example_104.csv).\n\nFor the full results, please check [Similarity Matching\nResults.xlsx](/helper/Similarity_Matching_Results.xlsx).\n\n\n# TODO\nCurrently, the experimental environment must be set up manually before FirmKit\ncan be fully utilized. In the next release, we need to automate this procedure\nand provide more detailed instructions.\n\n\n# Issues\n\n### Tested environment\nWe ran all our experiments on a server equipped with four Intel Xeon E7-8867v4\n2.40 GHz CPUs (total 144 cores), 896 GB DDR4 RAM, and 4 TB SSD. 
We set up Ubuntu\n18.04.5 LTS with IDA Pro v7.6 and Python 3.8.9 on the server.\n\n# Authors\nThis project was conducted by the following authors at KAIST.\n* [Dongkwan Kim](https://0xdkay.me/)\n\n# Citation\nWe would appreciate it if you would consider citing the previous papers,\n[FirmAE](https://github.com/pr0v3rbs/FirmAE),\n[BaseSpec](https://github.com/SysSec-KAIST/BaseSpec),\n[BinKit](https://github.com/SoftSec-KAIST/BinKit) \u0026\n[TikNib](https://github.com/SoftSec-KAIST/TikNib).\n\n```bibtex\n@inproceedings{kim:2020:firmae,\n  author = {Mingeun Kim and Dongkwan Kim and Eunsoo Kim and Suryeon Kim and Yeongjin Jang and Yongdae Kim},\n  title = {{FirmAE}: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis},\n  booktitle = {Annual Computer Security Applications Conference (ACSAC)},\n  year = 2020,\n  month = dec,\n  address = {Online}\n}\n\n@inproceedings{kim:2021:basespec,\n  author = {Eunsoo Kim and Dongkwan Kim and CheolJun Park and Insu Yun and Yongdae Kim},\n  title = {{BaseSpec}: Comparative Analysis of Baseband Software and Cellular Specifications for L3 Protocols},\n  booktitle = {Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS)},\n  year = 2021,\n  month = feb,\n  address = {Online}\n}\n\n@article{kim:2022:binkit,\n  author={Kim, Dongkwan and Kim, Eunsoo and Cha, Sang Kil and Son, Sooel and Kim, Yongdae},\n  journal={IEEE Transactions on Software Engineering (TSE)},\n  title={Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned},\n  year={2022},\n  volume={},\n  number={},\n  pages={1-23},\n  doi={10.1109/TSE.2022.3187689}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyssec-kaist%2Ffirmkit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyssec-kaist%2Ffirmkit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyssec-kaist%2Ffirmkit/lists"}