{"id":13658792,"url":"https://github.com/gadiluna/SAFE","last_synced_at":"2025-04-24T11:32:50.111Z","repository":{"id":47649436,"uuid":"157537382","full_name":"gadiluna/SAFE","owner":"gadiluna","description":"SAFE: Self-Attentive Function Embeddings for binary similarity","archived":false,"fork":false,"pushed_at":"2023-07-17T08:23:08.000Z","size":178,"stargazers_count":172,"open_issues_count":9,"forks_count":40,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-12-06T22:47:38.831Z","etag":null,"topics":["binary","machine-learning","neural-networks","scientific-research"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gadiluna.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-11-14T11:16:25.000Z","updated_at":"2024-12-01T16:38:37.000Z","dependencies_parsed_at":"2023-01-19T08:31:03.676Z","dependency_job_id":"64b11c73-5f5b-4969-b131-6b3cdb39bcfe","html_url":"https://github.com/gadiluna/SAFE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gadiluna%2FSAFE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gadiluna%2FSAFE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gadiluna%2FSAFE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gadiluna%2FSAFE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gadiluna","download_url":"https://codeload.github.com/gadiluna/SAFE/tar.gz/refs/heads/master
","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250618652,"owners_count":21460131,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary","machine-learning","neural-networks","scientific-research"],"created_at":"2024-08-02T05:01:02.677Z","updated_at":"2025-04-24T11:32:49.823Z","avatar_url":"https://github.com/gadiluna.png","language":"Python","funding_links":[],"categories":["Python (144)"],"sub_categories":[],"readme":"# SAFE : Self Attentive Function Embedding\n\nPaper\n---\nThis software is the outcome of our accademic research. See our arXiv paper: [arxiv](https://arxiv.org/abs/1811.05296)\n\nIf you use this code, please cite our accademic paper as:\n\n```bibtex\n@inproceedings{massarelli2018safe,\n  title={SAFE: Self-Attentive Function Embeddings for Binary Similarity},\n  author={Massarelli, Luca and Di Luna, Giuseppe Antonio and Petroni, Fabio and Querzoni, Leonardo and Baldoni, Roberto},\n  booktitle={Proceedings of 16th Conference on Detection of Intrusions and Malware \u0026 Vulnerability Assessment (DIMVA)},\n  year={2019}\n}\n```\n\nWhat you need  \n-----\nYou need [radare2](https://github.com/radare/radare2) installed in your system. 
\n  \nQuickstart\n-----\nTo create the embedding of a function:\n```\ngit clone https://github.com/gadiluna/SAFE.git\ncd SAFE\npip install -r requirements\nchmod +x download_model.sh\n./download_model.sh\npython safe.py -m data/safe.pb -i helloworld.o -a 100000F30\n```\n#### What to do with an embedding?\nOnce you have two embeddings ```embedding_x``` and ```embedding_y```, you can compute the similarity of the corresponding functions as: \n```\nfrom sklearn.metrics.pairwise import cosine_similarity\n\nsim = cosine_similarity(embedding_x, embedding_y)\n```\n\n\nData Needed\n-----\nSAFE needs only a few pieces of data to work. Two are essential: a model that tells SAFE how to \nconvert assembly instructions into vectors (the i2v model) and a model that tells SAFE how\nto convert a binary function into a vector.\nBoth models can be downloaded with the command\n```\n./download_model.sh\n```\nThe downloader fetches both models and places them in the directory data.\nThe directory tree after the download should be:\n```\nsafe/-- githubcode\n     \\\n      \\--data/-----safe.pb\n               \\\n                \\---i2v/\n            \n```\nThe safe.pb file contains the SAFE model used to convert binary functions to vectors.\nThe i2v folder contains the i2v model. \n\n\nHardcore Details\n----\nThis section contains details needed to replicate our experiments; if you are only a user of SAFE you can skip\nit. \n\n### Safe.pb\nThis is the frozen, trained TensorFlow model for the AMD64 architecture. You can import it in your project using:\n\n```\nimport tensorflow as tf\n\n# Load the frozen graph definition from disk (TensorFlow 1.x API).\nwith tf.gfile.GFile(\"safe.pb\", \"rb\") as f:\n    graph_def = tf.GraphDef()\n    graph_def.ParseFromString(f.read())\n\n# Import the graph definition into a fresh graph.\nwith tf.Graph().as_default() as graph:\n    tf.import_graph_def(graph_def)\n\nsess = tf.Session(graph=graph)\n``` \n\nSee file: neural_network/SAFEEmbedder.py\n\n### i2v\nThe i2v folder contains two files: 
\n\n* A matrix in which each row is the embedding of an asm instruction.\n* A JSON file containing a dictionary that maps asm instructions to row numbers of the matrix above.\n\nSee file: asm_embedding/InstructionsConverter.py\n\n\n\n## Train the model\nIf you want to train the model using our datasets, you first have to run:\n```\n python3 downloader.py -td\n```\nThis will download the datasets into the data folder. Note that the datasets are compressed, so you have to decompress them yourself.\nThe data consist of SQLite databases.\nTo start training, use neural_network/train.sh.\nThe database can be selected by changing the corresponding parameter in train.sh.\nFor information on the datasets, see our paper.\n\n## Create your own dataset\nIf you want to create your own dataset, you can use the script ExperimentUtil in the folder\ndataset creation.\n\n## Create a functions knowledge base\nIf you want to use the SAFE binary code search engine, you can use the script ExperimentUtil to create\nthe knowledge base.\nThen you can search through it using the script in function_search.\n\n\nRelated Projects\n---\n\n* YARASAFE: Automatic Binary Function Similarity Checks with Yara (https://github.com/lucamassarelli/yarasafe) \n* SAFEtorch: PyTorch implementation of the SAFE neural network (https://github.com/facebookresearch/SAFEtorch)\n\nThanks\n---\nIn our code we use [godown](https://github.com/circulosmeos/gdown.pl) to download data from Google Drive. We thank \ncirculosmeos, the creator of godown.\n\nWe thank Davide Italiano for the useful discussions. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgadiluna%2FSAFE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgadiluna%2FSAFE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgadiluna%2FSAFE/lists"}