{"id":15398647,"url":"https://github.com/lancern/asm2vec","last_synced_at":"2025-09-14T22:26:53.108Z","repository":{"id":37648614,"uuid":"205378674","full_name":"Lancern/asm2vec","owner":"Lancern","description":"An unofficial implementation of asm2vec as a standalone python package","archived":false,"fork":false,"pushed_at":"2021-01-29T21:55:59.000Z","size":65,"stargazers_count":160,"open_issues_count":8,"forks_count":38,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-12-20T02:12:21.981Z","etag":null,"topics":["asm2vec","binary-analysis","machine-learning","nlp","numpy","python","python3","unofficial","word2vec"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Lancern.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-30T12:33:10.000Z","updated_at":"2024-11-12T13:28:31.000Z","dependencies_parsed_at":"2022-09-09T04:01:32.290Z","dependency_job_id":null,"html_url":"https://github.com/Lancern/asm2vec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lancern%2Fasm2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lancern%2Fasm2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lancern%2Fasm2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Lancern%2Fasm2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Lancern","download_url":"https://codeload.github.com/Lancern/asm2vec/tar.gz/refs/heads/master
","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230542288,"owners_count":18242332,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asm2vec","binary-analysis","machine-learning","nlp","numpy","python","python3","unofficial","word2vec"],"created_at":"2024-10-01T15:44:59.959Z","updated_at":"2024-12-20T06:06:43.125Z","avatar_url":"https://github.com/Lancern.png","language":"Python","readme":"# asm2vec\n\nThis is an unofficial implementation of the `asm2vec` model as a standalone python package. The details of the model can be found in the original paper: [(sp'19) Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo)\n\n## Requirements\n\nThis implementation is written in python 3.7 and it's recommended to use python 3.7+ as well. The only dependency of this package is `numpy` which can be installed as follows:\n\n```shell\npython3 -m pip install numpy\n```\n\n## How to use\n\n### Import\n\nTo install the package, execute the following commands:\n\n```shell\ngit clone https://github.com/lancern/asm2vec.git\n```\n\nAdd the following line to the `.bashrc` file to add `asm2vec` to your python interpreter's search path for external packages:\n\n```shell\nexport PYTHONPATH=\"path/to/asm2vec:$PYTHONPATH\"\n```\n\nReplace `path/to/asm2vec` with the directory you clone `asm2vec` into. 
Then execute the following command to update `PYTHONPATH`:\n\n```shell\nsource ~/.bashrc\n```\n\nAlternatively, add the following snippet to any python source file that uses `asm2vec` so that the python interpreter can find the package:\n\n```python\nimport sys\nsys.path.append('path/to/asm2vec')\n```\n\nIn your python code, use the following `import` statement to import this package:\n\n```python\nimport asm2vec.\u003cmodule-name\u003e\n```\n\n### Define CFGs And Training\n\nThere are two ways to define the binary program that is fed to the `asm2vec` model. The first approach is to build the CFG manually, as shown below:\n\n```python\nfrom asm2vec.asm import BasicBlock\nfrom asm2vec.asm import Function\nfrom asm2vec.asm import parse_instruction\n\nblock1 = BasicBlock()\nblock1.add_instruction(parse_instruction('mov eax, ebx'))\nblock1.add_instruction(parse_instruction('jmp _loc'))\n\nblock2 = BasicBlock()\nblock2.add_instruction(parse_instruction('xor eax, eax'))\nblock2.add_instruction(parse_instruction('ret'))\n\nblock1.add_successor(block2)\n\nblock3 = BasicBlock()\nblock3.add_instruction(parse_instruction('sub eax, [ebp]'))\n\nf1 = Function(block1, 'some_func')\nf2 = Function(block3, 'another_func')\n\n# block4 is omitted here for clarity\nf3 = Function(block4, 'estimate_func')\n```\n\nYou can then train a model with the following code:\n\n```python\nfrom asm2vec.model import Asm2Vec\n\nmodel = Asm2Vec(d=200)\ntrain_repo = model.make_function_repo([f1, f2, f3])\nmodel.train(train_repo)\n```\n\nThe second approach is to use the `parse` module provided by `asm2vec` to build CFGs automatically from an assembly source file:\n\n```python\nfrom asm2vec.parse import parse_fp\n\nwith open('source.asm', 'r') as fp:\n    funcs = parse_fp(fp)\n```\n\nYou can then train a model with the following code:\n\n```python\nfrom asm2vec.model import Asm2Vec\n\nmodel = Asm2Vec(d=200)\ntrain_repo = 
model.make_function_repo(funcs)\nmodel.train(train_repo)\n```\n\n### Estimation\n\nYou can use the `asm2vec.model.Asm2Vec.to_vec` method to convert a function into its vector representation.\n\n### Serialization\n\nThe implementation supports serialization of many of its internal data structures, so you can save the internal state of a trained model to disk for future use.\n\nYou can serialize two data structures to primitive data: the function repository and the model memento.\n\n\u003e To be finished.\n\n## Hyper Parameters\n\nThe constructor of the `asm2vec.model.Asm2Vec` class accepts keyword arguments that set the hyper parameters of the model. The following table lists all the hyper parameters available:\n\n| Parameter Name          | Type    | Meaning                                                                                                | Default Value |\n| ----------------------- | ------- | ------------------------------------------------------------------------------------------------------ | ------------- |\n| `d`                     | `int`   | The dimension of the vectors for tokens.                                                               | `200`         |\n| `initial_alpha`         | `float` | The initial learning rate.                                                                             | `0.05`        |\n| `alpha_update_interval` | `int`   | How many tokens can be processed before changing the learning rate?                                    | `10000`       |\n| `rnd_walks`             | `int`   | How many random walks to perform to sequentialize a function?                                          | `3`           |\n| `neg_samples`           | `int`   | How many samples to take during negative sampling?                                                     | `25`          |\n| `iteration`             | `int`   | How many iterations to perform? 
(This parameter is reserved for future use and is not implemented now) | `1`           |\n| `jobs`                  | `int`   | How many tasks to execute concurrently during training?                                                | `4`           |\n\n## Notes\n\nFor simplicity, Selective Callee Expansion is not implemented in this early implementation. You have to perform it manually before passing CFGs to `asm2vec`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancern%2Fasm2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancern%2Fasm2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancern%2Fasm2vec/lists"}