{"id":13767335,"url":"https://github.com/oalieno/asm2vec-pytorch","last_synced_at":"2025-05-10T22:31:43.794Z","repository":{"id":40452828,"uuid":"334627060","full_name":"oalieno/asm2vec-pytorch","owner":"oalieno","description":"Unofficial implementation of asm2vec using pytorch ( with GPU acceleration )","archived":false,"fork":false,"pushed_at":"2023-10-25T16:06:26.000Z","size":62,"stargazers_count":75,"open_issues_count":8,"forks_count":21,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-17T02:34:34.099Z","etag":null,"topics":["asm2vec","gpu-acceleration","machine-learning","neural-language-processing","python","pytorch","unofficial"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oalieno.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-31T10:21:51.000Z","updated_at":"2024-11-08T16:56:16.000Z","dependencies_parsed_at":"2024-01-07T03:40:16.535Z","dependency_job_id":"77bb17a3-ed15-4744-ace6-2cb80aec9eb0","html_url":"https://github.com/oalieno/asm2vec-pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oalieno%2Fasm2vec-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oalieno%2Fasm2vec-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oalieno%2Fasm2vec-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/oalieno%2Fasm2vec-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oalieno","download_url":"https://codeload.github.com/oalieno/asm2vec-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253492529,"owners_count":21916959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asm2vec","gpu-acceleration","machine-learning","neural-language-processing","python","pytorch","unofficial"],"created_at":"2024-08-03T16:01:07.598Z","updated_at":"2025-05-10T22:31:41.717Z","avatar_url":"https://github.com/oalieno.png","language":"Python","funding_links":[],"categories":["Research"],"sub_categories":["Asm2Vec"],"readme":"# asm2vec-pytorch\n\n\u003ca\u003e\u003cimg alt=\"release 1.0.0\" src=\"https://img.shields.io/badge/release-v1.0.0-yellow?style=for-the-badge\"\u003e\u003c/a\u003e\n\u003ca\u003e\u003cimg alt=\"mit\" src=\"https://img.shields.io/badge/license-MIT-brightgreen?style=for-the-badge\"\u003e\u003c/a\u003e\n\u003ca\u003e\u003cimg alt=\"python\" src=\"https://img.shields.io/badge/-python-9cf?style=for-the-badge\u0026logo=python\"\u003e\u003c/a\u003e\n\nUnofficial implementation of `asm2vec` using pytorch ( with GPU acceleration )  \nThe details of the model can be found in the original paper: [(sp'19) Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo)  \n\n## Requirements\n\npython \u003e= 3.6\n\n| packages | for |\n| --- 
| --- |\n| r2pipe | `scripts/bin2asm.py` |\n| click | `scripts/*` |\n| torch | almost all code needs it |\n\nYou also need to install `radare2` to run `scripts/bin2asm.py`; `r2pipe` is just the Python interface to `radare2`.\n\nIf you only want to use the library code, you just need to install `torch`.\n\n## Install\n\n```\npython setup.py install\n```\n\nor\n\n```\npip install git+https://github.com/oalieno/asm2vec-pytorch.git\n```\n\n## Benchmark\n\nAn implementation already exists here: [Lancern/asm2vec](https://github.com/Lancern/asm2vec)  \nThe following benchmark measures training on 1000 functions for 1 epoch.\n\n| Implementation | Time (s) |\n| :-: | :-: |\n| [Lancern/asm2vec](https://github.com/Lancern/asm2vec) | 202.23 |\n| [oalieno/asm2vec-pytorch](https://github.com/oalieno/asm2vec-pytorch) (with CPU) | 9.11 |\n| [oalieno/asm2vec-pytorch](https://github.com/oalieno/asm2vec-pytorch) (with GPU) | 0.97 |\n\n## Get Started\n\n```bash\npython scripts/bin2asm.py -i /bin/ -o asm/\n```\n\nFirst, generate asm files from the binaries under `/bin/`.  \nYou can hit `Ctrl+C` at any time once you have enough data.\n\n```bash\npython scripts/train.py -i asm/ -l 100 -o model.pt --epochs 100\n```\n\nFor a quick taste, train the model on only 100 functions for 100 epochs.  \nThen you can use more data if you want.\n\n```bash\npython scripts/test.py -i asm/123456 -m model.pt\n```\n\nAfter you train your model, try to grab an assembly function and see the result.  \nThis script will show you how the model performs.  
\nOnce you are satisfied, you can extract the embedding vector of the function and do whatever you want with it.\n\n## Usage\n\n### bin2asm.py\n\n```\nUsage: bin2asm.py [OPTIONS]\n\n  Extract assembly functions from binary executable\n\nOptions:\n  -i, --input TEXT   input directory / file  [required]\n  -o, --output TEXT  output directory\n  -l, --len INTEGER  ignore assembly code with instructions amount smaller\n                     than minlen\n\n  --help             Show this message and exit.\n```\n\n```bash\n# Example\npython bin2asm.py -i /bin/ -o asm/\n```\n\n### train.py\n\n```\nUsage: train.py [OPTIONS]\n\nOptions:\n  -i, --input TEXT                training data folder  [required]\n  -o, --output TEXT               output model path  [default: model.pt]\n  -m, --model TEXT                load previous trained model path\n  -l, --limit INTEGER             limit the number of functions to be loaded\n  -d, --ebedding-dimension INTEGER\n                                  embedding dimension  [default: 100]\n  -b, --batch-size INTEGER        batch size  [default: 1024]\n  -e, --epochs INTEGER            training epochs  [default: 10]\n  -n, --neg-sample-num INTEGER    negative sampling amount  [default: 25]\n  -a, --calculate-accuracy        whether calculate accuracy ( will be\n                                  significantly slower )\n\n  -c, --device TEXT               hardware device to be used: cpu / cuda /\n                                  auto  [default: auto]\n\n  -lr, --learning-rate FLOAT      learning rate  [default: 0.02]\n  --help                          Show this message and exit.\n```\n\n```bash\n# Example\npython train.py -i asm/ -o model.pt --epochs 100\n```\n\n### test.py\n\n```\nUsage: test.py [OPTIONS]\n\nOptions:\n  -i, --input TEXT              target function  [required]\n  -m, --model TEXT              model path  [required]\n  -e, --epochs INTEGER          training epochs  [default: 10]\n  -n, --neg-sample-num INTEGER  negative sampling 
amount  [default: 25]\n  -l, --limit INTEGER           limit the amount of output probability result\n  -c, --device TEXT             hardware device to be used: cpu / cuda / auto\n                                [default: auto]\n\n  -lr, --learning-rate FLOAT    learning rate  [default: 0.02]\n  -p, --pretty                  pretty print table  [default: False]\n  --help                        Show this message and exit.\n```\n\n```bash\n# Example\npython test.py -i asm/123456 -m model.pt\n```\n\n```\n┌──────────────────────────────────────────┐\n│    endbr64                               │\n│  ➔ push r15                              │\n│    push r14                              │\n├────────┬─────────────────────────────────┤\n│ 34.68% │ [rdx + rsi*CONST + CONST]       │\n│ 20.29% │ push                            │\n│ 16.22% │ r15                             │\n│ 04.36% │ r14                             │\n│ 03.55% │ r11d                            │\n└────────┴─────────────────────────────────┘\n```\n\n### compare.py\n\n```\nUsage: compare.py [OPTIONS]\n\nOptions:\n  -i1, --input1 TEXT          target function 1  [required]\n  -i2, --input2 TEXT          target function 2  [required]\n  -m, --model TEXT            model path  [required]\n  -e, --epochs INTEGER        training epochs  [default: 10]\n  -c, --device TEXT           hardware device to be used: cpu / cuda / auto\n                              [default: auto]\n\n  -lr, --learning-rate FLOAT  learning rate  [default: 0.02]\n  --help                      Show this message and exit.\n```\n\n```bash\n# Example\npython compare.py -i1 asm/123456 -i2 asm/654321 -m model.pt -e 30\n```\n\n```\ncosine similarity : 
0.873684\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foalieno%2Fasm2vec-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foalieno%2Fasm2vec-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foalieno%2Fasm2vec-pytorch/lists"}