{"id":37238802,"url":"https://github.com/br0kej/bin2ml","last_synced_at":"2026-01-22T10:01:21.426Z","repository":{"id":190663814,"uuid":"642800749","full_name":"br0kej/bin2ml","owner":"br0kej","description":"A command line tool for extracting machine learning ready data from software binaries powered by Radare2","archived":false,"fork":false,"pushed_at":"2025-05-03T08:18:35.000Z","size":1693,"stargazers_count":69,"open_issues_count":2,"forks_count":5,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-03T09:28:37.575Z","etag":null,"topics":["binary-analysis","data-generation","graph-neural-networks","machine-learning","ml4sec","nlp","radare2","reverse-engineering"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/br0kej.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-05-19T11:26:53.000Z","updated_at":"2025-04-02T09:53:10.000Z","dependencies_parsed_at":"2023-11-22T21:25:31.502Z","dependency_job_id":"37b833a6-b672-4447-8159-6d4472b9da09","html_url":"https://github.com/br0kej/bin2ml","commit_stats":null,"previous_names":["br0kej/bin2ml"],"tags_count":19,"template":false,"template_full_name":null,"purl":"pkg:github/br0kej/bin2ml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/br0kej%2Fbin2ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/br0kej%2Fbin2ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/br0kej%2Fbin2ml/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/br0kej%2Fbin2ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/br0kej","download_url":"https://codeload.github.com/br0kej/bin2ml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/br0kej%2Fbin2ml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28661007,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T01:17:37.254Z","status":"online","status_checked_at":"2026-01-22T02:00:07.137Z","response_time":144,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-analysis","data-generation","graph-neural-networks","machine-learning","ml4sec","nlp","radare2","reverse-engineering"],"created_at":"2026-01-15T06:00:38.752Z","updated_at":"2026-01-22T10:01:21.419Z","avatar_url":"https://github.com/br0kej.png","language":"Rust","funding_links":[],"categories":["Rust","Binary Feature Extraction"],"sub_categories":[],"readme":"# `bin2ml`\n\n`bin2ml` is a command line tool to extract machine learning ready data from software binaries. 
It lets researchers and hackers extract data suitable for training machine learning models, such as natural language processing (NLP) models or Graph Neural Networks (GNNs), from software binaries.\n\n- Extract a range of different data from binaries, such as Attributed Control Flow Graphs, Basic Block random walks and function instruction strings, powered by [Radare2](https://github.com/radareorg/radare2).\n- Multithreaded data processing throughout, powered by [Rayon](https://github.com/rayon-rs/rayon).\n- Save processed data in ready-to-go formats, such as graphs saved as [NetworkX](https://networkx.org/) compatible JSON objects.\n- Experimental support for creating machine learning embedded basic block CFGs using `tch-rs` and TorchScript traced models.\n\n\u003e `bin2ml` is under active development and is in an alpha state. Things will change as the tool is developed and built upon further.\n\n## Pre-Requisites\n- Radare2 installed. Info on how to do this can be found [here](https://github.com/radareorg/radare2).\n\n## Quickstart\n```bash\ngit clone https://github.com/br0kej/bin2ml\ncd bin2ml\ncargo build --release\n```\nAlternatively, two Dockerfiles are provided. `Dockerfile.build` builds the `bin2ml` binary without requiring cargo on your workstation, while `Dockerfile` builds `bin2ml` and also installs radare2 so that processing can be done within the container.\n## Docs\n`bin2ml` comes with some documentation (albeit incomplete), which is written using `mdbook`. 
The documentation can be locally served by installing the platform relevant version of `mdbook` from [here](https://github.com/rust-lang/mdBook/releases)\nand then executing the commands below:\n```bash\ncd bin2ml/docs\nmdbook serve\n```\nAlternatively, they can be viewed raw by going to the docs folder [here](docs/src/README.md)\n## License\n\nThe `bin2ml` source and documentation are released under the MIT license.\n\n## Citation\n\n```bibtex\n@misc{collyer2023bin2ml,\n  author = {Josh Collyer},\n  title = {bin2ml},\n  year = {2023},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/br0kej/bin2ml/}},\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbr0kej%2Fbin2ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbr0kej%2Fbin2ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbr0kej%2Fbin2ml/lists"}