{"id":22066398,"url":"https://github.com/jaketae/deep-malware-detection","last_synced_at":"2025-06-12T17:04:08.718Z","repository":{"id":37782736,"uuid":"308901326","full_name":"jaketae/deep-malware-detection","owner":"jaketae","description":"A neural approach to malware detection in portable executables","archived":false,"fork":false,"pushed_at":"2023-03-20T18:04:06.000Z","size":48892,"stargazers_count":79,"open_issues_count":3,"forks_count":17,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-13T01:54:46.823Z","etag":null,"topics":["deep-learning","malware-detection","malware-research","pe-file","pe-format","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaketae.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-31T14:36:26.000Z","updated_at":"2025-04-06T07:09:53.000Z","dependencies_parsed_at":"2024-11-30T19:38:10.549Z","dependency_job_id":null,"html_url":"https://github.com/jaketae/deep-malware-detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fdeep-malware-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fdeep-malware-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fdeep-malware-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaketae%2Fdeep-malware-d
etection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaketae","download_url":"https://codeload.github.com/jaketae/deep-malware-detection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253856640,"owners_count":21974577,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","malware-detection","malware-research","pe-file","pe-format","pytorch"],"created_at":"2024-11-30T19:27:57.538Z","updated_at":"2025-05-13T01:54:55.441Z","avatar_url":"https://github.com/jaketae.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neural Network Malware Binary Classification\n\nPyTorch implementation of [Malware Detection by Eating a Whole EXE](https://arxiv.org/abs/1710.09435), [Learning the PE Header, Malware Detection with Minimal Domain Knowledge](https://arxiv.org/abs/1709.01471), and other derived models for malware detection.\n\nAll model checkpoints are available at [`assets/checkpoints`](assets/checkpoints).\n\n## Quickstart\n\n1. Clone this repository via\n\n```\n$ git clone https://github.com/jaketae/deep-malware-detection.git\n$ cd deep-malware-detection\n```\n\n2. Create a Python virtual environment and install dependencies.\n\n```\n$ python -m venv venv\n$ source venv/bin/activate\n$ pip install -U pip wheel # update pip\n$ pip install -r requirements.txt\n```\n\n3. Prepare PE files. `src/bin` provides scrapers to download malware. 
For instance, to download files from [Dasmalwerk](https://das-malwerk.herokuapp.com), run\n\n```\n$ python -m src.bin.dasmalwerk\n```\n\nBy default, this will download the files under the `raw` folder of the root directory.\n\n4. Train the model.\n\n```\n$ cd src/deep_malware_detection\n$ python train.py --benign_dir=YOUR_PATH_TO_BENIGN --malware_dir=YOUR_PATH_TO_MALWARE\n```\n\n## Data\n\nThis project was developed in late 2020, and unfortunately I lost access to the server where I collected data and ran experiments. While replicating all training data exactly may be infeasible, here are some resources for data collection.\n\n1. [Wikidll.com](https://wikidll.com): Website with downloadable benign `.dll` files. [Scraper](src/bin/dll.py).\n2. [Dasmalwerk](https://das-malwerk.herokuapp.com): Website with downloadable malware for research. [Scraper](src/bin/dasmalwerk.py).\n3. [Malshare.com](https://malshare.com): Website with downloadable malware for research. [Scraper](src/bin/malshare.py).\n4. [EMBER](https://github.com/elastic/ember): Open dataset for malware detection research.\n5. [Kaggle dataset](https://www.kaggle.com/datasets/amauricio/pe-files-malwares): PE file dataset available on Kaggle, including both benign and malicious files.\n\n\n## Implementation Notes\n\n1. While Raff et al. used LSTMs for the sequential model, we tested both GRUs and LSTMs and found that the former were easier to train.\n2. We combined the models presented in the two papers to derive a custom model that concatenates the feature vector produced by the entry-point 1D-CNN layer with the output of the RNN units that follow. We denote these custom models with a \"Res\" prefix in the table below.\n3. We also extended the attention-based model in Raff et al. with this residual approach.\n4. 
Due to computational constraints, we decided to use PE file headers only up to their 4096th byte, creating a 4096-dimensional sequential feature vector for every file.\n\n## Results\n\nThe table below details the performance of each model.\n\n| Architecture   | Acc | F1   |\n| -------------- | --- | ---- |\n| MalConvBase    | 91  | .931 |\n| MalConv+       | 94  | .951 |\n| MalConv+ (E16) | 93  | .944 |\n| MalConv+ (W64) | 94  | .949 |\n| MC+ (E16,W64)  | 94  | .950 |\n| MC+ (C256)     | 91  | .930 |\n| GRU-CNN        | 93  | .946 |\n| BiGRU-CNN      | 91  | .931 |\n| GRU-CNN (H128) | 93  | .946 |\n| ResGRU-CNN     | 94  | .948 |\n| AttnGRU-CNN    | 94  | .952 |\n| AttnResGRU-CNN | 94  | .952 |\n\nFor visualizations of training and model evaluation, refer to the images in the `figures` directory.\n\n## Contributing\n\nThe coding style is dictated by [black](https://black.readthedocs.io/en/stable/) and [isort](https://pycqa.github.io/isort/). You can apply them via\n\n```\n# pip install black isort\nmake style\n```\n\nPlease feel free to submit issues or pull requests.\n\n## Citation\n\nIf you find this repository helpful for your research, please cite as follows.\n\n```\n@misc{dmd,\n\ttitle        = {Deep Malware Detection: A neural approach to malware detection in portable executables},\n\tauthor       = {Tae, Jaesung},\n\tyear         = 2020,\n\thowpublished = {\\url{https://github.com/jaketae/deep-malware-detection}}\n}\n```\n\n## References\n\n```\n@misc{raff2017malware,\n\ttitle        = {Malware Detection by Eating a Whole EXE},\n\tauthor       = {Edward Raff and Jon Barker and Jared Sylvester and Robert Brandon and Bryan Catanzaro and Charles Nicholas},\n\tyear         = 2017,\n\teprint       = {1710.09435},\n\tarchiveprefix = {arXiv},\n\tprimaryclass = {stat.ML}\n}\n@article{Raff_2017,\n\ttitle        = {Learning the PE Header, Malware Detection with Minimal Domain Knowledge},\n\tauthor       = {Raff, Edward and Sylvester, Jared and 
Nicholas, Charles},\n\tyear         = 2017,\n\tjournal      = {Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security - AISec ’17},\n\tpublisher    = {ACM Press},\n\tdoi          = {10.1145/3128572.3140442},\n\tisbn         = 9781450352024,\n\turl          = {http://dx.doi.org/10.1145/3128572.3140442}\n}\n```\n\n## License\n\nReleased under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaketae%2Fdeep-malware-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaketae%2Fdeep-malware-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaketae%2Fdeep-malware-detection/lists"}