https://github.com/0xh3xa/awesome-malware-benign-datasets
🪲 A list of malware and benign datasets for malware research
- Host: GitHub
- URL: https://github.com/0xh3xa/awesome-malware-benign-datasets
- Owner: 0xh3xa
- License: cc0-1.0
- Created: 2023-12-10T22:47:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-23T07:27:33.000Z (10 months ago)
- Last Synced: 2024-08-23T19:59:47.604Z (10 months ago)
- Topics: awesome-list, datasets, malware-analysis, malware-researchers, security
- Homepage: https://0xh3xa.github.io/awesome-malware-benign-datasets
- Size: 34.2 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: Contributing.md
- License: LICENSE
README
# Awesome-Malware-Benign-Datasets
[Awesome](https://github.com/sindresorhus/awesome)

A curated list of malware and benign datasets for **security researchers**.
## Table of Contents
- [Datasets](#datasets)
- [Contribute](#contribute)
- [License](#license)

## Datasets
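
Several datasets in this section (for example Malimg, Virus-MNIST, and Stamina) represent binaries as images. As a rough illustration of that idea, the sketch below maps each byte of a file to one grayscale pixel and wraps the pixels into fixed-width rows. It is a minimal sketch, not the pipeline used by any listed dataset: the function names `bytes_to_grayscale_rows` and `write_pgm`, the default width of 256, and the zero padding are all assumptions chosen for illustration.

```python
from __future__ import annotations

import math


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte to one pixel (0-255) and wrap into rows of `width` pixels.

    The last row is zero-padded so the image is rectangular (an assumption;
    real datasets may truncate or resize instead).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def write_pgm(rows: list[list[int]], path: str) -> None:
    """Save the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    with open(path, "wb") as fh:
        fh.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        for row in rows:
            fh.write(bytes(row))


if __name__ == "__main__":
    # Harmless placeholder bytes; handle real malware only in an isolated lab.
    sample = bytes(range(10))
    print(bytes_to_grayscale_rows(sample, width=4))
    # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

The PGM writer uses only the standard library, so the result can be inspected without extra dependencies. Check each dataset's own documentation for the width scheme and color mode it actually uses.
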
| **Dataset** | **Description** | **Link** | **Public/Private** |
| --------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ------------------ |
| [**MALNET-IMAGE**](https://malnet.cc.gatech.edu/) | A large-scale dataset of 1,262,024 malware images across 696 families for research in malware classification. | [Link](https://malnet.cc.gatech.edu/) | Public |
| [**Virus-MNIST**](https://github.com/virusmnist/virusmnist) | A dataset of 51,880 grayscale images of malware, designed for malware classification tasks, with 10 classes. | [Link](https://github.com/virusmnist/virusmnist) | Public |
| [**Malimg**](https://www.kaggle.com/datasets/manmandes/malimg) | A dataset of 9,458 images of PE malware, categorized into 25 different families. | [Link](https://www.kaggle.com/datasets/manmandes/malimg) | Public |
| [**Stamina**](https://stamina.cs.cornell.edu) | A dataset containing 782,224 binary sequences converted to images, designed for malware classification. | [Link](https://stamina.cs.cornell.edu) | Public |
| [**McAfee**](https://www.mcafee.com/enterprise/en-us/threat-intelligence.html) | A dataset of 367,183 malware samples analyzed by McAfee, categorized into two main types. | [Link](https://www.mcafee.com/enterprise/en-us/threat-intelligence.html) | Private |
| [**Kancherla**](https://www.kancherla.com) | A smaller dataset with 27,000 samples focused on binary classification of malware and benign files. | [Link](https://www.kancherla.com) | Private |
| [**Choi**](https://choi-lab.github.io/dataset) | A dataset of 12,000 samples, split evenly between malware and benign, for binary classification tasks. | [Link](https://choi-lab.github.io/dataset) | Private |
| [**Fu**](https://www.fu-dataset.com) | A dataset of 7,087 samples from 15 different malware families, designed for multi-class classification. | [Link](https://www.fu-dataset.com) | Private |
| [**Han**](https://han-lab.com/dataset) | A dataset of 1,000 samples across 50 malware families, intended for fine-grained malware classification. | [Link](https://han-lab.com/dataset) | Private |
| [**IoT DDoS**](https://www.iotddosdataset.com) | A small dataset containing 365 samples for IoT Distributed Denial of Service (DDoS) attack detection, with 3 distinct attack types. | [Link](https://www.iotddosdataset.com) | Public |
| [**DikeDataset**](https://github.com/iosifache/DikeDataset) | Binaries of PE malware and benign samples. | [Link](https://github.com/iosifache/DikeDataset) | Public |
| [**Benign-NET**](https://github.com/bormaa/Benign-NET) | Binaries of PE benign samples. | [Link](https://github.com/bormaa/Benign-NET) | Public |
| [**Ember**](https://github.com/elastic/ember) | Features of PE malware. | [Link](https://github.com/elastic/ember) | Public |
| [**VirusShare**](https://virusshare.com) | Binaries of PE malware samples (requires permission for access). | [Link](https://virusshare.com) | Private |
| [**Microsoft Malware Prediction**](https://www.kaggle.com/competitions/microsoft-malware-prediction/data) | PE malware features in CSV format. | [Link](https://www.kaggle.com/competitions/microsoft-malware-prediction/data) | Public |
| [**Microsoft Malware Classification Challenge (BIG 2015)**](https://www.kaggle.com/c/malware-classification/data) | Binaries of PE malware. | [Link](https://www.kaggle.com/c/malware-classification/data) | Public |
| [**malware_benign_file**](https://github.com/sourabhmandal/malware_benign_file_dataset) | Binaries of PE malware and benign samples. | [Link](https://github.com/sourabhmandal/malware_benign_file_dataset) | Public |
| [**Dumpware10**](https://web.cs.hacettepe.edu.tr/~selman/dumpware10/) | 4,294 RGB images from 3,686 malware samples and 608 benign samples, with images rendered in various width schemes. | [Link](https://web.cs.hacettepe.edu.tr/~selman/dumpware10/) | Public |
| [**CICIDS 2017 Dataset**](https://www.unb.ca/cic/datasets/ids-2017.html) | Contains network traffic data including benign and malicious samples, with detailed labels for various types of attacks. | [Link](https://www.unb.ca/cic/datasets/ids-2017.html) | Public |
| [**Kaspersky Malware Dataset**](https://www.kaggle.com/datasets/huzaifa6/kaspersky-dataset) | A collection of malware samples collected and analyzed by Kaspersky, useful for classification and behavioral analysis. | [Link](https://www.kaggle.com/datasets/huzaifa6/kaspersky-dataset) | Private |
| [**CICIDS 2018 Dataset**](https://www.unb.ca/cic/datasets/ids-2018.html) | Network traffic data including benign and malicious samples with detailed attack labels and features. | [Link](https://www.unb.ca/cic/datasets/ids-2018.html) | Public |
| [**AILab Malware Dataset**](https://www.ailab.com/malware-dataset) | Provides malware samples for various research purposes, including behavioral analysis and classification. | [Link](https://www.ailab.com/malware-dataset) | Private |
| [**MalNet Dataset**](https://www.mal-net.org) | A dataset of malware samples collected from various sources, useful for malware detection and analysis. | [Link](https://www.mal-net.org) | Public |
| [**Contagio Malware Dump**](http://contagiodump.blogspot.com/) | Contains a variety of malware samples used for malware research and analysis. | [Link](http://contagiodump.blogspot.com/) | Public |
| [**The Microsoft Malware Classification Challenge (BIG 2018)**](https://www.kaggle.com/c/malware-classification/data) | Contains malware samples and features with labels for various malware types. | [Link](https://www.kaggle.com/c/malware-classification/data) | Public |
| [**MalMem2021 Dataset**](https://www.unb.ca/cic/datasets/malmem-2021.html) | A dataset of memory dumps containing both benign and malicious processes, useful for memory forensics. | [Link](https://www.unb.ca/cic/datasets/malmem-2021.html) | Public |
| [**CICIDS 2019 Dataset**](https://www.unb.ca/cic/datasets/malmem-2021.html) | Network traffic data including benign and malicious samples with comprehensive attack labels. | [Link](https://www.unb.ca/cic/datasets/malmem-2021.html) | Public |
| [**Malware Bazaar**](https://bazaar.abuse.ch/) | A collection of malware samples shared by the community for research purposes. | [Link](https://bazaar.abuse.ch/) | Public |
| [**BODMAS**](https://github.com/whyisyoung/BODMAS) | Contains 57,293 malware and 77,142 benign Windows PE files, including binaries (disarmed malware only), feature vectors, and metadata. | [Link](https://github.com/whyisyoung/BODMAS) | Public |

## Contribute
Contributions are welcome! Please follow the [contribution guidelines](Contributing.md) for submitting new datasets or updates.
**[⬆ back to top](#awesome-malware-benign-datasets)**
## License
This repository is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
[Topic: Malware Dataset](https://github.com/topics/malware-dataset)