Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/marcoramilli/MalwareTrainingSets
Free Malware Training Datasets for Machine Learning
machine-learning malware malware-analysis training-set
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/marcoramilli/MalwareTrainingSets
- Owner: marcoramilli
- Created: 2016-12-11T18:11:55.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2021-01-03T20:58:37.000Z (over 3 years ago)
- Last Synced: 2024-01-26T00:39:39.750Z (5 months ago)
- Topics: machine-learning, malware, malware-analysis, training-set
- Language: Python
- Size: 49.2 MB
- Stars: 208
- Watchers: 13
- Forks: 101
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
Lists
- awesome-ml-for-cybersecurity - Malware Training Data Sets
- Awesome-ML-Cybersecurity - Malware training data set
README
# MalwareTrainingSets
Please check it out: https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/

For an updated follow-up, see: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/
**Cite The DataSet**
If you find these results useful, please cite them:
@misc{ MR,
author = "Marco Ramilli",
title = "Malware Training Sets: a machine learning dataset for everyone",
year = "2016",
url = "https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/",
note = "[Online; December 2016]"
}

*UPDATE*
Many people asked me about the scripts I used to generate the MIST-Modified JSON, so here they are! (Take a look at the scripts section.)

You can use `mist_json.py` as a reporting module for CuckooSandbox, and the script `fromMongoToARFF.py` to generate ARFF files suitable for WEKA.

If you create new datasets by running your local CuckooSandbox with the `mist_json.py` module and you want to share them, please feel free to open pull requests!

If you want to know more about the workflow, please check this update: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/
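
For readers unfamiliar with ARFF, here is a minimal sketch of what an ARFF file for WEKA might look like when built from per-sample feature counts. It is not the repository's `fromMongoToARFF.py` and does not read from MongoDB. The function `to_arff`, the `label` key, and the feature names are hypothetical, chosen only for illustration.

```python
import io


def to_arff(relation, samples, label_key="label"):
    """Serialize a list of feature dicts into ARFF text (hypothetical helper).

    Each sample maps feature names to numeric values and carries a class
    label under `label_key`. Feature names are assumed to contain no spaces
    or commas, so they are written unquoted.
    """
    # Collect every feature name seen in any sample, in a stable order.
    feature_names = sorted({k for s in samples for k in s if k != label_key})
    # The class attribute is nominal: list every distinct label.
    labels = sorted({s[label_key] for s in samples})

    out = io.StringIO()
    out.write(f"@RELATION {relation}\n\n")
    for name in feature_names:
        out.write(f"@ATTRIBUTE {name} NUMERIC\n")
    out.write("@ATTRIBUTE class {" + ",".join(labels) + "}\n\n")
    out.write("@DATA\n")
    for s in samples:
        # Features missing from a sample are written as 0.
        row = [str(s.get(name, 0)) for name in feature_names]
        row.append(s[label_key])
        out.write(",".join(row) + "\n")
    return out.getvalue()


# Illustrative input only: made-up feature counts and labels.
samples = [
    {"CreateFileW": 3, "RegOpenKeyExW": 1, "label": "trojan"},
    {"CreateFileW": 1, "label": "ransomware"},
]
arff = to_arff("malware", samples)
print(arff)
```

The resulting text can be saved with an `.arff` extension and opened in WEKA. A real conversion would also need to quote attribute names that contain special characters.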