https://github.com/marcoramilli/MalwareTrainingSets

Free Malware Training Datasets for Machine Learning

machine-learning malware malware-analysis training-set

# MalwareTrainingSets
For an introduction to the dataset, see: https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/

For an updated follow-up, see: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/

**Cite the Dataset**
If you find these results useful, please cite them:


@misc{ MR,
author = "Marco Ramilli",
title = "Malware Training Sets: a machine learning dataset for everyone",
year = "2016",
url = "https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/",
note = "[Online; December 2016]"
}

*UPDATE*
Many people asked me about the scripts I used to generate the MIST-modified JSON, so here they are (take a look at the scripts section).
You can use `mist_json.py` as a reporting module for Cuckoo Sandbox, and the script `fromMongoToARFF.py` to generate ARFF files suitable for WEKA.
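
For readers unfamiliar with the ARFF format that WEKA consumes, here is a minimal sketch of what such a file looks like and how one could be produced. This is not the logic of `fromMongoToARFF.py`: the function `to_arff`, the feature names, and the class labels are all hypothetical and chosen only for illustration, and the sketch assumes feature names contain no spaces or special characters (which ARFF would require quoting).

```python
from typing import Dict, List


def to_arff(relation: str, samples: List[Dict[str, int]], labels: List[str]) -> str:
    """Render per-sample feature dicts and class labels as ARFF text.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Union of all feature names, sorted so the column order is stable.
    features = sorted({name for sample in samples for name in sample})
    classes = sorted(set(labels))

    lines = [f"@RELATION {relation}", ""]
    for name in features:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # The class is a nominal attribute listing every possible label.
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]

    for sample, label in zip(samples, labels):
        # Features missing from a sample default to 0.
        row = [str(sample.get(name, 0)) for name in features]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


# Made-up samples: each dict marks which (hypothetical) behaviors were observed.
samples = [{"file_write": 1, "reg_set": 1}, {"net_connect": 1}]
labels = ["family_a", "family_b"]
arff_text = to_arff("malware", samples, labels)
```

Writing `arff_text` to a file with an `.arff` extension should give something WEKA can open, provided the attribute names and labels respect ARFF quoting rules.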

If you create new datasets by running a local Cuckoo Sandbox with the `mist_json.py` module and want to share them, please feel free to open pull requests!

If you want to know more about the workflow, please check this update: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/