https://github.com/alex-snd/malwareclassifier

👾 Malware Classification using Deep Learning and Cuckoo Sandbox
cuckoo-sandbox cvae data-science deep-learning malware malware-classification malware-detection python pytorch vae


# Malware Classifier

This is the code repository for **Malware Classification Research**. All the deep learning models are implemented with Python 3.6+ and PyTorch 1.9.

## Data
The source data consists of JSON reports generated by [Cuckoo Sandbox](https://cuckoosandbox.org/), a dynamic malware analysis system.
The reports were analyzed to extract the most informative attributes of the malicious samples, and 3698 features were selected as the basis for classification. Each malware sample is therefore represented by a binary feature vector of dimension 3698, and its label is the class assigned by Kaspersky Anti-Virus. The dataset contains about 10,000 labeled samples covering 8 malware types and about 14,000 unlabeled samples.
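
As a rough illustration of how a sample can be turned into a binary feature vector, the sketch below marks which entries of a fixed feature vocabulary were observed in a report. The function name `to_binary_vector`, the vocabulary contents, and the idea that features are named tokens are all assumptions made for this example; the actual feature extraction used in this repository is not described here.

```python
def to_binary_vector(observed, vocabulary):
    """Return a 0/1 list with one entry per vocabulary item.

    `observed` is any iterable of feature names seen in one report;
    `vocabulary` is the ordered list of selected features (hypothetical).
    """
    # Map each feature name to its fixed position in the vector.
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in observed:
        i = index.get(name)
        if i is not None:  # names outside the vocabulary are ignored
            vector[i] = 1
    return vector


# Toy vocabulary of three hypothetical features (the real one has 3698).
vocabulary = ["CreateFileW", "RegOpenKeyExW", "connect"]
vector = to_binary_vector({"connect", "not_in_vocabulary"}, vocabulary)
```

With the real vocabulary of 3698 selected features, the same function would yield a vector of length 3698.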

## Data Visualization
The normalized 3698-dimensional vector is rendered as an RGB image of size 61 × 61 (61 = ⌈√3698⌉, since √3698 ≈ 60.8), in which the color of each pixel is determined by the value of the corresponding feature.
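
A minimal sketch of this conversion with NumPy is shown below. It assumes feature values in [0, 1], pads the 23 pixels left over in the 61 × 61 grid with zeros, and copies one gray channel into all three RGB channels. The padding value and the color mapping are assumptions for illustration; the mapping used for the repository's figures may differ.

```python
import math

import numpy as np


def vector_to_image(features):
    """Reshape a 1-D feature vector with values in [0, 1] into a square RGB image."""
    features = np.asarray(features, dtype=np.float32)
    # Smallest square that can hold every feature (61 for 3698 features).
    side = math.ceil(math.sqrt(features.size))
    # Zero-pad the tail so the vector fills the square exactly (assumption).
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: features.size] = features
    # Scale to 0..255 and replicate the gray channel into R, G and B.
    gray = (padded.reshape(side, side) * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


image = vector_to_image(np.ones(3698))
```

The resulting array has shape `(61, 61, 3)` and can be passed to an image viewer such as `matplotlib.pyplot.imshow`.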



## Autoencoder
An autoencoder with a latent space of dimension 200 was trained on the unlabeled data so that its pretrained encoder could then be used for malware classification.





AE performance: the first row shows the inputs, the second row shows the AE outputs

The autoencoder was also trained with a two-dimensional latent space so that the latent representation could be visualized on a plane.
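
The setup described in this section can be sketched in PyTorch as follows. This is an illustrative outline rather than the repository's actual model: the single hidden layer, its width (`hidden_dim=1024`), the binary cross-entropy reconstruction loss, and the names `AutoEncoder`, `Classifier`, and `train_step` are all assumptions. Only the input dimension (3698), the latent sizes (200 and 2), and the number of classes (8) come from the text above.

```python
import torch
from torch import nn

N_FEATURES = 3698  # feature vector dimension stated in this README


class AutoEncoder(nn.Module):
    def __init__(self, n_features=N_FEATURES, latent_dim=200, hidden_dim=1024):
        super().__init__()
        # hidden_dim and the two-layer layout are illustrative assumptions.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_features),  # outputs logits, not probabilities
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_step(model, batch, optimizer):
    """One reconstruction step on a batch of binary feature vectors."""
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(batch), batch)
    loss.backward()
    optimizer.step()
    return loss.item()


class Classifier(nn.Module):
    """Linear head on top of a pretrained encoder (8 malware types)."""

    def __init__(self, encoder, latent_dim=200, n_classes=8):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))


# Random binary vectors stand in for real samples in this sketch.
batch = torch.randint(0, 2, (16, N_FEATURES)).float()
autoencoder = AutoEncoder(latent_dim=200)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss = train_step(autoencoder, batch, optimizer)
classifier = Classifier(autoencoder.encoder)
```

Passing `latent_dim=2` instead gives an encoder whose outputs can be plotted directly as points on a plane.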





Evolution of the latent space during training





Labeled malware samples displayed in latent space

## Classifier
Classifier results: