https://github.com/tanaybhadula/malware-image-detection

A deep learning project which uses a method that converts malware .bytes files into gray-scale images and uses a CNN deep learning model to classify the converted malware image and identify the malware family it belongs to.

classification cnn cybersecurity deep-learning keras machine-learning malware python scipy tensorflow


# :desktop_computer: Image-based Malware Classification using CNN

## Introduction
Analyzing large volumes of malware is a major burden for security analysts. Malware
developers have been highly successful at evading signature-based detection techniques.
Most prevailing static analysis techniques rely on a tool that parses the executable and
extracts features or signatures. Most dynamic analysis techniques require the binary file
to be run in a sandboxed environment so that its behaviour can be examined. This can be easily thwarted
by malware that hides its malicious activity when it is run inside a virtual environment.
Hence, there is a need to explore new approaches that overcome the limitations of static
and dynamic analysis, such as time intensity, resource consumption, and limited scalability.

We propose a method for visualizing and classifying malware using image processing
techniques. Malware binaries are visualized as gray-scale images, based on the observation that,
for many malware families, images belonging to the same family appear very similar in
layout and texture. By converting the executable into an image representation, we free
our analysis process from the problems faced by standard static and dynamic analyses.

## Dataset Used
To train and evaluate our proposed model, we used the Malimg dataset. The Malimg dataset contains 9349 malware images belonging to 25 families (classes). Thus,
our goal is to perform multi-class classification of malware.

Link - https://drive.google.com/drive/folders/1CnFx26NfWfQchIU85wRNfHjqfk7Up6hl?usp=sharing

A malware sample can belong to one of the following classes:
* Adialer.C
* Agent.FYI
* Allaple.A
* Allaple.L
* Alueron.gen!J
* Autorun.K
* C2LOP.P
* C2LOP.gen!g
* Dialplatform.B
* Dontovo.A
* Fakerean
* Instantaccess
* Lolyda.AA1
* Lolyda.AA2
* Lolyda.AA3
* Lolyda.AT
* Malex.gen!J
* Obfuscator.AD
* Rbot!gen
* Skintrim.N
* Swizzor.gen!E
* Swizzor.gen!I
* VB.AT
* Wintrim.BX
* Yuner.A
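
For multi-class classification, each family name has to be mapped to a numeric target before training. The following is a minimal sketch of one way to do that, not the project's actual code: the names `FAMILIES`, `CLASS_INDEX`, and `one_hot` are hypothetical, and assigning indices in the order of the list above is an assumption.

```python
# Family names copied from the list above; index order is an assumption.
FAMILIES = [
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
]

# Map each family name to its integer class index.
CLASS_INDEX = {name: i for i, name in enumerate(FAMILIES)}


def one_hot(family: str) -> list:
    """Return a one-hot vector with a 1 at the family's class index."""
    vector = [0] * len(FAMILIES)
    vector[CLASS_INDEX[family]] = 1
    return vector
```

A one-hot target of this length matches a final softmax layer with one output per family.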

## Converting malware binaries to gray-scale images

To convert the binary files into gray-scale images, we take the hexadecimal representation of each file's binary content and convert it
into a PNG image. The image below shows the result of converting the **0ACDbR5M3ZhBJajygTuf.bytes** binary file into a **PNG**.

[Image: binary to gray scale]
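
The conversion step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's own script: it assumes each line of a `.bytes` file is an address token followed by two-character hex values (with `??` for unknown bytes), it maps `??` to 0, and it uses a fixed image width of 256 pixels. The function names are hypothetical, and Pillow is assumed for writing the PNG.

```python
from pathlib import Path


def bytes_file_to_pixels(text: str, width: int = 256):
    """Parse hex-dump text into raw 8-bit pixel data and the image height.

    Assumes lines such as '00401000 56 8D ?? 24': an address token followed
    by two-character hex byte values, with '??' marking unknown bytes.
    """
    pixels = bytearray()
    for line in text.splitlines():
        for token in line.split()[1:]:  # skip the leading address token
            # Mapping '??' to 0 is an assumption made for this sketch.
            pixels.append(0 if token == "??" else int(token, 16))
    height = len(pixels) // width  # an incomplete final row is dropped
    if height == 0:
        raise ValueError("not enough bytes to fill one image row")
    return bytes(pixels[: height * width]), height


def save_grayscale_png(bytes_path: str, png_path: str, width: int = 256) -> None:
    """Write the parsed bytes as an 8-bit gray-scale ('L' mode) PNG."""
    from PIL import Image  # Pillow; imported here so parsing works without it

    data, height = bytes_file_to_pixels(Path(bytes_path).read_text(), width)
    Image.frombytes("L", (width, height), data).save(png_path)
```

Each byte becomes one pixel with an intensity between 0 and 255, so a call such as `save_grayscale_png("0ACDbR5M3ZhBJajygTuf.bytes", "0ACDbR5M3ZhBJajygTuf.png")` would produce one image per sample.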

## CNN Model Architecture
The CNN model consists of the following layers, which extract features and patterns from the images and classify each one into a malware family.
* Convolutional Layer : 30 filters, (3 * 3) kernel size
* Max Pooling Layer : (2 * 2) pool size
* Convolutional Layer : 15 filters, (3 * 3) kernel size
* Max Pooling Layer : (2 * 2) pool size
* Dropout Layer : drops 25% of neurons
* Flatten Layer
* Dense/Fully Connected Layer : 128 neurons, ReLU activation function
* Dropout Layer : drops 50% of neurons
* Dense/Fully Connected Layer : 50 neurons, Softmax activation function
* Dense/Fully Connected Layer : num_class neurons, Softmax activation function

The input has a shape of **[64 * 64 * 3] : [width * height * depth]**. In our case, each malware sample is
an RGB image.
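
The layer list above could be expressed in Keras roughly as shown below. This is a sketch, not the repository's code: the function name `build_malware_cnn`, the ReLU activation on the convolutional layers, and the loss and optimizer are assumptions, since the README does not state them. The 50-neuron layer uses Softmax because that is what the list specifies, although ReLU is the more usual choice for a hidden layer.

```python
from tensorflow.keras import layers, models


def build_malware_cnn(num_classes: int = 25, input_shape=(64, 64, 3)):
    """Build a CNN that follows the layer list in this README."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Conv activations are not given in the README; ReLU is assumed.
        layers.Conv2D(30, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(15, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # Softmax as listed in the README.
        layers.Dense(50, activation="softmax"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Loss and optimizer are assumptions suited to one-hot multi-class targets.
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_malware_cnn().summary()` prints the output shape of each layer, which is a quick way to check the model against the list.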

## Block Diagram


[Image: Block Diagram]

## Future Work
* Future work will focus on evaluating more advanced models such as Inception V3, VGG16-Net, ResNet50, CNN-SVM, MLP-SVM, and GRU-SVM.
* We also want to convert malware images into color RGB images before classification to improve accuracy and precision.
* We also want to implement a web-based or GUI-based tool that converts malware binary files into images and then classifies them.