https://github.com/tanaybhadula/malware-image-detection

A deep learning project that converts malware .bytes files into gray-scale images and uses a CNN to classify each image and identify the malware family it belongs to.


Host: GitHub
URL: https://github.com/tanaybhadula/malware-image-detection
Owner: TanayBhadula
License: mit
Created: 2022-10-06T14:01:10.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-10-06T20:46:16.000Z (over 2 years ago)
Last Synced: 2025-03-18T20:06:11.016Z (3 months ago)
Topics: classification, cnn, cybersecurity, deep-learning, keras, machine-learning, malware, python, scipy, tensorflow
Language: Jupyter Notebook
Homepage:
Size: 5.16 MB
Stars: 24
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# :desktop_computer: Image-based Malware Classification using CNN

## Introduction

Analyzing a huge amount of malware is a major burden for security analysts. Malware developers have been highly successful in evading signature-based detection techniques. Most of the prevailing static analysis techniques use a tool to parse the executable and extract features or signatures. Most dynamic analysis techniques require the binary file to be run in a sandboxed environment to examine its behaviour. This can easily be thwarted by malware that hides its malicious activities when it is run inside a virtual environment. Hence, there is a need to explore new approaches that overcome the limitations of static and dynamic analysis, such as time intensity, resource consumption, and limited scalability.

We propose a method for visualizing and classifying malware using image processing techniques. Malware binaries are visualized as gray-scale images, based on the observation that for many malware families, the images belonging to the same family appear very similar in layout and texture. By converting the executable into an image representation, our analysis process sidesteps the problems faced by standard static and dynamic analyses.

## Dataset Used

For the training and evaluation of our proposed model, we used the Malimg Dataset. It contains 9349 malware images belonging to 25 families/classes, so our goal is to perform multi-class classification of malware.

Link - https://drive.google.com/drive/folders/1CnFx26NfWfQchIU85wRNfHjqfk7Up6hl?usp=sharing

Each malware sample belongs to one of the following classes:

 * Adialer.C

 * Agent.FYI

 * Allaple.A

 * Allaple.L

 * Alueron.gen!J

 * Autorun.K

 * C2LOP.P

 * C2LOP.gen!g

 * Dialplatform.B

 * Dontovo.A

 * Fakerean

 * Instantaccess

 * Lolyda.AA1

 * Lolyda.AA2

 * Lolyda.AA3

 * Lolyda.AT

 * Malex.gen!J

 * Obfuscator.AD

 * Rbot!gen

 * Skintrim.N

 * Swizzor.gen!E

 * Swizzor.gen!I

 * VB.AT

 * Wintrim.BX

 * Yuner.A
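
For multi-class training, each family name has to be mapped to an integer index or a one-hot vector. The sketch below shows one way to do that for the 25 classes listed above. The names `MALIMG_CLASSES`, `CLASS_TO_INDEX`, and `one_hot` are hypothetical, and the index order simply follows the list order here; the encoding actually used in the notebook may differ.

```python
# The 25 Malimg family names, in the order listed in this README.
MALIMG_CLASSES = [
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
]

# Assumption: class indices follow the list order above.
CLASS_TO_INDEX = {name: index for index, name in enumerate(MALIMG_CLASSES)}


def one_hot(family_name):
    """Return a one-hot label vector for a family name."""
    vector = [0] * len(MALIMG_CLASSES)
    vector[CLASS_TO_INDEX[family_name]] = 1
    return vector
```

The length of the one-hot vector matches the number of neurons in the final layer of the classifier.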

## Converting malware binaries to gray-scale images

To convert the binary files into gray-scale images, we use the hexadecimal representation of each file's binary content and convert the files into PNG images. For example, the **0ACDbR5M3ZhBJajygTuf.bytes** binary file is converted into a **PNG** image in this way.
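
The conversion can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code, and it rests on several assumptions: each line of a `.bytes` file is taken to start with an address followed by two-digit hex values, `??` is taken to mark an unknown byte (mapped to 0 here), each byte becomes one pixel intensity, and the image is made roughly square. The function names are hypothetical, and saving relies on Pillow.

```python
import math


def parse_bytes_dump(text):
    """Parse a hex dump into a flat list of byte values (0-255)."""
    values = []
    for line in text.splitlines():
        tokens = line.split()
        # Assumed layout: first token is an address, the rest are hex pairs.
        for token in tokens[1:]:
            if token == "??":
                values.append(0)  # assumption: unknown bytes become black pixels
            else:
                values.append(int(token, 16))
    return values


def to_grayscale_rows(values, width=None):
    """Arrange byte values into rows of `width` pixels, zero-padding the last row."""
    if not values:
        raise ValueError("no byte values to convert")
    if width is None:
        width = math.isqrt(len(values))  # assumption: roughly square image
    height = math.ceil(len(values) / width)
    padded = values + [0] * (width * height - len(values))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def save_png(rows, path):
    """Write the pixel rows as an 8-bit gray-scale PNG (requires Pillow)."""
    from PIL import Image  # imported lazily so parsing has no dependencies

    height, width = len(rows), len(rows[0])
    flat = bytes(value for row in rows for value in row)
    Image.frombytes("L", (width, height), flat).save(path)
```

Under these assumptions, a file could be converted with `save_png(to_grayscale_rows(parse_bytes_dump(open(path).read())), "out.png")`. The image is later resized to the network's input size, so the exact width chosen here mainly affects how the byte layout appears visually.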



    



## CNN Model Architecture

The CNN model consists of the following layers, which extract features and patterns from the images and classify the malware family.

 * Convolutional Layer : 30 filters, (3 * 3) kernel size

 * Max Pooling Layer : (2 * 2) pool size

 * Convolutional Layer : 15 filters, (3 * 3) kernel size

 * Max Pooling Layer : (2 * 2) pool size

 * Dropout Layer : Dropping 25% of neurons.

 * Flatten Layer

 * Dense/Fully Connected Layer : 128 neurons, ReLU activation function

 * Dropout Layer : Dropping 50% of neurons.

 * Dense/Fully Connected Layer : 50 neurons, Softmax activation function

 * Dense/Fully Connected Layer : num_class neurons, Softmax activation function
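
The layer list above maps fairly directly onto a Keras `Sequential` model. The sketch below is one possible reading, not the notebook's actual code. The layers, sizes, and dense-layer activations come from the list, and the 64 x 64 x 3 input and 25 classes come from elsewhere in this README. The ReLU activation on the convolutional layers, the Adam optimizer, the categorical cross-entropy loss, and the name `build_malware_cnn` are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_malware_cnn(num_classes=25, input_shape=(64, 64, 3)):
    """Build a small CNN following the layer list in this README."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Assumption: ReLU on the convolutional layers (not stated in the README).
        layers.Conv2D(30, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(15, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # Softmax on this hidden layer follows the README as written.
        layers.Dense(50, activation="softmax"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Assumption: optimizer and loss are illustrative choices for
    # one-hot encoded multi-class labels.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_malware_cnn().summary()` prints the per-layer output shapes, which is a quick way to check the model against the list.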

The input has a shape of **[64 * 64 * 3] : [width * height * depth]**. In our case, each malware image is fed to the network as a three-channel (RGB) image.
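
To see how this input size propagates through the layers, the shapes and parameter counts can be worked out by hand. The sketch below does that arithmetic under two assumptions that the README does not state: convolutions use no padding with stride 1, and pooling uses stride 2. The function name `trace_cnn` is hypothetical. Dropout layers are omitted because they change neither shapes nor parameter counts.

```python
def trace_cnn(side=64, channels=3, num_classes=25):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    size, depth = side, channels
    for filters in (30, 15):
        size = size - 3 + 1  # 3x3 convolution, no padding, stride 1
        params = (3 * 3 * depth + 1) * filters  # weights plus one bias per filter
        rows.append((f"conv{filters}", (size, size, filters), params))
        depth = filters
        size = size // 2  # 2x2 max pooling
        rows.append(("maxpool", (size, size, depth), 0))
    units = size * size * depth  # flatten
    rows.append(("flatten", (units,), 0))
    for width in (128, 50, num_classes):
        rows.append((f"dense{width}", (width,), (units + 1) * width))
        units = width
    return rows


if __name__ == "__main__":
    for name, shape, params in trace_cnn():
        print(f"{name:10s} {str(shape):16s} {params}")
```

Under these assumptions the spatial size goes 64, 62, 31, 29, 14, and the flattened vector has 14 * 14 * 15 = 2940 values, so most of the parameters sit in the first dense layer.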

 

## Block Diagram

 


    



## Future Work

* Future work will focus on evaluating more advanced models such as Inception V3, VGG16-Net, ResNet50, CNN-SVM, MLP-SVM, and GRU-SVM.

* We also want to convert malware images into color RGB images before classification to improve accuracy and precision.

* We also want to implement a web-based or GUI-based tool that converts malware binary files into images and then classifies them.
