https://github.com/vidhi1290/malware-detection

Welcome to the Malicious Executable Detection project! This repository explores the world of machine learning and clustering analysis to detect malicious executable files 🔥🔐

clustering-algorithm cybersecurity hierarchical-clustering k-means-clustering machine-learning malware-detection python silhouette

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/vidhi1290/malware-detection
Owner: Vidhi1290
Created: 2023-09-16T11:53:38.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2023-09-16T12:01:34.000Z (almost 2 years ago)
Last Synced: 2025-02-02T18:33:26.205Z (5 months ago)
Topics: clustering-algorithm, cybersecurity, hierarchical-clustering, k-means-clustering, machine-learning, malware-detection, python, silhouette
Language: Jupyter Notebook
Homepage:
Size: 12.7 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Malicious Executable Detection using Cluster Analysis 📊

Welcome to the Malicious Executable Detection project! This repository explores the world of machine learning and clustering analysis to detect malicious executable files. 🔍🤖

## Problem Statement 🎯
In an era where cyber warfare is on the rise, detecting malicious code has become crucial. This project aims to develop a machine learning approach to identify malicious executable files. 💻🦠

## Understanding the Data and Attributes 📚
The dataset contains features extracted from both malicious and non-malicious Windows executable files. It includes 373 samples: 301 malicious and 72 non-malicious, so the classes are imbalanced. Each sample has 531 features, named F1, F2, and so on, plus a label column indicating whether the file is malicious or non-malicious. 📈🧐

## Data Preparation 🛠️
- **Imputation**: Rows and columns in which more than 70% of the values are missing are removed. 🧹
- **Feature Selection**: Relevant features are chosen for analysis. 🎯
- **Data Standardization**: Standardization is applied to make the data suitable for clustering. 📊
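The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `prepare_features`, the column name `label`, the median fill for remaining gaps, and the use of "drop constant columns" as a stand-in for feature selection are all assumptions, and the toy values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def prepare_features(df, label_col="label", max_missing=0.70):
    """Drop sparse rows/columns, fill remaining gaps, and standardize."""
    features = df.drop(columns=[label_col])
    # Drop columns in which more than `max_missing` of the values are missing.
    features = features.loc[:, features.isna().mean(axis=0) <= max_missing]
    # Drop rows in which more than `max_missing` of the remaining values are missing.
    features = features.loc[features.isna().mean(axis=1) <= max_missing]
    # Assumption: any remaining gaps are filled with the column median.
    features = features.fillna(features.median())
    # Assumption: feature selection is approximated by dropping constant columns.
    features = features.loc[:, features.nunique() > 1]
    # Standardize each column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(features)
    return pd.DataFrame(scaled, columns=features.columns, index=features.index)

# Toy data standing in for the real dataset (hypothetical values).
toy = pd.DataFrame({
    "F1": [1.0, 2.0, 3.0, 4.0],
    "F2": [np.nan, np.nan, np.nan, 5.0],  # 75% missing, so it is dropped
    "F3": [10.0, np.nan, 30.0, 40.0],     # 25% missing, so it is median-filled
    "F4": [7.0, 7.0, 7.0, 7.0],           # constant, so it is dropped
    "label": [1, 1, 0, 1],
})
X = prepare_features(toy)
```

Standardizing matters here because K-Means relies on Euclidean distance, so features on larger scales would otherwise dominate the clustering.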

## K-Means Clustering 📈
K-Means clustering is applied to group similar instances together. The Silhouette method is used to determine the optimal number of clusters. 🧩
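One way to pick the number of clusters with the Silhouette method is to fit K-Means for several candidate values of k and keep the one with the highest mean silhouette score. The sketch below uses scikit-learn on synthetic blobs; the helper name `best_k_by_silhouette`, the candidate range, and the synthetic data are assumptions for illustration, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values=range(2, 7), seed=0):
    """Return (best_k, {k: mean silhouette score}) for K-Means on X."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # The candidate with the highest mean silhouette score wins.
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic, well-separated blobs stand in for the standardized feature matrix.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)
best_k, scores = best_k_by_silhouette(X)
```

On real malware features the scores are usually much closer together than on these blobs, so it is worth inspecting the whole `scores` dictionary rather than only `best_k`.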

## Silhouette Analysis 📊
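Silhouette analysis helps evaluate the quality of clustering. Silhouette scores range from -1 to 1: a higher score means samples sit closer to their own cluster than to neighboring clusters, which indicates better clustering. 📈🔍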
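To make the score concrete, here is a small from-scratch sketch of the silhouette coefficient, s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name `silhouette_values` and the four toy points are illustrative assumptions; in practice `sklearn.metrics.silhouette_score` does this work.

```python
from math import dist

def silhouette_values(points, labels):
    """Silhouette coefficient for every point; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_values(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_values(points, [0, 1, 0, 1])   # nearby points split apart
```

The sensible grouping yields values close to 1, while the deliberately poor grouping yields negative values, matching the interpretation given above.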

## Cluster Stability Check 🔒
Cluster stability is assessed by comparing the clusters obtained on the full dataset with those obtained on random samples of the data. 🔄
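A stability check along these lines could look like the sketch below. It is an assumption about how such a comparison might be done, not the notebook's method: it reclusters random subsamples and measures agreement with the full-data clustering using the adjusted Rand index (ARI). The helper name `subsample_stability`, the 80% sampling fraction, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

def subsample_stability(X, k, frac=0.8, n_rounds=5, seed=0):
    """Mean ARI between the full-data clustering and subsample clusterings."""
    rng = np.random.default_rng(seed)
    full_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_rounds):
        # Draw a random subsample without replacement and recluster it.
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
        # ARI ignores label permutations, so cluster IDs need not match.
        scores.append(adjusted_rand_score(full_labels[idx], sub_labels))
    return float(np.mean(scores))

# Synthetic, well-separated blobs stand in for the standardized feature matrix.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)
stability = subsample_stability(X, k=3)
```

An ARI near 1 means the subsample clusterings agree with the full-data clustering; values near 0 suggest the clusters depend heavily on which samples happen to be included.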

## Categorizing New Samples 🆕
The fitted K-Means model is used to predict clusters for new executable files by assigning each one to its nearest cluster centroid. 📋
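As a rough sketch of this step, a new sample has to go through the same standardization as the training data before K-Means assigns it to a cluster. Wrapping both steps in a scikit-learn pipeline is one way to keep them together. The training matrix and new samples below are randomly generated placeholders, not real executable features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical training matrix: two groups of samples with 5 features each.
X_train = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(8, 1, (50, 5))])

# The pipeline stores the scaler fitted on the training data,
# so new samples are standardized with the same mean and variance.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
model.fit(X_train)

# New samples must have the same features, in the same order, as the training data.
X_new = np.array([[0.1] * 5, [7.9] * 5])
new_clusters = model.predict(X_new)
```

Cluster IDs are arbitrary integers, so mapping a cluster to "malicious" or "non-malicious" still requires comparing cluster membership against the label column.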

## Learning Outcomes 📚
- Implementing cluster analysis in Python
- Pre-processing data for analysis
- Hierarchical clustering and dendrogram visualization
- Implementing K-Means clustering
- Determining the optimal number of clusters
- Cluster stability evaluation
- Predicting clusters for new samples
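For the hierarchical clustering and dendrogram outcome listed above, a minimal SciPy sketch might look like this. Ward linkage, the two-cluster cut, and the random toy data are assumptions chosen for illustration. `no_plot=True` returns the dendrogram layout without drawing it; drop that argument (with matplotlib installed) to render the tree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Hypothetical data: two tight groups of 10 samples with 3 features each.
X = np.vstack([rng.normal(0, 0.5, (10, 3)), rng.normal(6, 0.5, (10, 3))])

# Build the merge tree bottom-up; each row of Z records one merge.
Z = linkage(X, method="ward")

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

# Compute the dendrogram layout without plotting it.
tree = dendrogram(Z, no_plot=True)
```

Large vertical gaps between merges in the dendrogram hint at a natural number of clusters, which can be cross-checked against the silhouette-based choice for K-Means.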

Feel free to explore the notebooks and the code to dive deeper into the analysis!

## Kaggle Notebook 📊
You can also view this project on [Kaggle](#Kaggle). 📑

## Open in Colab 🚀
Want to run the notebooks in Google Colab? Click [here](#Open-In-Colab) to open them directly! 💡

## Connect with Us 🌐
Join our community and stay updated on our latest projects:

- 🌐 [GitHub](https://github.com/Vidhi1290)
- 🔗 [LinkedIn](https://www.linkedin.com/in/vidhi-waghela-434663198/)
- 🐦 [Twitter](https://twitter.com/VidhiWaghela)
- 📝 [Medium](https://medium.com/@datasciencemeetscybersecurity)

Happy coding! 👩‍💻👨‍💻
