https://github.com/sam120204/ddos-attacks-classification

Host: GitHub
URL: https://github.com/sam120204/ddos-attacks-classification
Owner: Sam120204
Created: 2024-07-13T18:17:22.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-07-13T20:00:56.000Z (over 1 year ago)
Last Synced: 2025-06-13T20:07:14.586Z (7 months ago)
Language: Jupyter Notebook
Homepage:
Size: 6.48 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# DDoS Attack Detection Using Machine Learning

## Introduction

Distributed Denial of Service (DDoS) attacks are significant threats to the stability and reliability of online services. Detecting and mitigating these attacks is crucial to maintaining the integrity of networks and services. This project focuses on classifying DDoS attacks using various machine learning models. The dataset used for this project is the IDS 2017 dataset, which is publicly available and provides a comprehensive set of features for detecting DDoS attacks.

The project proceeds through six steps: data preprocessing, exploration, splitting, model training, evaluation, and comparison. Three classifiers are trained (Random Forest, Logistic Regression, and a Neural Network) and compared on accuracy, F1 score, recall, precision, confusion matrices, and ROC curves with AUC scores.

## Table of Contents

1. [Importing Libraries](#1-importing-libraries)
2. [Data Pre-processing](#2-data-pre-processing)
3. [Data Exploring](#3-data-exploring)
4. [Data Splitting](#4-data-splitting)
5. [Model Training](#5-model-training)
    - [Random Forest](#random-forest)
    - [Logistic Regression](#logistic-regression)
    - [Neural Network](#neural-network)
6. [Model Evaluation](#6-model-evaluation)
    - [Accuracy](#accuracy)
    - [F1 Score](#f1-score)
    - [Recall](#recall)
    - [Precision](#precision)
    - [Confusion Matrix](#confusion-matrix)
7. [Model Comparison](#7-model-comparison)

## 1. Importing Libraries

This section imports the libraries used for data manipulation (Pandas, NumPy), visualization (Matplotlib, Seaborn), and model training and evaluation (Scikit-learn).

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, recall_score, precision_score,
    confusion_matrix, roc_curve, auc,
)
```

## 2. Data Pre-processing

This section prepares the data for analysis by cleaning and transforming it. Steps include replacing infinite values, dropping rows with missing values, and converting the categorical labels to numerical values.

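```python
# Example code for data pre-processing
df = pd.read_csv("DDoS.csv")

# Strip stray whitespace from column names, in case the CSV header has any,
# so that 'Label' can be referenced reliably
df.columns = df.columns.str.strip()

# Treat infinite values as missing, then drop incomplete rows
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.dropna(inplace=True)

# Convert categorical labels to numerical values (0 = benign, 1 = DDoS)
df['Label'] = df['Label'].map({'BENIGN': 0, 'DDoS': 1})
```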
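The cleaning steps above can be tried on a small in-memory frame before running them on the full CSV. The sketch below is illustrative only: the column names and values are made up, and the `clean` helper is a hypothetical wrapper, not part of the project. It also checks for labels outside the `BENIGN`/`DDoS` mapping, which `map` would otherwise silently turn into `NaN`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV; column names and values are made up.
raw = pd.DataFrame({
    "Flow Duration": [10.0, 20.0, 30.0, 40.0],
    "Flow Bytes/s": [1.5, np.inf, 3.0, np.nan],
    "Label": ["BENIGN", "DDoS", "DDoS", "BENIGN"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with infinite or missing values and encode the label as 0/1."""
    out = df.replace([np.inf, -np.inf], np.nan).dropna().copy()
    out["Label"] = out["Label"].map({"BENIGN": 0, "DDoS": 1})
    # Labels outside the mapping become NaN; fail loudly instead of training on them.
    if out["Label"].isna().any():
        raise ValueError("unexpected label value")
    return out.astype({"Label": int})


cleaned = clean(raw)
print(cleaned)
```

Only the rows without an infinite or missing value survive, and the label column ends up as integers.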

## 3. Data Exploring

Data exploration involves generating descriptive statistics and visualizations to understand the distribution and relationships within the dataset. This step helps in identifying important features and potential issues with the data.

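```python
# Example code for data exploration
# Summary statistics for every numeric column
print(df.describe())

# A pair plot over every column and row is impractically slow on a wide dataset,
# so plot a random sample of rows and only the first few feature columns
cols = list(df.columns.drop('Label')[:4]) + ['Label']
sample = df[cols].sample(n=min(1000, len(df)), random_state=42)
sns.pairplot(sample, hue='Label')
plt.show()
```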
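Two quick checks that often complement the summary statistics are the class balance and each feature's correlation with the label. The following is a minimal sketch on synthetic data; the feature names `feat_signal` and `feat_noise` are invented for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned dataset; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 200
label = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    "feat_signal": label * 5.0 + rng.normal(size=n),  # strongly tied to the label
    "feat_noise": rng.normal(size=n),                 # unrelated to the label
    "Label": label,
})

# Share of each class; a heavily skewed split would argue for stratified sampling.
class_share = toy["Label"].value_counts(normalize=True)

# Absolute Pearson correlation of each feature with the label, strongest first.
corr_with_label = (
    toy.corr()["Label"].drop("Label").abs().sort_values(ascending=False)
)

print(class_share)
print(corr_with_label)
```

Correlation only captures linear relationships, so a low value does not by itself mean a feature is useless to a tree-based model.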

## 4. Data Splitting

In this section, the data is split into a training set (70%) and a testing set (30%). Holding out a test set makes it possible to measure how well each model generalizes to unseen data.

```python
X = df.drop('Label', axis=1)
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
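`StandardScaler` is imported earlier but not applied in the example above. As a hedged sketch of how it could be combined with the split, the snippet below uses synthetic data, passes `stratify=y` so both sets keep the same class ratio, and fits the scaler on the training set only so that no test-set statistics leak into training. Whether the original notebook does this is not stated in the README.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data: 10% positives, features far from zero mean.
rng = np.random.default_rng(42)
X = rng.normal(loc=100.0, scale=25.0, size=(1000, 3))
y = np.array([1] * 100 + [0] * 900)

# stratify=y keeps the class ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print("positive share (train, test):", y_train.mean(), y_test.mean())
```

Scaling matters most for Logistic Regression and the Neural Network; Random Forest is largely insensitive to feature scale.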

## 5. Model Training

This section covers the training of different machine learning models:

### Random Forest

An ensemble method that uses multiple decision trees to improve predictive accuracy.

```python
# Random Forest
rf_model = RandomForestClassifier(n_estimators=50, random_state=42)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)
```
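A fitted Random Forest also exposes impurity-based feature importances, which can help with the feature-identification goal mentioned in the exploration step. This is a small sketch on synthetic data; the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: one feature carries the signal, two are pure noise.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
X = pd.DataFrame({
    "informative": y + rng.normal(scale=0.3, size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# feature_importances_ sums to 1; larger values mean the feature was more
# useful for splitting across the trees.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can be biased toward high-cardinality features, so they are best read as a rough ranking.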

### Logistic Regression

A linear model that estimates the probability of the positive class, used here for binary classification (benign vs. DDoS).

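```python
# Logistic Regression
# max_iter is raised from the default of 100 to reduce the chance of a
# convergence warning on unscaled features
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)
```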
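Logistic Regression tends to converge more reliably when features share a similar scale. One possible way to arrange that, shown here as a sketch on synthetic data and not as the project's own method, is to chain `StandardScaler` and `LogisticRegression` with `make_pipeline` so that scaling is learned from the training data each time the pipeline is fitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with two features on very different scales.
rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    y * 2.0 + rng.normal(size=n),    # informative, unit scale
    rng.normal(scale=1e6, size=n),   # uninformative, huge scale
])

# The pipeline standardizes the features, then fits the classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

train_acc = pipe.score(X, y)
print("training accuracy:", train_acc)
```

The same pattern works with `MLPClassifier` in place of `LogisticRegression`.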

### Neural Network

A multi-layer perceptron: a feed-forward network, here with a single hidden layer of 50 units, capable of capturing non-linear patterns in the data.

```python
# Neural Network
nn_model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=42)
nn_model.fit(X_train, y_train)
nn_pred = nn_model.predict(X_test)
```

## 6. Model Evaluation

The trained models are evaluated using various metrics:

### Accuracy

The proportion of correctly predicted instances.

```python
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))
print("Logistic Regression Accuracy:", accuracy_score(y_test, lr_pred))
print("Neural Network Accuracy:", accuracy_score(y_test, nn_pred))
```

### F1 Score

The harmonic mean of precision and recall, useful for imbalanced datasets.

```python
print("Random Forest F1 Score:", f1_score(y_test, rf_pred))
print("Logistic Regression F1 Score:", f1_score(y_test, lr_pred))
print("Neural Network F1 Score:", f1_score(y_test, nn_pred))
```

### Recall

The fraction of actual DDoS instances that the model correctly identifies.

```python
print("Random Forest Recall:", recall_score(y_test, rf_pred))
print("Logistic Regression Recall:", recall_score(y_test, lr_pred))
print("Neural Network Recall:", recall_score(y_test, nn_pred))
```

### Precision

The fraction of instances predicted as DDoS that are actually DDoS.

```python
print("Random Forest Precision:", precision_score(y_test, rf_pred))
print("Logistic Regression Precision:", precision_score(y_test, lr_pred))
print("Neural Network Precision:", precision_score(y_test, nn_pred))
```

### Confusion Matrix

A table of counts of true negatives, false positives, false negatives, and true positives, showing which kinds of errors each model makes.

```python
print("Random Forest Confusion Matrix:\n", confusion_matrix(y_test, rf_pred))
print("Logistic Regression Confusion Matrix:\n", confusion_matrix(y_test, lr_pred))
print("Neural Network Confusion Matrix:\n", confusion_matrix(y_test, nn_pred))
```
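To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1 by hand from the four cells of a confusion matrix and compares them with Scikit-learn's functions. The labels and predictions are a made-up ten-sample example, not results from the project.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score,
)

# Made-up ground truth and predictions (0 = benign, 1 = DDoS).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("hand-computed:", accuracy, precision, recall, f1)
print("scikit-learn: ",
      accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```

Here the model has higher recall than precision: it catches three of the four attacks but raises two false alarms along the way.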

## 7. Model Comparison

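In this section, the performance of the three models is compared using ROC curves and AUC scores. The ROC curve plots the true positive rate against the false positive rate across all decision thresholds, and the AUC summarizes it as a single number, which helps in identifying the best-performing model for DDoS attack classification.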

```python
# Example code for ROC curve and AUC
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_model.predict_proba(X_test)[:, 1])
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_model.predict_proba(X_test)[:, 1])
nn_fpr, nn_tpr, _ = roc_curve(y_test, nn_model.predict_proba(X_test)[:, 1])

plt.figure()
plt.plot(rf_fpr, rf_tpr, label='Random Forest (AUC = %0.2f)' % auc(rf_fpr, rf_tpr))
plt.plot(lr_fpr, lr_tpr, label='Logistic Regression (AUC = %0.2f)' % auc(lr_fpr, lr_tpr))
plt.plot(nn_fpr, nn_tpr, label='Neural Network (AUC = %0.2f)' % auc(nn_fpr, nn_tpr))
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc='lower right')
plt.show()
```
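When only the AUC value is needed, `roc_auc_score` computes it directly from the labels and predicted scores without building the curve first. The toy example below, with made-up labels and scores, shows that it agrees with `auc(fpr, tpr)`.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Route 1: build the ROC curve, then integrate it.
fpr, tpr, _ = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)

# Route 2: compute the area directly.
area_direct = roc_auc_score(y_true, scores)

print(area_from_curve, area_direct)
```

Both routes give 0.75 here: of the four (negative, positive) pairs, three are ranked correctly by the scores.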

## Conclusion

The project addresses the detection and classification of DDoS attacks using three machine learning models. By following the steps outlined in the sections above, we aim to build a robust model that can effectively distinguish between benign and malicious network traffic.
