https://github.com/petridhsg/firewall-data-classification

A single implementation of a machine learning algorithm for a firewall data classification task

machine-learning matplotlib numpy python seaborn


Host: GitHub
URL: https://github.com/petridhsg/firewall-data-classification
Owner: PetridhsG
Created: 2025-02-14T20:16:56.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-02-14T20:38:56.000Z (12 months ago)
Last Synced: 2025-02-14T21:31:12.367Z (12 months ago)
Topics: machine-learning, matplotlib, numpy, python, seaborn
Language: Jupyter Notebook
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Internet Firewall Data Classification

## Overview
This project applies several machine learning algorithms to classify internet firewall records by the action the firewall took. The data comes from the [Internet Firewall Data](https://archive.ics.uci.edu/dataset/542/internet+firewall+data) dataset in the UCI Machine Learning Repository.

### Objective
The goal of this project is to implement and evaluate commonly used machine learning algorithms on a multi-class classification problem. Using the network traffic attributes of each record, the models predict which firewall action was applied, which can support network security decision-making.

## Machine Learning Algorithms Implemented
This project explores and implements the following machine learning techniques:

1. **Principal Component Analysis (PCA)** - Used for dimensionality reduction.
2. **Least Squares Classification** - A simple linear classification approach.
3. **Logistic Regression** - A probabilistic model for binary and multi-class classification.
4. **K-Nearest Neighbors (KNN)** - A distance-based classification method.
5. **Naïve Bayes** - A probabilistic classifier based on Bayes' theorem.
6. **Multilayer Perceptron (MLP)** - A feedforward neural network model.
7. **Support Vector Machines (SVM)** - A maximum-margin classifier that separates classes with hyperplanes.
8. **K-Means** - A clustering algorithm to identify patterns in the data.

Each algorithm is tested on the firewall dataset to evaluate its performance in classifying network traffic behavior.
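
As an illustration of one entry in the list above, here is a minimal sketch of K-Nearest Neighbors using only the Python standard library. It is not the notebook's implementation: the function name `knn_predict`, the choice of Euclidean distance, `k=3`, and the toy points are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy points (two features each), not taken from the dataset
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
train_y = ["allow", "allow", "allow", "deny", "deny", "deny"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (8.1, 8.0)))
```

On real firewall records the features have very different scales (port numbers versus byte counts), so a distance-based method like this would likely need the features standardized first.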

## Dataset Description
The dataset consists of 12 attributes: 11 input features plus the **'Action'** attribute, which is the target variable. Each attribute is described below:

| Variable Name | Description |
|------------------------|----------------------------------------|
| Source Port | Sender's initiating port. |
| Destination Port | Receiver's target port. |
| NAT Source Port | Sender's port after NAT. |
| NAT Destination Port | Receiver's port after NAT. |
| Bytes | Total bytes transferred. |
| Bytes Sent | Bytes sent by the sender. |
| Bytes Received | Bytes received by the receiver. |
| Packets | Total packets transmitted. |
| Elapsed Time (sec) | Duration of communication. |
| pkts_sent | Packets sent by the sender. |
| pkts_received | Packets received by the receiver. |
| Action | Class label: allow, deny, drop, or reset-both. |
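
The table above can be turned into a feature matrix and a label vector with a few lines of Python. The sketch below uses the column names from the table; the helper `load_rows`, the header order, and the three sample rows are assumptions made up for illustration, and a real run would open the downloaded CSV file instead of the inline string.

```python
import csv
import io

# The 11 input features, named as in the table above
FEATURES = [
    "Source Port", "Destination Port", "NAT Source Port", "NAT Destination Port",
    "Bytes", "Bytes Sent", "Bytes Received", "Packets",
    "Elapsed Time (sec)", "pkts_sent", "pkts_received",
]
TARGET = "Action"

# Made-up sample rows for illustration only (not real dataset records)
SAMPLE_CSV = """Source Port,Destination Port,NAT Source Port,NAT Destination Port,Action,Bytes,Bytes Sent,Bytes Received,Packets,Elapsed Time (sec),pkts_sent,pkts_received
50000,53,40000,53,allow,200,100,100,2,30,1,1
50001,445,0,0,drop,70,70,0,1,0,1,0
50002,25174,0,0,deny,60,60,0,1,0,1,0
"""


def load_rows(handle):
    """Split each CSV record into a numeric feature vector and its Action label."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[TARGET])
    return X, y


X, y = load_rows(io.StringIO(SAMPLE_CSV))
print(len(X), len(X[0]), y)
```

Reading columns by name rather than by position keeps the split correct even if the file lists `Action` somewhere other than the last column.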

### Classification Task
The goal is to classify each network traffic observation into one of the following four classes:

- **allow**
- **deny**
- **drop**
- **reset-both**

Each record belongs to exactly one of these classes. The classification models are evaluated on their accuracy and on how well they generalize to unseen data.
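
Accuracy on held-out records can be computed as sketched below. The helper names `accuracy` and `confusion_counts` and the label lists are hypothetical, made up for this example; they are not results from the project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs to show which classes get confused."""
    return Counter(zip(y_true, y_pred))


# Made-up labels for a held-out set, for illustration only
y_true = ["allow", "allow", "deny", "drop", "reset-both", "deny"]
y_pred = ["allow", "allow", "deny", "drop", "deny", "drop"]

print(accuracy(y_true, y_pred))
print(confusion_counts(y_true, y_pred)[("reset-both", "deny")])
```

Looking at the per-pair counts alongside overall accuracy helps when the classes are unevenly represented, since a high accuracy can hide a class that is rarely predicted correctly.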
