https://github.com/taruchit/bits_dissertation
Binary and multi-class classifiers to classify benign and malicious events, using nature-inspired heuristic algorithms for feature selection.
- Host: GitHub
- URL: https://github.com/taruchit/bits_dissertation
- Owner: taruchit
- Created: 2024-12-15T08:24:41.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-29T10:40:11.000Z (3 months ago)
- Last Synced: 2025-03-29T11:30:47.555Z (3 months ago)
- Topics: artificial-bee-colony-algorithm, data-science, data-visualization, exploratory-data-analysis, feature-engineering, feature-selection, flower-pollination-algorithm, k-nearest-neighbor-classifier, machine-learning, python, recall
- Language: Jupyter Notebook
- Homepage:
- Size: 82.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
README
# To Establish a Baseline for Threat Detection
**1. Objective: -** To build binary and multi-class classifiers that distinguish benign from malicious events and identify the type of malicious event, over a large and imbalanced cybersecurity dataset.
**2. Feature selection: -** Nature-inspired heuristic algorithms were used:
2.1 Artificial Bee Colony optimization (ABC)
2.2 Flower Pollination Algorithm (FPA)
**3. Machine Learning algorithm: -** KNN (K-Nearest Neighbors)
**4. Objective function for feature selection: -** Recall - Penalty (see the sketch after item 5 below)
4.1 Recall: to maximize true positives and minimize false negatives
4.2 Penalty: to reduce the number of features retained after feature selection
**5. Evaluation: -** 13 metrics were used to evaluate the performance of the classifiers
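The following is a minimal sketch of that objective function, assuming a 0/1 feature mask, a hold-out validation split, and a penalty weight `alpha`; the function name, parameter names, and the macro-averaged recall are illustrative assumptions rather than the repository's actual code.

```python
# Illustrative sketch only: `alpha`, the validation split, and macro-averaged
# recall are assumptions, not taken from the repository's notebooks.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score

def fitness(mask, X_train, y_train, X_val, y_val, alpha=0.1, n_neighbors=5):
    """Fitness = recall on a validation split minus a penalty on the number
    of selected features (higher is better)."""
    selected = np.flatnonzero(mask)            # column indices kept by the mask
    if selected.size == 0:                     # an empty feature subset is useless
        return -np.inf
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train[:, selected], y_train)
    y_pred = knn.predict(X_val[:, selected])
    recall = recall_score(y_val, y_pred, average="macro")   # works for binary and multi-class
    penalty = alpha * selected.size / mask.size             # fraction of features retained
    return recall - penalty
```

Each candidate solution produced by ABC or FPA is such a mask over the feature columns; the penalty term steers the search toward smaller feature subsets while the recall term keeps false negatives down.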
**6. Dataset: -** CIC dataset (Canadian Institute for Cybersecurity)
**7. Scaling approaches: -** Two independent methods were used:
7.1 Standard Scaler
7.2 Robust Scaler
**8. Results: -**
8.1 With 5 generations, models trained on FPA-selected features outperformed those trained on ABC-selected features.
8.2 With 100 generations, models trained on ABC-selected features outperformed those trained on FPA-selected features.
8.3 Models trained with the Standard Scaler outperformed models trained with the Robust Scaler, irrespective of the feature selection method and the number of generations.
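To show how the pieces fit together, here is a hedged sketch of a simplified binary Flower Pollination Algorithm wrapped around the `fitness` sketch above. The population size, switch probability, sigmoid binarization, greedy replacement, and the Gaussian step used in place of a true Levy flight are all simplifying assumptions for illustration; the repository's notebooks may organize this differently, and the ABC variant would replace the pollination moves with employed, onlooker, and scout bee phases while keeping the same mask encoding and fitness call.

```python
# Illustrative sketch of a simplified binary Flower Pollination Algorithm (FPA)
# for feature selection; it reuses the `fitness` function sketched earlier.
# X_train / X_val are assumed to be NumPy arrays already scaled with either
# StandardScaler or RobustScaler (item 7).
import numpy as np

def binary_fpa(X_train, y_train, X_val, y_val, n_features,
               pop_size=20, generations=100, switch_prob=0.8, seed=None):
    rng = np.random.default_rng(seed)
    # Continuous positions in [0, 1]; each is binarized into a 0/1 feature mask.
    positions = rng.random((pop_size, n_features))
    binarize = lambda p: (1 / (1 + np.exp(-10 * (p - 0.5))) > rng.random(n_features)).astype(int)
    masks = np.array([binarize(p) for p in positions])
    scores = np.array([fitness(m, X_train, y_train, X_val, y_val) for m in masks])
    best = scores.argmax()
    best_pos, best_mask, best_score = positions[best].copy(), masks[best].copy(), scores[best]

    for _ in range(generations):
        for i in range(pop_size):
            if rng.random() < switch_prob:
                # Global pollination: drift toward the best-known solution
                # (a small Gaussian step stands in for the Levy flight).
                new_pos = positions[i] + rng.normal(0.0, 0.1, n_features) * (best_pos - positions[i])
            else:
                # Local pollination: mix two randomly chosen population members.
                j, k = rng.choice(pop_size, size=2, replace=False)
                new_pos = positions[i] + rng.random() * (positions[j] - positions[k])
            new_pos = np.clip(new_pos, 0.0, 1.0)
            new_mask = binarize(new_pos)
            new_score = fitness(new_mask, X_train, y_train, X_val, y_val)
            if new_score > scores[i]:                      # greedy replacement
                positions[i], masks[i], scores[i] = new_pos, new_mask, new_score
                if new_score > best_score:
                    best_pos, best_mask, best_score = new_pos.copy(), new_mask.copy(), new_score
    return best_mask, best_score
```

The `generations` argument corresponds to the 5 versus 100 generation settings compared in the results above.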