https://github.com/taruchit/bits_dissertation
Binary and multi-class classifiers to classify benign and malicious events, using nature-inspired heuristic algorithms for feature selection.
- Host: GitHub
- URL: https://github.com/taruchit/bits_dissertation
- Owner: taruchit
- Created: 2024-12-15T08:24:41.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-29T10:40:11.000Z (3 months ago)
- Last Synced: 2025-03-29T11:30:47.555Z (3 months ago)
- Topics: artificial-bee-colony-algorithm, data-science, data-visualization, exploratory-data-analysis, feature-engineering, feature-selection, flower-pollination-algorithm, k-nearest-neighbor-classifier, machine-learning, python, recall
- Language: Jupyter Notebook
- Homepage:
- Size: 82.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
README
# To Establish a Baseline for Threat Detection
**1. Objective: -** To build binary and multi-class classifiers that distinguish benign from malicious events and identify the type of malicious event, over a large and imbalanced cybersecurity dataset.
**2. Feature selection: -** Nature-inspired heuristic algorithms were used:
2.1 Artificial Bee Colony optimization (ABC)
2.2 Flower Pollination Algorithm (FPA)
**3. Machine Learning algorithm: -** KNN (K-Nearest Neighbors)
**4. Objective function for feature selection: -** Recall - Penalty (see the sketch after item 5 below)
4.1 Recall: to maximize true positives and minimize false negatives
4.2 Penalty: to reduce the number of features retained after feature selection
**5. Evaluation: -** 13 metrics were used to evaluate the performance of the classifiers
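The following is a minimal sketch of that objective function, assuming a 0/1 feature mask, a hold-out validation split, and a penalty weight `alpha`; the function name, parameter names, and the macro-averaged recall are illustrative assumptions rather than the repository's actual code.

```python
# Illustrative sketch only: `alpha`, the validation split, and macro-averaged
# recall are assumptions, not taken from the repository's notebooks.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score

def fitness(mask, X_train, y_train, X_val, y_val, alpha=0.1, n_neighbors=5):
    """Fitness = recall on a validation split minus a penalty on the number
    of selected features (higher is better)."""
    selected = np.flatnonzero(mask)            # column indices kept by the mask
    if selected.size == 0:                     # an empty feature subset is useless
        return -np.inf
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train[:, selected], y_train)
    y_pred = knn.predict(X_val[:, selected])
    recall = recall_score(y_val, y_pred, average="macro")   # works for binary and multi-class
    penalty = alpha * selected.size / mask.size             # fraction of features retained
    return recall - penalty
```

Each candidate solution produced by ABC or FPA is such a mask over the feature columns; the penalty term steers the search toward smaller feature subsets while the recall term keeps false negatives down.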
**6. Dataset: -** CIC dataset (Canadian Institute for Cybersecurity)
**7. Scaling approaches: -** Two independent methods were used:
7.1 Standard Scaler
7.2 Robust Scaler
**8. Results: -**
8.1 With 5 generations, models trained on FPA-selected features outperformed those trained on ABC-selected features.
8.2 With 100 generations, models trained on ABC-selected features outperformed those trained on FPA-selected features.
8.3 Models trained with the Standard Scaler outperformed models trained with the Robust Scaler, irrespective of the feature selection method and the number of generations.
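To show how the pieces fit together, here is a hedged sketch of a simplified binary Flower Pollination Algorithm wrapped around the `fitness` sketch above. The population size, switch probability, sigmoid binarization, greedy replacement, and the Gaussian step used in place of a true Levy flight are all simplifying assumptions for illustration; the repository's notebooks may organize this differently, and the ABC variant would replace the pollination moves with employed, onlooker, and scout bee phases while keeping the same mask encoding and fitness call.

```python
# Illustrative sketch of a simplified binary Flower Pollination Algorithm (FPA)
# for feature selection; it reuses the `fitness` function sketched earlier.
# X_train / X_val are assumed to be NumPy arrays already scaled with either
# StandardScaler or RobustScaler (item 7).
import numpy as np

def binary_fpa(X_train, y_train, X_val, y_val, n_features,
               pop_size=20, generations=100, switch_prob=0.8, seed=None):
    rng = np.random.default_rng(seed)
    # Continuous positions in [0, 1]; each is binarized into a 0/1 feature mask.
    positions = rng.random((pop_size, n_features))
    binarize = lambda p: (1 / (1 + np.exp(-10 * (p - 0.5))) > rng.random(n_features)).astype(int)
    masks = np.array([binarize(p) for p in positions])
    scores = np.array([fitness(m, X_train, y_train, X_val, y_val) for m in masks])
    best = scores.argmax()
    best_pos, best_mask, best_score = positions[best].copy(), masks[best].copy(), scores[best]

    for _ in range(generations):
        for i in range(pop_size):
            if rng.random() < switch_prob:
                # Global pollination: drift toward the best-known solution
                # (a small Gaussian step stands in for the Levy flight).
                new_pos = positions[i] + rng.normal(0.0, 0.1, n_features) * (best_pos - positions[i])
            else:
                # Local pollination: mix two randomly chosen population members.
                j, k = rng.choice(pop_size, size=2, replace=False)
                new_pos = positions[i] + rng.random() * (positions[j] - positions[k])
            new_pos = np.clip(new_pos, 0.0, 1.0)
            new_mask = binarize(new_pos)
            new_score = fitness(new_mask, X_train, y_train, X_val, y_val)
            if new_score > scores[i]:                      # greedy replacement
                positions[i], masks[i], scores[i] = new_pos, new_mask, new_score
                if new_score > best_score:
                    best_pos, best_mask, best_score = new_pos.copy(), new_mask.copy(), new_score
    return best_mask, best_score
```

The `generations` argument corresponds to the 5 versus 100 generation settings compared in the results above.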