https://github.com/dmtkfs/ics-modbus-anomaly-detection

Unified heuristics + machine learning framework for detecting Modbus/TCP anomalies in industrial control systems. Implements our Evaluation Integrity Protocol (EIP) for dataset, metrics and reproducibility consistency.

Host: GitHub
URL: https://github.com/dmtkfs/ics-modbus-anomaly-detection
Owner: dmtkfs
License: mit
Created: 2025-09-23T18:26:30.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-10-23T01:35:46.000Z (8 months ago)
Last Synced: 2025-10-23T02:41:22.564Z (8 months ago)
Topics: anomaly-detection, heuristics, ics-cybersecurity, ids-algorithm, machine-learning, modbus
Language: Python
Homepage:
Size: 1.01 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ICS Modbus Anomaly Detection
![EIP Audit](https://github.com/dmtkfs/ics-modbus-anomaly-detection/actions/workflows/eip-audit.yml/badge.svg)

**Baseline intrusion-detection framework for Industrial Control Systems (ICS) using Modbus/TCP traffic.**
Implements two complementary detection layers — **rule-based heuristics** and **machine learning baselines** — unified by a strict **Evaluation Integrity Protocol (EIP)** that guarantees reproducibility, dataset consistency and comparable metrics.

## Overview

This project analyzes the **CIC Modbus 2023 dataset** to detect anomalous behavior in industrial network traffic.

* **Heuristic detectors** provide interpretable, lightweight rule checks
* **Machine learning models** (Logistic Regression, Random Forest, Isolation Forest) provide adaptive statistical detection
* Both layers share the same dataset, schema, metrics and seed under the **EIP** standard
* A **PowerShell script** automates end-to-end evaluation for reproducibility

## Repository Structure

```
ics-modbus-anomaly-detection/
│
├── .github/
│   └── workflows/
│       └── eip-audit.yml                 # GitHub Actions CI audit enforcing EIP
│
├── configs/
│   ├── dataset.yaml                      # Dataset path, SHA-256, schema, label map
│   └── ml.yaml                           # ML configuration (features, labels, seed)
│
├── docs/
│   ├── appendix_ml_final_run.md          # Final Phase III ML notes (artifacts & metrics)
│   ├── EIP_Checklist.md                  # Tick-before-merge reproducibility checklist
│   └── Evaluation_Integrity_Protocol.md  # Full EIP specification
│
├── figures/
│   └── ml/
│       └── .gitkeep                      # Placeholder (figures generated locally)
│
├── results/
│   └── ml/
│       └── .gitkeep                      # Placeholder (CSV results generated locally)
│
├── scripts/
│   ├── __init__.py
│   ├── aggregate_phase3_metrics.py       # Aggregates calibration + LOAO outputs
│   ├── compute_checksum.py               # Computes and pins dataset SHA-256
│   ├── eip_audit.py                      # Validates schema, checksum, matplotlibrc
│   ├── proc_dataset_audit.py             # Optional preprocessing audit
│   ├── run_baselines.py                  # Trains LR/RF/IF baselines (80/20 split)
│   ├── run_calibration.py                # Legacy calibrator (unbalanced)
│   ├── run_calibration_balanced.py       # Final constrained calibration (balanced)
│   ├── run_final_ml.ps1                  # Full PowerShell pipeline (audit→train→LOAO→aggregate)
│   ├── run_loao.py                       # Simple LOAO prototype
│   ├── run_loao_ml.py                    # ML-only LOAO (legacy)
│   ├── run_loao_ml_balanced.py           # Balanced LOAO for LR/RF/IF (Phase III)
│   ├── smoke_dataset.py                  # Dataset presence & schema sanity check
│   └── smoke_heuristics.py               # Quick heuristics dry-run on subset
│
├── src/
│   ├── ml/
│   │   ├── balanced.py                   # Class balancing and tree growth logic
│   │   └── calibration.py                # Calibration sweep & constraint selection
│   ├── utils/
│   │   ├── data_prep.py                  # Dataset/config loaders, checksum utilities
│   │   ├── metrics.py                    # Metric computation & CSV writer
│   │   ├── ml_data_prep.py               # ML-specific data preparation helpers
│   │   └── plot_utils.py                 # Standardized figure styling
│   ├── heuristics.py                     # Implements H1/H2F detectors
│   └── __init__.py
│
├── .gitignore                            # Excludes data/, cache, and local artifacts
├── LICENSE                               # Open license declaration
├── matplotlibrc                          # Unified plotting style (DPI, fonts)
├── requirements.txt                      # Stable dependencies (NumPy, Pandas, etc.)
└── README.md
```

## Evaluation Integrity Protocol (EIP)

EIP enforces **reproducibility and comparability** across all runs.

| Standard | Description |
| -------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| **Dataset identity** | `data/processed/master.csv` pinned via SHA-256 in `configs/dataset.yaml` |
| **Schema** | 10 columns – `[Time, Source, Destination, Length, Source Port, Destination Port, Function Code, Label, Attack Family, FunctionCodeNum]` |
| **Labels** | `Attack = 1`, `Benign = 0` |
| **Families order** | `[External, Compromised-IED, Compromised-SCADA]` |
| **Random seed** | 42 |
| **Metrics** | Precision, Recall, F1 (+ ROC-AUC / PR-AUC for ML) |
| **Figures** | DPI 300, standard fonts per `matplotlibrc` |
| **Audit** | `python -m scripts.eip_audit` → **“ALL GREEN”** before merge |

A lightweight version of this audit runs automatically in **GitHub Actions** for every push or pull request.
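
To make the schema row of the table concrete, here is a minimal sketch of a header check using only the standard library. The function name `check_schema` and the in-memory sample files are hypothetical; the real checks live in `scripts/eip_audit.py` and may be stricter or structured differently.

```python
import csv
import io

# Column order copied from the EIP table above.
EIP_COLUMNS = [
    "Time", "Source", "Destination", "Length", "Source Port",
    "Destination Port", "Function Code", "Label", "Attack Family",
    "FunctionCodeNum",
]


def check_schema(csv_file):
    """Return a list of problems found in the CSV header (empty list = OK)."""
    header = next(csv.reader(csv_file), [])
    problems = []
    if header != EIP_COLUMNS:
        missing = [c for c in EIP_COLUMNS if c not in header]
        extra = [c for c in header if c not in EIP_COLUMNS]
        if missing:
            problems.append(f"missing columns: {missing}")
        if extra:
            problems.append(f"unexpected columns: {extra}")
        if not missing and not extra:
            problems.append("columns are reordered or duplicated")
    return problems


# Hypothetical in-memory samples standing in for data/processed/master.csv.
good = io.StringIO(",".join(EIP_COLUMNS) + "\n")
bad = io.StringIO("Time,Source,Destination\n")
print(check_schema(good))  # empty list: header matches
print(check_schema(bad))   # one entry listing the missing columns
```

Comparing the full ordered list, instead of a set, also catches reordered columns, which matters if downstream code indexes columns by position.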

## How to Run

### 1. Dataset Checksum & Audit

```bash
python -m scripts.compute_checksum # write SHA-256 into configs/dataset.yaml
python -m scripts.eip_audit # full integrity check
```
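
For readers unfamiliar with checksum pinning, the sketch below shows the general idea: hash the dataset file in chunks and compare the digest with a stored value. The helper names `sha256_of_file` and `matches_pin` are illustrative assumptions, not the API of `scripts/compute_checksum.py`, and the temporary file merely stands in for `data/processed/master.csv`.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_hex):
    """Compare the file's digest with a pinned hex string (case-insensitive)."""
    return sha256_of_file(path) == pinned_hex.strip().lower()


# Demo on a throwaway file; in practice the pin would be read from the config.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "master.csv")
    with open(sample, "wb") as fh:
        fh.write(b"Time,Source\n")
    pin = sha256_of_file(sample)
    print(matches_pin(sample, pin))       # True: file unchanged since pinning
    with open(sample, "ab") as fh:
        fh.write(b"tampered\n")
    print(matches_pin(sample, pin))       # False: contents changed
```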

### 2. Heuristic Detection

```bash
python -m src.heuristics
```

Generates:

* `results/heuristics_metrics.csv`
* `figures/heuristics/confusion_combined.png`
* `figures/heuristics/performance_comparison.png`
* `figures/heuristics/recall_by_attack_family.png`

This step runs H1 (Write Rate Spike) and H2 (Function Code + Role Anomaly) and takes about 5 minutes on a standard CPU.
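
The README does not spell out how H1 is computed, so the following is only a sketch of one plausible write-rate rule: count write requests per source in a sliding time window and alert when the count exceeds a limit. The window length, the threshold, and the function name `write_rate_alerts` are hypothetical and are not taken from `src/heuristics.py`. The write function codes (5, 6, 15, 16) are the standard Modbus single/multiple coil and register writes.

```python
from collections import defaultdict, deque

# Standard Modbus write function codes: write single coil (5), write single
# register (6), write multiple coils (15), write multiple registers (16).
WRITE_FUNCTION_CODES = {5, 6, 15, 16}


def write_rate_alerts(packets, window_s=1.0, max_writes=3):
    """Flag write packets whose source exceeded max_writes within window_s.

    packets: iterable of (time_s, source, function_code), sorted by time.
    Returns a list of booleans aligned with the input order.
    window_s and max_writes are illustrative defaults, not tuned values.
    """
    recent = defaultdict(deque)  # source -> timestamps of its recent writes
    alerts = []
    for time_s, source, code in packets:
        if code not in WRITE_FUNCTION_CODES:
            alerts.append(False)
            continue
        window = recent[source]
        window.append(time_s)
        # Drop writes that have fallen out of the sliding window.
        while window and time_s - window[0] > window_s:
            window.popleft()
        alerts.append(len(window) > max_writes)
    return alerts


packets = [
    (0.0, "A", 3),   # read, ignored
    (0.1, "A", 6),
    (0.2, "A", 6),
    (0.3, "A", 16),
    (0.4, "A", 5),   # fourth write within one second
    (2.0, "A", 6),   # burst has expired by now
]
print(write_rate_alerts(packets))
# [False, False, False, False, True, False]
```

A combined detector in the sense of "H1 ∨ H2" would then be the element-wise logical OR of the two alert lists.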

### 3. Machine-Learning Baselines

Train baseline models (80/20 split):

```bash
python -m scripts.run_baselines
```

Calibrate thresholds, run the Leave-One-Attack-Out (LOAO) evaluation, and aggregate the results:

```bash
python -m scripts.run_calibration_balanced
python -m scripts.run_loao_ml_balanced
python -m scripts.aggregate_phase3_metrics
```
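
As an illustration of the LOAO idea, the sketch below builds one train/test split per attack family: the held-out family appears only in the test set, so a model is scored on attacks it never saw during training. How the project's scripts distribute benign traffic is not documented here, so the 80/20 benign split, the function name `loao_splits`, and the row format are assumptions; the seed and family order follow the EIP table.

```python
import random

# Family order and seed as listed in the EIP table.
FAMILIES = ["External", "Compromised-IED", "Compromised-SCADA"]
SEED = 42


def loao_splits(rows, families=FAMILIES, seed=SEED, benign_test_frac=0.2):
    """Yield (held_out_family, train_rows, test_rows) for each attack family.

    rows: list of dicts with 'Label' (1 = attack, 0 = benign) and
    'Attack Family'. Benign rows are split once (assumption: 80/20) and
    reused across folds; attack rows are routed by family.
    """
    benign = [r for r in rows if r["Label"] == 0]
    attacks = [r for r in rows if r["Label"] == 1]
    rng = random.Random(seed)
    rng.shuffle(benign)
    n_test = int(len(benign) * benign_test_frac)
    benign_test, benign_train = benign[:n_test], benign[n_test:]
    for held_out in families:
        train = benign_train + [r for r in attacks if r["Attack Family"] != held_out]
        test = benign_test + [r for r in attacks if r["Attack Family"] == held_out]
        yield held_out, train, test


# Tiny synthetic example: 10 benign rows and 2 attacks per family.
rows = [{"Label": 0, "Attack Family": ""} for _ in range(10)]
rows += [{"Label": 1, "Attack Family": f} for f in FAMILIES for _ in range(2)]
for held_out, train, test in loao_splits(rows):
    print(held_out, len(train), len(test))
```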

### 4. Fully Automated ML Pipeline (PowerShell)

Run every step under EIP control with `run_final_ml.ps1`, which the repository structure lists under `scripts/`:

```powershell
.\run_final_ml.ps1
```

The script performs the following steps in order: Audit → Baselines → Balanced calibration → LOAO (simple + balanced) → Aggregate → Light audit.

Outputs are stored in `results/ml/final_/` and `figures/ml/final_/`.

## Key Findings (Shortened)

| Detector | Precision | Recall | F1 | Notes |
| -------------------------------------- | --------- | ------ | ----- | ----------------------------------- |
| **H1: Write-Rate Spike** | 0.948 | 0.866 | 0.905 | Detects write surges |
| **H2: Function-Code & Role Anomaly** | 1.000 | 0.306 | 0.469 | Flags mixed-role clients |
| **Combined (H1 ∨ H2)** | 0.948 | 0.866 | 0.905 | Balanced precision-recall |
| **Logistic Regression (80/20)** | 0.955 | 0.462 | 0.623 | Supervised baseline |
| **Random Forest (80/20)** | 0.962 | 0.305 | 0.463 | Tree-based baseline |
| **Isolation Forest (unsupervised)** | 0.948 | 0.786 | 0.860 | Generalizes best to unseen families |

**Interpretation:** Heuristics excel in precision and clarity, while ML extends coverage to novel patterns. Together they offer a reproducible baseline for ICS intrusion detection.
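
The F1 column can be sanity-checked against the precision and recall columns, since F1 is their harmonic mean. The short sketch below recomputes it from the rounded table values; because the inputs are already rounded to three decimals, a recomputed value may differ from the reported one in the last digit. The dictionary keys are shorthand labels chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
REPORTED = {
    "H1": (0.948, 0.866, 0.905),
    "H2": (1.000, 0.306, 0.469),
    "H1 or H2": (0.948, 0.866, 0.905),
    "Logistic Regression": (0.955, 0.462, 0.623),
    "Random Forest": (0.962, 0.305, 0.463),
    "Isolation Forest": (0.948, 0.786, 0.860),
}

for name, (p, r, reported) in REPORTED.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {reported:.3f}")
```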

## Continuous Integration (CI)

GitHub Actions workflow `.github/workflows/eip-audit.yml` performs a **light EIP audit** on each push/PR:

* verifies the config files, schema fields, and matplotlib setup
* ensures the dataset checksum is present
* blocks the merge if the audit fails

Full audits can be run locally with:

```bash
python -m scripts.eip_audit --full
```

## Dataset Reference

Canadian Institute for Cybersecurity (CIC).
*Modbus 2023 Dataset.*
[https://www.unb.ca/cic/datasets/modbus-2023.html](https://www.unb.ca/cic/datasets/modbus-2023.html)

Raw PCAPs and the merged `master.csv` are excluded from the repo for size and license reasons.

## Acknowledgements

Developed as part of **INSE 6640 - Smart Grids and Control System Security**, Concordia University (2025).

All processing and evaluations follow the Evaluation Integrity Protocol (EIP) to ensure reproducibility and cross-phase consistency.

The complete final report and executive summary are available upon request.

## How to Cite

If you use this repository or its evaluation framework in academic or research work, please cite it as:

> **Baseline Anomaly Detection for ICS Modbus Traffic: Heuristics vs Machine Learning under Leave-One-Attack-Out Evaluation**,
> *Concordia University - INSE 6640: Smart Grids and Control System Security*, 2025.
> Available at: [https://github.com/dmtkfs/ics-modbus-anomaly-detection](https://github.com/dmtkfs/ics-modbus-anomaly-detection)
