Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ate329/nsl-kdd-feature-extractor
Python-based tool designed to process network traffic packets and extract features compliant with the NSL-KDD dataset format.
- Host: GitHub
- URL: https://github.com/ate329/nsl-kdd-feature-extractor
- Owner: Ate329
- License: mit
- Created: 2024-11-16T12:47:31.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-16T13:09:07.000Z (about 2 months ago)
- Last Synced: 2024-11-16T14:19:24.470Z (about 2 months ago)
- Topics: cyber-security, cybersecurity, data, data-science, extractor, feature-extraction, machine-learning, network-analysis, nsl-kdd, nsl-kdd-dataset
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NSL-KDD Feature Extractor
## **Introduction**
The **NSL-KDD Feature Extractor** is a Python-based tool designed to process network traffic packets and extract features compliant with the NSL-KDD dataset format. It enables researchers and developers to analyze network traffic and apply machine learning models for intrusion detection, anomaly detection, or other cybersecurity applications.
Dataset used for testing: https://www.kaggle.com/datasets/hassan06/nslkdd/data
## **Features**
1. **Packet Analysis**
   - Supports live packet capture using `scapy`.
   - Processes TCP, UDP, ICMP, ARP, and DNS packets.
2. **Feature Extraction**
   - Generates NSL-KDD dataset-compatible features for machine learning.
   - Includes connection-based and statistical features such as `same_srv_rate`, `srv_serror_rate`, and more.
3. **Customizable and Scalable**
   - Easily extendable for new protocols or custom features.
   - Handles both live traffic and offline packet captures.
4. **Internal Traffic Filtering**
   - Option to exclude internal traffic during feature extraction.
## **How It Works**
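Statistical features such as `same_srv_rate` and `srv_serror_rate` are conventionally defined in the KDD-style feature set over connections seen in the preceding two seconds. The sketch below shows one way such rates could be computed with a sliding time window. It is not taken from `network_feature_extractor.py`: the `Connection` and `TimeWindowStats` names, the choice to include the current connection in the window, and the set of flags counted as SYN errors are all assumptions made for illustration. It also assumes connections arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical per-connection summary; not a type from this repository."""
    timestamp: float  # end time of the connection, in seconds
    dst_host: str     # destination IP address
    service: str      # e.g. "http"
    flag: str         # connection status flag, e.g. "SF" or "S0"


# Assumption: these flags are the ones counted as SYN errors.
SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


class TimeWindowStats:
    """Tracks recent connections and derives window-based rates."""

    def __init__(self, window_seconds: float = 2.0):
        self.window_seconds = window_seconds
        self._recent = deque()  # connections ordered by timestamp

    def update(self, conn: Connection) -> dict:
        """Record `conn` and return rates over the window ending at it."""
        # Drop connections that fall outside the time window.
        cutoff = conn.timestamp - self.window_seconds
        while self._recent and self._recent[0].timestamp < cutoff:
            self._recent.popleft()
        self._recent.append(conn)

        # The current connection is included, so both lists are non-empty.
        same_host = [c for c in self._recent if c.dst_host == conn.dst_host]
        same_srv = [c for c in self._recent if c.service == conn.service]

        return {
            "count": len(same_host),
            "srv_count": len(same_srv),
            # Share of same-host connections that use the same service.
            "same_srv_rate": sum(c.service == conn.service for c in same_host)
            / len(same_host),
            # Share of same-service connections that ended with a SYN error.
            "srv_serror_rate": sum(c.flag in SYN_ERROR_FLAGS for c in same_srv)
            / len(same_srv),
        }


stats = TimeWindowStats()
for conn in [
    Connection(0.0, "10.0.0.5", "http", "SF"),
    Connection(0.5, "10.0.0.5", "ftp", "S0"),
    Connection(1.0, "10.0.0.5", "http", "S0"),
]:
    print(stats.update(conn))
```

A deque keeps eviction of expired connections cheap. The per-update list scans are fine for a sketch, but would likely need per-host and per-service counters under heavy traffic.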
### **Workflow Diagram**
```
+------------------+
| Network Traffic|
+------------------+
|
v
+-------------------------------+
| Packet Capturing |
| (Using Scapy Framework) |
+-------------------------------+
|
v
+----------------------------------------+
| NSL-KDD Feature Extraction |
| (network_feature_extractor.py) |
+----------------------------------------+
|
v
+----------------------------------------+
| Generated Feature Set |
| - Duration, Protocol Type, Service |
| - Flag, Src Bytes, Dst Bytes |
| - Statistical Features (e.g., |
| srv_serror_rate, same_srv_rate) |
+----------------------------------------+
```
## **Setup**
### **Prerequisites**
- **Python 3.11** or later
- **Scapy** for packet capture
- **Pandas** for data manipulation
### **Installation**
1. Clone the repository:
   ```
   git clone https://github.com/Ate329/NSL-KDD-feature-extractor.git
   cd NSL-KDD-feature-extractor
   ```
2. Install required dependencies:
   ```
   pip install -r requirements.txt
   ```
## **Usage**
### **1. Extracting Features**
```python
from network_feature_extractor import NetworkFeatureExtractor

# Initialize the extractor
extractor = NetworkFeatureExtractor(interface="eth0", timeout=60)

# Capture live traffic and extract features
def process_packet(packet):
    features = extractor.extract_features(packet)
    if features:
        print(features)

extractor.start_capture(callback=process_packet)
```
### **2. Example Output**
Extracted features will include:
```json
{
"duration": 1.23,
"protocol_type": "tcp",
"service": "http",
"flag": "SF",
"src_bytes": 345,
"dst_bytes": 512,
"same_srv_rate": 0.75,
"srv_serror_rate": 0.0,
...
}
```
## **Customization**
1. **Add New Features**:
   - Extend the `extract_features()` method to compute additional metrics.
2. **Handle Custom Protocols**:
   - Add specific processing for protocols like DNS or HTTP in `_extract_ip_features()` or `_extract_arp_features()`.
3. **Exclude Internal Traffic**:
   - Enable internal traffic detection using the `detect_internal=True` parameter.
## **Development Notes**
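This README does not describe how the `detect_internal=True` option decides that traffic is internal. As a rough illustration of what such a check might look like, the sketch below classifies addresses with Python's standard `ipaddress` module. The helper names `is_internal` and `is_internal_flow` are hypothetical and are not part of this project. Treating private, loopback, and link-local ranges as internal is an assumption, as is requiring both endpoints to be internal.

```python
import ipaddress


def is_internal(address: str) -> bool:
    """Return True if `address` is in a private, loopback, or link-local range.

    Hypothetical helper for illustration; strings that are not valid
    IP addresses are treated as not internal.
    """
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_internal_flow(src: str, dst: str) -> bool:
    """Assumption: a flow is internal only if both endpoints are internal."""
    return is_internal(src) and is_internal(dst)


# Example: decide whether to skip a packet before extracting features.
for src, dst in [("10.0.0.1", "172.16.5.4"), ("10.0.0.1", "1.1.1.1")]:
    action = "skip" if is_internal_flow(src, dst) else "extract"
    print(src, "->", dst, action)
```

A deployment with its own address plan could instead compare against an explicit list of `ipaddress.ip_network` ranges.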
- This feature extractor aligns with the NSL-KDD dataset specification, enabling seamless integration with machine learning models trained on similar datasets.
- The modular structure makes it adaptable for other datasets or real-world scenarios.
## **Contributing**
We welcome contributions! If you’d like to extend the functionality or report a bug, feel free to submit a pull request or open an issue.
## **License**
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.