Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/raj-tyagi/big-data-ip-anomaly-detection-bloomfilter-misragries
A scalable and efficient real-time IP anomaly detection system leveraging Bloom Filters and the Misra-Gries Algorithm to identify malicious traffic, detect heavy hitters, and dynamically update blacklists with minimal memory usage
https://github.com/raj-tyagi/big-data-ip-anomaly-detection-bloomfilter-misragries
algorithm big-data bloom-filter heavy-hitters ip-address-tracker memory-optimization misra-gries network-security probabilistic-datastructure python
Last synced: 9 days ago
JSON representation
A scalable and efficient real-time IP anomaly detection system leveraging Bloom Filters and the Misra-Gries Algorithm to identify malicious traffic, detect heavy hitters, and dynamically update blacklists with minimal memory usage
- Host: GitHub
- URL: https://github.com/raj-tyagi/big-data-ip-anomaly-detection-bloomfilter-misragries
- Owner: raj-tyagi
- License: mit
- Created: 2024-11-29T16:21:53.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-01-28T16:32:22.000Z (11 days ago)
- Last Synced: 2025-01-28T17:34:13.725Z (11 days ago)
- Topics: algorithm, big-data, bloom-filter, heavy-hitters, ip-address-tracker, memory-optimization, misra-gries, network-security, probabilistic-datastructure, python
- Language: Jupyter Notebook
- Homepage:
- Size: 119 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Big-Data-IP-Anomaly-Detection-BloomFilter-MisraGries
A scalable and efficient real-time IP anomaly detection system leveraging Bloom Filters and the Misra-Gries Algorithm to identify malicious traffic, detect heavy hitters, and dynamically update blacklists with minimal memory usage
Here's a structured and professional README template for your project, complete with placeholders for analysis graphs and detailed explanations of the **Bloom Filter** and **Misra-Gries Algorithm** concepts.## Overview
This project presents a scalable, efficient, and real-time approach to detecting anomalous IP addresses in network traffic. By combining **Bloom Filters** with the **Misra-Gries Algorithm**, we provide a memory-efficient solution capable of identifying suspicious IPs, heavy hitters, and dynamically updating blacklists. This approach is ideal for applications in cybersecurity, DDoS prevention, and enterprise network monitoring.
---
## Workflow of the Project
The project workflow systematically processes incoming IP addresses through a multi-stage detection pipeline:
1. **Initial IP Check**:
- Incoming IP addresses are first checked against the **Bad IP Bloom Filter**.
- If a match is found, the IP is immediately classified as **Prohibited** and restricted from entering the system.2. **Safe IP Check**:
- IPs that pass the Bad IP Bloom Filter are then checked against the **Good IP Bloom Filter**.
- If a match is found, the IP is considered **Safe**, and the system does not track its further entries to save memory and processing power.3. **Risky IP Tagging**:
- IPs that do not match either the Bad or Good Bloom Filter are classified as **Risky**.
- The system begins tracking their occurrences in the data stream.4. **Heavy Hitter Detection**:
- The **Misra-Gries Algorithm** is applied to the **Risky IPs** to count their occurrences and identify **Heavy Hitters** (frequently appearing IPs that exceed a predefined threshold).
- These heavy hitters are considered **Hazardous** and flagged for further action.5. **Blacklist Update**:
- Hazardous IPs identified as heavy hitters are dynamically added to the **Bad IP Bloom Filter**.
- This ensures that future occurrences of these IPs are immediately blocked, improving the system's adaptability and security.6. **Result Aggregation and Storage**:
- The system aggregates results into four classifications:
- **Prohibited IPs**: Detected in the Bad IP Bloom Filter.
- **Safe IPs**: Detected in the Good IP Bloom Filter.
- **Risky IPs**: Neither in the Good nor Bad filters, tracked for analysis.
- **Heavy Hitters**: Risky IPs identified by the Misra-Gries Algorithm as frequent offenders.
- The updated **Bad IP Bloom Filter** is persisted for future use.---
## Features
- **Real-Time IP Anomaly Detection**: Immediate identification of malicious IPs.
- **Scalable and Efficient**: Handles large datasets with minimal memory overhead.
- **Dynamic Blacklist Updates**: Ensures continuous threat adaptability.
- **Performance Benchmarking**: Demonstrates superior speed and memory efficiency compared to traditional methods.---
## Project Workflow
1. **Data Acquisition**: Simulated IP traffic is ingested as a continuous stream.
2. **Preprocessing**: Validates data and initializes Bloom Filters.
3. **IP Classification**:
- **Safe**: Found in Good IP Bloom Filter.
- **Risky**: Not found in any filter; analyzed further.
4. **Detection**:
- Identifies heavy hitters using the **Misra-Gries Algorithm**.
- Updates blacklist dynamically.
5. **Visualization**: Provides insights through comparison graphs.---
## Key Concepts
### Bloom Filter
A **Bloom Filter** is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. It can result in false positives but guarantees no false negatives.**Advantages**:
- **Memory Efficient**: Represents a large dataset compactly.
- **Fast Lookup**: Performs membership checks in constant time.**How it Works**:
1. Hash each input element using multiple hash functions.
2. Update bits in a fixed-size bit array corresponding to hash outputs.
3. To check membership, rehash the query and verify if all corresponding bits are set.---
### Misra-Gries Algorithm
The **Misra-Gries Algorithm** identifies frequent elements in a stream using a fixed amount of memory. It is well-suited for detecting "heavy hitters" under memory constraints.**Steps**:
1. Track a limited number of counters for unique elements.
2. Increment counters for elements in the set, or reduce all counters if full.
3. Extract elements exceeding a frequency threshold as heavy hitters.**Applications**:
- Identifying top traffic sources in network streams.
- Detecting DDoS attack patterns.---
## Analysis Graphs
### As Per Dataset
![image](https://github.com/user-attachments/assets/ffd68191-d870-47e7-8f8d-46f57631a77b)### 1. **Time Comparison**
This graph demonstrates the processing time advantage of our Bloom Filter-based system over traditional brute-force methods.![image](https://github.com/user-attachments/assets/80ac2e50-5dd9-41d9-97ff-ab887bf895c2)
### 2. **Memory Usage**
Illustrates the significantly reduced memory footprint of our approach compared to traditional systems.![image](https://github.com/user-attachments/assets/929a7611-4f48-40b2-94c4-7cab719348b7)
### 3. **Accuracy Comparison**
Highlights the accuracy trade-off (minor false positives) for the memory and speed benefits.![image](https://github.com/user-attachments/assets/dcf56960-0464-4969-ab56-fd3aa6553f08)
Here are the tables for direct copying:
### Feature Comparison: Traditional vs. Bloom Filter + Misra-Gries
| **Feature** | **Traditional Approach** | **Bloom Filter + Misra-Gries** |
|----------------------------------|---------------------------------|-------------------------------------|
| **Scalability** | Limited | High |
| **Memory Usage** | High | Low |
| **Processing Time** | Longer | Faster |
| **False Positives** | None | Minor (tunable error rate) |
| **Real-Time Detection** | No | Yes |
| **Blacklist Updates** | Static | Dynamic |---
### Performance Metrics
| **Metric** | **Traditional Approach** | **Bloom Filter Approach** |
|----------------------------------|---------------------------------|------------------------------------|
| **Processing Time** | 0.04 seconds | 0.03 seconds |
| **Memory Usage** | 17 KB | 14 KB |
| **Detection Accuracy** | 98% | 92% |Feel free to use or adapt these!
---
## Getting Started
### Prerequisites
- Python 3.10 or higher
- Libraries: `pandas`, `matplotlib`, `pybloom-live`### Installation
1. Clone the repository:
```bash
git clone https://github.com/username/repo-name.git
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the project:
```bash
python main.py
```### Usage
- Replace the dataset file (`IP_dataset.csv`) with your own network traffic data.
- Adjust parameters like `BLOOM_FILTER_SIZE` and `ERROR_RATE` in the configuration.---
## Results
- **Processing Time**: 0.03 seconds (Bloom Filter) vs. 0.04 seconds (Traditional).
- **Memory Usage**: 14 KB (Bloom Filter) vs. 17 KB (Traditional).
- **Accuracy**: 92% (Bloom Filter) vs. 98% (Traditional).---
## Contribution Guidelines
We welcome contributions! Please follow these steps:
1. Fork the repository.
2. Create a new branch: `git checkout -b feature-name`.
3. Commit changes: `git commit -m "Description of changes"`.
4. Push to the branch: `git push origin feature-name`.
5. Open a pull request.---
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.---
## Contact
For any inquiries or feedback, please contact the project maintainer:
https://www.linkedin.com/in/raj-tyagi-83765b21b/